DAVI Visual Analytics
Lecture Notes on Visual Analytics and Progressive Visual Analytics
Opening Remarks
The professor notes a record low attendance and humorously remarks:
- “I think visual analytics will be one of my favorite subjects at the exam.”
Today’s lecture will focus on Visual Analytics for three hours.
- Visual Analytics is a topic that could fill a whole semester.
- It is closely related to current state-of-the-art research, including the professor’s own work.
The professor mentions that previous lectures covered foundational topics, but today will push into the forefront of what is known about visualization.
Visual Analytics
Introduction to Visual Analytics
- Visual Analytics combines automated analysis with interactive visualizations to support understanding, reasoning, and decision-making from large and complex datasets.
- It leverages the strengths of both computers and humans:
- Computers excel at data storage, numerical calculations, and data mining.
- Humans excel at cognition, creativity, and background knowledge.
- Visualization acts as the communication channel between computers and humans (cf. Keim’s ability matrix).
Definition
- Visual Analytics combines the computational power of computers with human intuition to find effective solutions.
- It aims to detect the expected and discover the unexpected in data.
Key Principles of Visual Analytics
- Appropriateness Principle:
- Provide neither more nor less information than needed to solve the problem.
- Related to effectiveness in visualization.
- Naturalness Principle:
- Representations should closely match the information being represented.
- Similar to expressiveness in visualization design.
- Matching Principle:
- Visual analytics must match the task to be performed.
- Tailor visualizations to support specific tasks (e.g., comparison vs. localization).
- Apprehension Principle:
- The content should be accurately and easily perceived.
- Avoid overly complex or embellished visuals that hinder understanding.
Examples of Visual Analytics
Example 1: Interactive Clustering
- Users interactively refine clusters in a dataset (e.g., U.S. states).
- By moving data points between clusters, the system updates clustering results in real-time.
- The system learns from user interactions to adjust weights and improve clustering.
Key Points:
- Combines computational clustering (K-means) with user input.
- Allows for active learning and refinement based on domain knowledge.
- Enhances understanding of how clusters are formed.
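The lecture presents this system only conceptually; below is a minimal sketch of the idea, assuming scikit-learn’s KMeans and an invented feedback rule (move_point) that boosts the weights of features on which a dragged point resembles its target cluster, then re-clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def weighted_kmeans(X, weights, k, seed=0):
    """Cluster the data after scaling each feature by a user-adjustable weight."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X * weights)

def move_point(X, weights, labels, idx, target_cluster, lr=0.1):
    """Hypothetical feedback rule: when the user drags point `idx` into
    `target_cluster`, boost features where the point resembles that cluster's
    centroid, damp the others, and re-cluster."""
    centroid = X[labels == target_cluster].mean(axis=0)
    similarity = 1.0 / (1.0 + np.abs(X[idx] - centroid))   # per-feature closeness
    weights = weights * (1 - lr) + lr * similarity / similarity.max()
    return weights, weighted_kmeans(X, weights, k=labels.max() + 1)

# Toy usage: 2 features, 3 clusters, one user correction.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
weights = np.ones(X.shape[1])
labels = weighted_kmeans(X, weights, k=3)
weights, labels = move_point(X, weights, labels, idx=0, target_cluster=1)
```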
Example 2: PivotSlice for Faceted Data Exploration
- An interactive system for exploring faceted datasets (e.g., academic publications).
- Users can:
- Search and filter data based on keywords.
- Visualize relationships between different facets (e.g., authors, conferences, keywords).
- Analyze citation patterns and publication trends.
Key Points:
- Provides efficient faceted browsing capabilities.
- Enables flexible analytical functions to reveal explicit data relationships.
- Supports tasks like identifying influential authors or trends over time.
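PivotSlice itself is a research prototype whose code is not part of the lecture; as a stand-in, here is a small pandas sketch of faceted filtering and slicing on a hypothetical publication table (all column names and values invented).

```python
import pandas as pd

# Hypothetical publication table; real faceted data would come from a database.
pubs = pd.DataFrame({
    "author":  ["Keim", "Keim", "Fekete", "Munzner"],
    "venue":   ["VIS", "EuroVis", "VIS", "VIS"],
    "year":    [2008, 2010, 2013, 2014],
    "keyword": ["visual analytics", "scalability", "progressive", "tasks"],
})

# Facet 1: filter by keyword; facets 2 and 3: slice by venue and year.
hits = pubs[pubs["keyword"].str.contains("visual|progressive", regex=True)]
by_facet = hits.groupby(["venue", "year"]).size().rename("publications")
print(by_facet)
```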
Progressive Visual Analytics
Motivation
Traditional visual analytics can suffer from long computational times, disrupting the user’s analytic flow.
Human Time Constraints:
- Task Completion: ~10 seconds (acceptable for command-line interactions).
- Immediate Response: ~1 second (acceptable for indirect interactions).
- Perceptual Update: ~100 milliseconds (required for direct interactions).
Definition
- Progressive Visual Analytics (PVA) is a type of Visual Analytics where the computational analysis produces intermediate results as approximations of the final result, so as to fulfill the human time constraints imposed by the visual-interactive analysis and thus maintain the user’s analytic flow.
- PVA can be either an incremental process that subdivides the data, processes it chunk by chunk, and thus yields intermediate results of increasing quantity/completeness, or an iterative process that subdivides the computation, runs it step by step, and thus yields intermediate results of increasing quality/correctness.
- It fulfills human time constraints by:
- Providing early, meaningful partial results.
- Continuously refining results as more data or computations are processed.
Types of Progressive Processing
- Incremental (Data Chunking):
- Processes data in chunks.
- Each iteration adds more data, increasing completeness.
- Iterative (Process Chunking):
- Processes the full dataset but refines computations over iterations.
- Each iteration improves the quality of the result.
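A minimal sketch of both flavors as plain Python generators, not taken from the lecture: the incremental version chunks the data and grows completeness, while the iterative version sees all data every step and grows quality (here via a crude shrinking-step estimate of the median).

```python
import numpy as np

def incremental_mean(data, chunk_size):
    """Data chunking: each yield covers MORE data (growing completeness)."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += chunk.sum()
        count += len(chunk)
        yield total / count          # intermediate result over data seen so far

def iterative_median(data, steps):
    """Process chunking: each yield sees ALL data but refines the estimate
    (growing quality), via shrinking steps against the sign of the residuals."""
    estimate = data.mean()           # cheap initial guess
    for step in range(steps):
        estimate -= 0.5 ** step * np.sign(estimate - data).mean() * data.std()
        yield estimate               # intermediate result of increasing quality

data = np.random.default_rng(0).normal(loc=5, size=10_000)
for approx in incremental_mean(data, chunk_size=2_000):
    print(f"incremental mean ~ {approx:.3f}")
for approx in iterative_median(data, steps=5):
    print(f"iterative median ~ {approx:.3f}")
```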
Progressive Visualization Timeline
- T_response: First meaningful partial result (e.g., initial layout in force-directed graphs).
- T_reliable: Trustworthy partial results that allow for basic observations.
- T_stable: Last significant partial result before the final result.
- T_complete: Final result is achieved.
Various Uses of the Progression
- Overcoming bottlenecks
- algorithmic complexity
- big data
- slow network connection
- combinatorial explosion of search space
- Resource-conscious VA:
- find good-enough solutions (reduce time to decision, power consumption)
- adjust to available screen space (reduce network load, cognitive load)
Roles in Progressive Visual Analytics
1. Observer
Interest: Final result and understanding how it was achieved.
Benefits from PVA:
- stay informed about the state of the computation (aliveness)
- estimate how long it will take until it is done (progress)
- watch the algorithm working and see the final result unfold (provenance)
Expectations:
- Steadily improving outputs leading to a predictable end.
Analogy: Watching a loading process with a visual progress indicator (e.g., a fancy progress bar).
Requirements: Make the progression as stable as possible (e.g., offer Pause/Play)!
2. Searcher
Interest: Trustworthy intermediate results for solving specific problems within a large information space (data, parameter, or model space).
Benefits from PVA:
- fully interactive intermediate results allowing their visual exploration.
- cancel the running process, saving the system a costly exhaustive search (early termination)
Expectations:
- A stepwise process with useful early outputs: PVA produces constantly updated results that can be used in place of a final result for subsequent analytic operations.
Analogy: Using an internet search engine (e.g., a product search) and not needing all results to make a decision.
Requirements: Make the progression as useful as possible as early as possible!
Example: Incremental querying of flight delay data to decide the best day to fly.
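The flight-delay example can be read as an incremental aggregation; the sketch below (with synthetic data and invented column names) updates per-weekday mean delays chunk by chunk, so a searcher can stop as soon as the ranking of days stabilizes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical flight records: weekday (0 = Monday) and delay in minutes.
flights = pd.DataFrame({
    "weekday": rng.integers(0, 7, size=100_000),
    "delay":   rng.gamma(shape=2.0, scale=10.0, size=100_000),
})

sums = pd.Series(0.0, index=range(7))
counts = pd.Series(0, index=range(7))
for start in range(0, len(flights), 10_000):        # process one chunk at a time
    chunk = flights.iloc[start:start + 10_000]
    sums = sums.add(chunk.groupby("weekday")["delay"].sum(), fill_value=0)
    counts = counts.add(chunk.groupby("weekday")["delay"].count(), fill_value=0)
    best_day = (sums / counts).idxmin()              # current best guess
    print(f"after {start + 10_000} rows: fly on weekday {best_day}")
    # A 'searcher' may break here once the ranking stops changing.
```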
3. Explorer
Interest: Understanding data and processes through interaction with the progression.
Benefits from PVA:
- Adjust parameters or data at runtime: orient themselves and decide in which direction(s) to steer the “tour” through the data or parameter space, swiftly adjusting process settings.
- Explore multiple scenarios by branching off alternatives and comparing their trajectories (multiverse analysis).
Expectations:
- Malleable process that is highly interactive.
Analogy: Physics simulations or “what-if” analyses.
Requirements: Make the running progression as flexible and interactive as possible!
Example: Flood simulation where users can adjust sandbag placements and water levels to see different outcomes.
Implementing Progressive Visual Analytics
Interaction Techniques
- Pause/Unpause:
- Users can halt the progression to inspect an intermediate result, then resume it.
- Early Termination:
- Users can stop the progression when satisfied with the current result.
- Leaves behind “progressive guards” to alert the user if assumptions underlying the early result become invalid.
- Steering:
- Users adjust parameters or focus areas during the progression.
- The system reprioritizes its computation based on the user interaction.
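A hedged sketch of both techniques around a generator-based progression: the consumer can terminate early by simply not iterating further, or steer by send()-ing a region of interest; the prioritization logic is invented for illustration.

```python
def progressive_scan(items):
    """Yield intermediate results; accept a steering hint via send()."""
    focus = None
    done, remaining = [], list(items)
    while remaining:
        # Steering: reprioritize so items matching the focus are processed first.
        remaining.sort(key=lambda x: (focus is not None and not x.startswith(focus)))
        done.append(remaining.pop(0))
        hint = yield list(done)      # intermediate result; caller may send a focus
        if hint is not None:
            focus = hint

scan = progressive_scan(["apple", "avocado", "banana", "blueberry", "cherry"])
partial = next(scan)
partial = scan.send("b")             # steer: prioritize items starting with "b"
print(partial)                       # early termination: just stop iterating here
```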
Visualization Strategies
- One Data Item per Mark:
- Scatter plots, node-link diagrams, and parallel coordinates.
- New data items are added as they are processed.
- Aggregated Visualizations:
- Histograms, bar charts, and binned scatter plots.
- Visualizations update aggregate properties (e.g., bar heights) without adding new marks.
Combination Approaches:
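The notes leave “Combination Approaches” unelaborated; the sketch below contrasts the two strategies on an assumed data stream and combines them with a simple rule (raw marks while sparse, aggregated bins once crowded). Plotting is stubbed out with print statements.

```python
import numpy as np

def stream(n, chunk, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n // chunk):
        yield rng.normal(size=chunk)

marks = []                               # one mark per data item (scatter plot)
bins = np.zeros(10, dtype=int)           # aggregated marks (histogram bars)
edges = np.linspace(-4, 4, 11)

for chunk in stream(n=1_000, chunk=200):
    marks.extend(chunk)                              # strategy 1: add new marks
    bins += np.histogram(chunk, bins=edges)[0]       # strategy 2: update bar heights
    print(f"{len(marks)} marks drawn, tallest bar now {bins.max()}")
    # Combination: draw raw marks while sparse, switch to bins once crowded.
    view = "scatter" if len(marks) < 500 else "histogram"
    print(f"  -> rendering as {view}")
```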
Software Architecture
Multi-threading Architecture:
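This heading is not elaborated in the notes; a common pattern, sketched here as an assumption, runs the progressive computation in a worker thread and lets the “UI” thread poll only the newest intermediate result via a queue, so rendering never blocks on the computation.

```python
import queue
import threading
import time

results = queue.Queue()

def worker(n_steps):
    """Computation thread: publish each intermediate result, never block the UI."""
    value = 0.0
    for step in range(1, n_steps + 1):
        time.sleep(0.05)                 # stands in for a costly computation step
        value += 1.0 / step
        results.put((step, value))       # hand the partial result to the UI

threading.Thread(target=worker, args=(20,), daemon=True).start()

for _ in range(20):                      # "UI loop": render whatever is newest
    time.sleep(0.1)
    latest = None
    while not results.empty():           # drain the queue, keep only the latest
        latest = results.get()
    if latest:
        print(f"UI renders step {latest[0]}: {latest[1]:.3f}")
```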
Layered Processing:
Semantic Layers:
- Divide visualization elements (axes, data points, labels) into layers.
- Process and render layers progressively.
Incremental Layers:
- Process data in increasing detail or quantity.
- Early layers provide a rough overview; later layers refine the visualization.
Level-of-Detail Layers:
- Adjust the level of detail based on zoom level or screen space.
- Coarser details when zoomed out; finer details when zoomed in.
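A small sketch of the level-of-detail idea, assuming the only input is the screen width (in pixels) that the current zoom level grants the view: the bin count of an aggregated chart is derived from it.

```python
import numpy as np

def bin_count_for(pixels_available, px_per_bar=4):
    """Coarser detail when zoomed out (few pixels), finer when zoomed in."""
    return max(1, pixels_available // px_per_bar)

data = np.random.default_rng(0).normal(size=50_000)
for zoom_pixels in (80, 400, 1600):      # zooming in gives the view more pixels
    counts, _ = np.histogram(data, bins=bin_count_for(zoom_pixels))
    print(f"{zoom_pixels}px wide -> {len(counts)} bins")
```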
Transient Visual Analytics
Concept
- Addresses limitations when data volume exceeds memory or screen space.
- Unlike PVA, which accumulates data towards completion, transient visual analytics continuously updates the view by:
- Adding relevant data.
- Removing less relevant data.
- The view remains intermediate but adapts to the user’s focus and interactions.
PVA vs. TVA
Advantages of TVA
- Always presents data pertinent to the current analytic task.
- Supports exploratory analysis where user interests evolve.
- Prevents overcrowding of the visualization.
Example
Dynamic Scatter Plot:
- As the user zooms or pans, data outside the view is discarded.
- New data relevant to the current view is loaded and displayed.
- Ensures the visualization remains clear and responsive.
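A hedged sketch of this transient update cycle: on every pan/zoom, previously loaded points are simply discarded and points inside the new viewport are (re)loaded, so the in-memory view stays small; load_points_in stands in for a real data backend.

```python
import numpy as np

rng = np.random.default_rng(7)
BACKEND = rng.uniform(0, 100, size=(1_000_000, 2))   # stands in for a real database

def load_points_in(viewport, limit=5_000):
    """Hypothetical backend query: fetch at most `limit` points in the viewport."""
    (x0, x1), (y0, y1) = viewport
    inside = BACKEND[(BACKEND[:, 0] >= x0) & (BACKEND[:, 0] <= x1)
                     & (BACKEND[:, 1] >= y0) & (BACKEND[:, 1] <= y1)]
    return inside[:limit]

view = None
for viewport in [((0, 100), (0, 100)), ((40, 60), (40, 60)), ((45, 50), (45, 50))]:
    view = load_points_in(viewport)       # old points are simply discarded
    print(f"viewport {viewport}: rendering {len(view)} points")
```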
Conclusion
- Visual Analytics and Progressive Visual Analytics are crucial for handling large and complex datasets interactively.
- Understanding the different roles (Observer, Searcher, Explorer) helps tailor systems to user needs.
- Implementing PVA requires careful consideration of interaction design, visualization techniques, and software architecture.
- Transient Visual Analytics represents a step forward, providing adaptive views that align with the user’s analytic workflow.
Note: These lecture notes are based on the professor’s presentation and include key examples and principles discussed during the lecture.
Quick Recap Quiz
The professor conducts a quiz using Menti (code provided during the lecture).
Question 1
When pre-processing text, turning the word “worst” into “bad” is done through:
Options:
- Stemming
- Normalization
- Lemmatization
- Noise removal
Answer: Lemmatization
Explanation:
- Lemmatization reduces words to their base or dictionary form (lemma).
- For example, “worst” becomes “bad”.
- Stemming simply chops off word endings to reduce them to a stem; it cannot turn “worst” into “bad”.
- Normalization handles non-standard spellings or elongated words.
- Example: “You’re so goooood” becomes “You’re so good”.
- Noise Removal eliminates punctuation, smileys, and other non-character tokens.
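A quick demonstration with NLTK (assuming its WordNet data is installed): the WordNet lemmatizer knows irregular adjective forms via an exception list, while the Porter stemmer only strips suffixes.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)     # WordNet data needed by the lemmatizer

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# pos="a" marks "worst" as an adjective, letting WordNet look up its lemma.
print(lemmatizer.lemmatize("worst", pos="a"))   # expected: "bad"
print(stemmer.stem("worst"))                    # stemming cannot recover "bad"
```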
Question 2
The word cloud is a direct visual representation of:
Options:
- The bag of words
- The n-gram vector
- The inverse document frequency
- The HyperX Logominer
Answer: The bag of words
Explanation:
- A word cloud visually represents the frequency of words in a bag-of-words model.
- Words are displayed in varying sizes based on their frequency.
- N-gram vectors consider the context of words (sequences of n words).
- Inverse Document Frequency (IDF) is the component of tf-idf that down-weights terms appearing in many documents, so tf-idf highlights terms frequent in one document but rare in the others.
- Hapax Legomena (not “HyperX Logominer”) refers to words that occur only once, as used in literature fingerprinting.
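A minimal bag-of-words sketch using Python’s collections.Counter; in a word cloud, each word’s font size would then be scaled by these counts.

```python
from collections import Counter

text = "visual analytics combines visual interactive analysis with automated analysis"
bag = Counter(text.split())              # bag of words: frequency, no order/context
for word, freq in bag.most_common(3):
    print(f"{word}: draw at relative size {freq / max(bag.values()):.2f}")
```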
Question 3
Which of these is *not* an approach to simplify line charts into sparklines?
Options:
- Extracting perceptually important points (PIPs)
- Sampling
- Piecewise Aggregate Approximation (PAA)
- Clustering
Answer: Clustering
Explanation:
- Sampling: Selects equidistant points from the line chart.
- Piecewise Aggregate Approximation (PAA): Divides the chart into segments and takes the average in each.
- Perceptually Important Points (PIPs): Selects points based on visual significance by recursively adding points with the most significant deviation.
- Clustering is not used for simplifying line charts into sparklines.
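The three valid approaches fit in a few lines of numpy; note that the PIP step below performs only one round of the otherwise recursive point selection, to keep the sketch short.

```python
import numpy as np

y = np.sin(np.linspace(0, 4 * np.pi, 200)) + np.random.default_rng(0).normal(0, .05, 200)

# Sampling: keep equidistant points.
sampled = y[::20]

# Piecewise Aggregate Approximation: average within equal-width segments.
paa = y.reshape(10, 20).mean(axis=1)

# Perceptually Important Points (one recursion step, for brevity): start with the
# endpoints, then add the point deviating most from the line between them.
line = np.linspace(y[0], y[-1], len(y))
pip_idx = sorted([0, int(np.argmax(np.abs(y - line))), len(y) - 1])
print(len(sampled), len(paa), pip_idx)
```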
Question 4
Which of the following scenarios can be addressed by word-sized visualizations?
Options:
- Providing context for a textual statement
- Providing a color scale for a chart
- Providing textual labels on a map
- Providing a word cloud for quantitative data
Answer: Providing context for a textual statement
Explanation:
- Word-sized visualizations (e.g., sparklines) are tiny charts embedded within text to provide additional context.
- Example: Showing a stock price trend alongside the current price.
- They are not used for color scales, map labels, or word clouds of quantitative data.
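A tiny sketch of a word-sized visualization: rendering a number series as a Unicode sparkline that can be embedded inline with running text (the glyph choice is one common convention, not from the lecture).

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value onto one of eight bar glyphs, spanning min..max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

price = [102, 101, 104, 108, 107, 112, 110, 115]
print(f"ACME stock {sparkline(price)} currently at {price[-1]}")
```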
Question 5
What is literature fingerprinting?
Options:
- A pre-processing technique for text
- Pixel-based text visualization
- An improved word cloud algorithm
- A randomized greedy layout
Answer: Pixel-based text visualization
Explanation:
- Literature fingerprinting uses pixel-based visualization to represent large texts.
- Each section or word is represented as a pixel, making patterns across large documents visible.
- Randomized Greedy Layout is a word cloud placement algorithm, not a fingerprinting technique.
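A hedged sketch of the pixel idea: split a text into fixed-size blocks, compute one feature value per block (here, mean word length, one of several features used for such fingerprints), and arrange the values as a grid ready for color-coding.

```python
import numpy as np

def fingerprint(text, block_words=50, row_len=20):
    """One value ('pixel') per block of words: here, the mean word length."""
    words = text.split()
    blocks = [words[i:i + block_words] for i in range(0, len(words), block_words)]
    values = [np.mean([len(w) for w in b]) for b in blocks]
    values += [0] * (-len(values) % row_len)          # pad to full rows
    return np.array(values).reshape(-1, row_len)      # grid ready for imshow()

grid = fingerprint("lorem ipsum dolor sit amet " * 400)
print(grid.shape, grid.round(1)[0, :5])
```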