DAVI Visual Analytics
Lecture Notes on Visual Analytics and Progressive Visual Analytics
Opening Remarks
The professor notes a record low attendance and humorously remarks:
- “I think visual analytics will be one of my favorite subjects at the exam.”
Today’s lecture will focus on Visual Analytics for three hours.
- Visual Analytics is a topic that could fill a whole semester.
- It is closely related to current state-of-the-art research, including the professor’s own work.
The professor mentions that previous lectures covered foundational topics, but today will push into the forefront of what is known about visualization.
Visual Analytics
Introduction to Visual Analytics
- Visual Analytics combines automated analysis with interactive visualizations to support understanding, reasoning, and decision-making from large and complex datasets.
- It leverages the strengths of both computers and humans:
- Computers excel at data storage, numerical calculations, and data mining.
- Humans excel at cognition, creativity, and background knowledge.
- Visualization acts as the communication channel between computers and humans (cf. Keim’s ability matrix).
Definition
- Visual Analytics combines the computational power of computers with human intuition to find effective solutions.
- It aims to detect the expected and discover the unexpected in data.
Key Principles of Visual Analytics
- Appropriateness Principle:
- Provide neither more nor less information than needed to solve the problem.
- Related to effectiveness in visualization.
- Naturalness Principle:
- Representations should closely match the information being represented.
- Similar to expressiveness in visualization design.
- Matching Principle:
- Visual analytics must match the task to be performed.
- Tailor visualizations to support specific tasks (e.g., comparison vs. localization).
- Apprehension Principle:
- The content should be accurately and easily perceived.
- Avoid overly complex or embellished visuals that hinder understanding.
Examples of Visual Analytics
Example 1: Interactive Clustering
- Users interactively refine clusters in a dataset (e.g., U.S. states).
- By moving data points between clusters, the system updates clustering results in real-time.
- The system learns from user interactions to adjust weights and improve clustering.
Key Points:
- Combines computational clustering (K-means) with user input.
- Allows for active learning and refinement based on domain knowledge.
- Enhances understanding of how clusters are formed.
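The lecture presents this system only conceptually; below is a minimal sketch of the idea, assuming scikit-learn’s KMeans and an invented feedback rule (move_point) that boosts the weights of features on which a dragged point resembles its target cluster, then re-clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def weighted_kmeans(X, weights, k, seed=0):
    """Cluster the data after scaling each feature by a user-adjustable weight."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X * weights)

def move_point(X, weights, labels, idx, target_cluster, lr=0.1):
    """Hypothetical feedback rule: when the user drags point `idx` into
    `target_cluster`, boost features where the point resembles that cluster's
    centroid, damp the others, and re-cluster."""
    centroid = X[labels == target_cluster].mean(axis=0)
    similarity = 1.0 / (1.0 + np.abs(X[idx] - centroid))   # per-feature closeness
    weights = weights * (1 - lr) + lr * similarity / similarity.max()
    return weights, weighted_kmeans(X, weights, k=labels.max() + 1)

# Toy usage: 2 features, 3 clusters, one user correction.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
weights = np.ones(X.shape[1])
labels = weighted_kmeans(X, weights, k=3)
weights, labels = move_point(X, weights, labels, idx=0, target_cluster=1)
```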
Example 2: PivotSlice for Faceted Data Exploration
- An interactive system for exploring faceted datasets (e.g., academic publications).
- Users can:
- Search and filter data based on keywords.
- Visualize relationships between different facets (e.g., authors, conferences, keywords).
- Analyze citation patterns and publication trends.
Key Points:
- Provides efficient faceted browsing capabilities.
- Enables flexible analytical functions to reveal explicit data relationships.
- Supports tasks like identifying influential authors or trends over time.
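PivotSlice itself is a research prototype whose code is not part of the lecture; as a stand-in, here is a small pandas sketch of faceted filtering and slicing on a hypothetical publication table (all column names and values invented).

```python
import pandas as pd

# Hypothetical publication table; real faceted data would come from a database.
pubs = pd.DataFrame({
    "author":  ["Keim", "Keim", "Fekete", "Munzner"],
    "venue":   ["VIS", "EuroVis", "VIS", "VIS"],
    "year":    [2008, 2010, 2013, 2014],
    "keyword": ["visual analytics", "scalability", "progressive", "tasks"],
})

# Facet 1: filter by keyword; facets 2 and 3: slice by venue and year.
hits = pubs[pubs["keyword"].str.contains("visual|progressive", regex=True)]
by_facet = hits.groupby(["venue", "year"]).size().rename("publications")
print(by_facet)
```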
Progressive Visual Analytics
Motivation
Traditional visual analytics can suffer from long computational times, disrupting the user’s analytic flow.
Human Time Constraints:
- Task Completion: ~10 seconds (acceptable for command-line interactions).
- Immediate Response: ~1 second (acceptable for indirect interactions).
- Perceptual Update: ~100 milliseconds (required for direct interactions).
Definition
- Progressive Visual Analytics (PVA) is a type of Visual Analytics where the computational analysis produces intermediate results as approximations of the final result, so as to fulfill the human time constraints imposed by the visual-interactive analysis and thus maintain the user’s analytic flow.
- PVA can be either an incremental process that subdivides the data, processes it chunk by chunk, and thus yields intermediate results of increasing quantity/completeness, or an iterative process that subdivides the computation, runs it step by step, and thus yields intermediate results of increasing quality/correctness.
- It fulfills human time constraints by:
- Providing early, meaningful partial results.
- Continuously refining results as more data or computations are processed.
Types of Progressive Processing
- Incremental (Data Chunking):
- Processes data in chunks.
- Each iteration adds more data, increasing completeness.
- Iterative (Process Chunking):
- Processes the full dataset but refines computations over iterations.
- Each iteration improves the quality of the result.
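A minimal sketch of both flavors as plain Python generators, not taken from the lecture: the incremental version chunks the data and grows completeness, while the iterative version sees all data every step and grows quality (here via a crude shrinking-step estimate of the median).

```python
import numpy as np

def incremental_mean(data, chunk_size):
    """Data chunking: each yield covers MORE data (growing completeness)."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += chunk.sum()
        count += len(chunk)
        yield total / count          # intermediate result over data seen so far

def iterative_median(data, steps):
    """Process chunking: each yield sees ALL data but refines the estimate
    (growing quality), via shrinking steps against the sign of the residuals."""
    estimate = data.mean()           # cheap initial guess
    for step in range(steps):
        estimate -= 0.5 ** step * np.sign(estimate - data).mean() * data.std()
        yield estimate               # intermediate result of increasing quality

data = np.random.default_rng(0).normal(loc=5, size=10_000)
for approx in incremental_mean(data, chunk_size=2_000):
    print(f"incremental mean ~ {approx:.3f}")
for approx in iterative_median(data, steps=5):
    print(f"iterative median ~ {approx:.3f}")
```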
Progressive Visualization Timeline
- T_response: First meaningful partial result (e.g., initial layout in force-directed graphs).
- T_reliable: Trustworthy partial results that allow for basic observations.
- T_stable: Last significant partial result before the final result.
- T_complete: Final result is achieved.
Various Uses of the Progression
- Overcoming bottlenecks
- algorithmic complexity
- big data
- slow network connection
- combinatorial explosion of search space
- Resource-conscious VA:
- find good-enough solutions (reduce time to decision, power consumption)
- adjust to available screen space (reduce network load, cognitive load)
Roles in Progressive Visual Analytics
1. Observer
Interest: Final result and understanding how it was achieved.
Benefits from PVA:
- stay informed about the state of the computation (aliveness)
- estimate how long it will take until it is done (progress)
- watch the algorithm working and see the final result unfold (provenance)
Expectations:
- Steadily improving outputs leading to a predictable end.
Analogy: Watching a loading process with a visual progress indicator (e.g., a fancy progress bar).
Requirements: Make the progression as stable as possible (e.g., offer Pause/Play)!
2. Searcher
Interest: Trustworthy intermediate results for solving specific problems within a large information space (data, parameter, or model space).
Benefits from PVA:
- fully interactive intermediate results allowing their visual exploration.
- cancel the running process, saving the system a costly exhaustive search (early termination)
Expectations:
- A stepwise process with useful early outputs: PVA produces constantly updated results that can be used in place of a final result for subsequent analytic operations.
Analogy: Using an internet search engine (e.g., a product search) and not needing all results to make a decision.
Requirements: Make the progression as useful as possible as early as possible!
Example: Incremental querying of flight delay data to decide the best day to fly.
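The flight-delay example can be read as an incremental aggregation; the sketch below (with synthetic data and invented column names) updates per-weekday mean delays chunk by chunk, so a searcher can stop as soon as the ranking of days stabilizes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical flight records: weekday (0 = Monday) and delay in minutes.
flights = pd.DataFrame({
    "weekday": rng.integers(0, 7, size=100_000),
    "delay":   rng.gamma(shape=2.0, scale=10.0, size=100_000),
})

sums = pd.Series(0.0, index=range(7))
counts = pd.Series(0, index=range(7))
for start in range(0, len(flights), 10_000):        # process one chunk at a time
    chunk = flights.iloc[start:start + 10_000]
    sums = sums.add(chunk.groupby("weekday")["delay"].sum(), fill_value=0)
    counts = counts.add(chunk.groupby("weekday")["delay"].count(), fill_value=0)
    best_day = (sums / counts).idxmin()              # current best guess
    print(f"after {start + 10_000} rows: fly on weekday {best_day}")
    # A 'searcher' may break here once the ranking stops changing.
```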
3. Explorer
Interest: Understanding data and processes through interaction with the progression.
Benefits from PVA:
- Adjust parameters or data at runtime: orient themselves and decide in which direction(s) to steer the “tour” through the data or parameter space, swiftly adjusting process settings.
- Explore multiple scenarios by branching off alternatives and comparing their trajectories (multiverse analysis).
Expectations:
- Malleable process that is highly interactive.
Analogy: Physics simulations or “what-if” analyses.
Requirements: Make the running progression as flexible and interactive as possible!
Example: Flood simulation where users can adjust sandbag placements and water levels to see different outcomes.
Implementing Progressive Visual Analytics
Interaction Techniques
- Pause/Unpause:
- Users can halt the progression to inspect an intermediate result, then resume it.
- Early Termination:
- Users can stop the progression when satisfied with the current result.
- Leaves behind “progressive guards” to alert the user if assumptions underlying the early result become invalid.
- Steering:
- Users adjust parameters or focus areas during the progression.
- The system reprioritizes its computation based on the user interaction.
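A hedged sketch of both techniques around a generator-based progression: the consumer can terminate early by simply not iterating further, or steer by send()-ing a region of interest; the prioritization logic is invented for illustration.

```python
def progressive_scan(items):
    """Yield intermediate results; accept a steering hint via send()."""
    focus = None
    done, remaining = [], list(items)
    while remaining:
        # Steering: reprioritize so items matching the focus are processed first.
        remaining.sort(key=lambda x: (focus is not None and not x.startswith(focus)))
        done.append(remaining.pop(0))
        hint = yield list(done)      # intermediate result; caller may send a focus
        if hint is not None:
            focus = hint

scan = progressive_scan(["apple", "avocado", "banana", "blueberry", "cherry"])
partial = next(scan)
partial = scan.send("b")             # steer: prioritize items starting with "b"
print(partial)                       # early termination: just stop iterating here
```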
Visualization Strategies
- One Data Item per Mark:
- Scatter plots, node-link diagrams, and parallel coordinates.
- New data items are added as they are processed.
- Aggregated Visualizations:
- Histograms, bar charts, and binned scatter plots.
- Visualizations update aggregate properties (e.g., bar heights) without adding new marks.
Combination Approaches:
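The notes leave “Combination Approaches” unelaborated; the sketch below contrasts the two strategies on an assumed data stream and combines them with a simple rule (raw marks while sparse, aggregated bins once crowded). Plotting is stubbed out with print statements.

```python
import numpy as np

def stream(n, chunk, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n // chunk):
        yield rng.normal(size=chunk)

marks = []                               # one mark per data item (scatter plot)
bins = np.zeros(10, dtype=int)           # aggregated marks (histogram bars)
edges = np.linspace(-4, 4, 11)

for chunk in stream(n=1_000, chunk=200):
    marks.extend(chunk)                              # strategy 1: add new marks
    bins += np.histogram(chunk, bins=edges)[0]       # strategy 2: update bar heights
    print(f"{len(marks)} marks drawn, tallest bar now {bins.max()}")
    # Combination: draw raw marks while sparse, switch to bins once crowded.
    view = "scatter" if len(marks) < 500 else "histogram"
    print(f"  -> rendering as {view}")
```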
Software Architecture
Multi-threading Architecture:
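This heading is not elaborated in the notes; a common pattern, sketched here as an assumption, runs the progressive computation in a worker thread and lets the “UI” thread poll only the newest intermediate result via a queue, so rendering never blocks on the computation.

```python
import queue
import threading
import time

results = queue.Queue()

def worker(n_steps):
    """Computation thread: publish each intermediate result, never block the UI."""
    value = 0.0
    for step in range(1, n_steps + 1):
        time.sleep(0.05)                 # stands in for a costly computation step
        value += 1.0 / step
        results.put((step, value))       # hand the partial result to the UI

threading.Thread(target=worker, args=(20,), daemon=True).start()

for _ in range(20):                      # "UI loop": render whatever is newest
    time.sleep(0.1)
    latest = None
    while not results.empty():           # drain the queue, keep only the latest
        latest = results.get()
    if latest:
        print(f"UI renders step {latest[0]}: {latest[1]:.3f}")
```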
Layered Processing:
Semantic Layers:
- Divide visualization elements (axes, data points, labels) into layers.
- Process and render layers progressively.
Incremental Layers:
- Process data in increasing detail or quantity.
- Early layers provide a rough overview; later layers refine the visualization.
Level-of-Detail Layers:
- Adjust the level of detail based on zoom level or screen space.
- Coarser details when zoomed out; finer details when zoomed in.
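A small sketch of the level-of-detail idea, assuming the only input is the screen width (in pixels) that the current zoom level grants the view: the bin count of an aggregated chart is derived from it.

```python
import numpy as np

def bin_count_for(pixels_available, px_per_bar=4):
    """Coarser detail when zoomed out (few pixels), finer when zoomed in."""
    return max(1, pixels_available // px_per_bar)

data = np.random.default_rng(0).normal(size=50_000)
for zoom_pixels in (80, 400, 1600):      # zooming in gives the view more pixels
    counts, _ = np.histogram(data, bins=bin_count_for(zoom_pixels))
    print(f"{zoom_pixels}px wide -> {len(counts)} bins")
```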
Transient Visual Analytics
Concept
- Addresses limitations when data volume exceeds memory or screen space.
- Unlike PVA, which accumulates data towards completion, transient visual analytics continuously updates the view by:
- Adding relevant data.
- Removing less relevant data.
- The view remains intermediate but adapts to the user’s focus and interactions.
PVA vs. TVA
Advantages of TVA
- Always presents data pertinent to the current analytic task.
- Supports exploratory analysis where user interests evolve.
- Prevents overcrowding of the visualization.
Example
Dynamic Scatter Plot:
- As the user zooms or pans, data outside the view is discarded.
- New data relevant to the current view is loaded and displayed.
- Ensures the visualization remains clear and responsive.
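A hedged sketch of this transient update cycle: on every pan/zoom, previously loaded points are simply discarded and points inside the new viewport are (re)loaded, so the in-memory view stays small; load_points_in stands in for a real data backend.

```python
import numpy as np

rng = np.random.default_rng(7)
BACKEND = rng.uniform(0, 100, size=(1_000_000, 2))   # stands in for a real database

def load_points_in(viewport, limit=5_000):
    """Hypothetical backend query: fetch at most `limit` points in the viewport."""
    (x0, x1), (y0, y1) = viewport
    inside = BACKEND[(BACKEND[:, 0] >= x0) & (BACKEND[:, 0] <= x1)
                     & (BACKEND[:, 1] >= y0) & (BACKEND[:, 1] <= y1)]
    return inside[:limit]

view = None
for viewport in [((0, 100), (0, 100)), ((40, 60), (40, 60)), ((45, 50), (45, 50))]:
    view = load_points_in(viewport)       # old points are simply discarded
    print(f"viewport {viewport}: rendering {len(view)} points")
```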
Conclusion
- Visual Analytics and Progressive Visual Analytics are crucial for handling large and complex datasets interactively.
- Understanding the different roles (Observer, Searcher, Explorer) helps tailor systems to user needs.
- Implementing PVA requires careful consideration of interaction design, visualization techniques, and software architecture.
- Transient Visual Analytics represents a step forward, providing adaptive views that align with the user’s analytic workflow.
Note: These lecture notes are based on the professor’s presentation and include key examples and principles discussed during the lecture.
Quick Recap Quiz
The professor conducts a quiz using Menti (code provided during the lecture).
Question 1
When pre-processing text, turning the word “worst” into “bad” is done through:
Options:
- Stemming
- Normalization
- Lemmatization
- Noise removal
Answer: Lemmatization
Explanation:
- Lemmatization reduces words to their base or dictionary form (lemma).
- For example, “worst” becomes “bad”.
- Stemming simply chops off word endings to reduce them to a stem; it cannot turn “worst” into “bad”.
- Normalization handles non-standard spellings or elongated words.
- Example: “You’re so goooood” becomes “You’re so good”.
- Noise Removal eliminates punctuation, smileys, and other non-character tokens.
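A quick demonstration with NLTK (assuming its WordNet data is installed): the WordNet lemmatizer knows irregular adjective forms via an exception list, while the Porter stemmer only strips suffixes.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)     # WordNet data needed by the lemmatizer

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# pos="a" marks "worst" as an adjective, letting WordNet look up its lemma.
print(lemmatizer.lemmatize("worst", pos="a"))   # expected: "bad"
print(stemmer.stem("worst"))                    # stemming cannot recover "bad"
```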
Question 2
The word cloud is a direct visual representation of:
Options:
- The bag of words
- The n-gram vector
- The inverse document frequency
- The HyperX Logominer
Answer: The bag of words
Explanation:
- A word cloud visually represents the frequency of words in a bag-of-words model.
- Words are displayed in varying sizes based on their frequency.
- N-gram vectors consider the context of words (sequences of n words).
- Inverse Document Frequency (IDF) is the component of tf-idf that down-weights terms appearing in many documents, so tf-idf highlights terms frequent in one document but rare in the others.
- Hapax Legomena (not “HyperX Logominer”) refers to words that occur only once, as used in literature fingerprinting.
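A minimal bag-of-words sketch using Python’s collections.Counter; in a word cloud, each word’s font size would then be scaled by these counts.

```python
from collections import Counter

text = "visual analytics combines visual interactive analysis with automated analysis"
bag = Counter(text.split())              # bag of words: frequency, no order/context
for word, freq in bag.most_common(3):
    print(f"{word}: draw at relative size {freq / max(bag.values()):.2f}")
```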
Question 3
Which of these is *not* an approach to simplify line charts into sparklines?
Options:
- Extracting perceptually important points (PIPs)
- Sampling
- Piecewise Aggregate Approximation (PAA)
- Clustering
Answer: Clustering
Explanation:
- Sampling: Selects equidistant points from the line chart.
- Piecewise Aggregate Approximation (PAA): Divides the chart into segments and takes the average in each.
- Perceptually Important Points (PIPs): Selects points based on visual significance by recursively adding points with the most significant deviation.
- Clustering is not used for simplifying line charts into sparklines.
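The three valid approaches fit in a few lines of numpy; note that the PIP step below performs only one round of the otherwise recursive point selection, to keep the sketch short.

```python
import numpy as np

y = np.sin(np.linspace(0, 4 * np.pi, 200)) + np.random.default_rng(0).normal(0, .05, 200)

# Sampling: keep equidistant points.
sampled = y[::20]

# Piecewise Aggregate Approximation: average within equal-width segments.
paa = y.reshape(10, 20).mean(axis=1)

# Perceptually Important Points (one recursion step, for brevity): start with the
# endpoints, then add the point deviating most from the line between them.
line = np.linspace(y[0], y[-1], len(y))
pip_idx = sorted([0, int(np.argmax(np.abs(y - line))), len(y) - 1])
print(len(sampled), len(paa), pip_idx)
```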
Question 4
Which of the following scenarios can be addressed by word-sized visualizations?
Options:
- Providing context for a textual statement
- Providing a color scale for a chart
- Providing textual labels on a map
- Providing a word cloud for quantitative data
Answer: Providing context for a textual statement
Explanation:
- Word-sized visualizations (e.g., sparklines) are tiny charts embedded within text to provide additional context.
- Example: Showing a stock price trend alongside the current price.
- They are not used for color scales, map labels, or word clouds of quantitative data.
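A tiny sketch of a word-sized visualization: rendering a number series as a Unicode sparkline that can be embedded inline with running text (the glyph choice is one common convention, not from the lecture).

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value onto one of eight bar glyphs, spanning min..max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

price = [102, 101, 104, 108, 107, 112, 110, 115]
print(f"ACME stock {sparkline(price)} currently at {price[-1]}")
```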
Question 5
What is literature fingerprinting?
Options:
- A pre-processing technique for text
- Pixel-based text visualization
- An improved word cloud algorithm
- A randomized greedy layout
Answer: Pixel-based text visualization
Explanation:
- Literature fingerprinting uses pixel-based visualization to represent large texts.
- Each section or word is represented as a pixel, making patterns across large documents visible.
- Randomized Greedy Layout is a word cloud placement algorithm, not a fingerprinting technique.
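A hedged sketch of the pixel idea: split a text into fixed-size blocks, compute one feature value per block (here, mean word length, one of several features used for such fingerprints), and arrange the values as a grid ready for color-coding.

```python
import numpy as np

def fingerprint(text, block_words=50, row_len=20):
    """One value ('pixel') per block of words: here, the mean word length."""
    words = text.split()
    blocks = [words[i:i + block_words] for i in range(0, len(words), block_words)]
    values = [np.mean([len(w) for w in b]) for b in blocks]
    values += [0] * (-len(values) % row_len)          # pad to full rows
    return np.array(values).reshape(-1, row_len)      # grid ready for imshow()

grid = fingerprint("lorem ipsum dolor sit amet " * 400)
print(grid.shape, grid.round(1)[0, :5])
```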