DAVI Visual Analytics

Lecture Notes on Visual Analytics and Progressive Visual Analytics


Opening Remarks

  • The professor notes a record-low attendance and humorously remarks: “I think visual analytics will be one of my favorite subjects at the exam.”
  • Today’s lecture will focus on Visual Analytics for three hours.

    • Visual Analytics is a topic that could fill a whole semester.
    • It is closely related to current state-of-the-art research, including the professor’s own work.
  • The professor mentions that previous lectures covered foundational topics, but today will push into the forefront of what is known about visualization.


Visual Analytics

Introduction to Visual Analytics

  • Visual Analytics combines automated analysis with interactive visualizations to support understanding, reasoning, and decision-making from large and complex datasets.
  • It leverages the strengths of both computers and humans:
    • Computers excel at data storage, numerical calculations, and data mining.
    • Humans excel at cognition, creativity, and background knowledge.
  • Visualization acts as the communication channel between computers and humans (Keim’s ability matrix).


Definition

  • Visual Analytics combines the computational power of computers with human intuition to find effective solutions.
  • It aims to detect the expected and discover the unexpected in data.

Key Principles of Visual Analytics

  1. Appropriateness Principle:
    • Provide neither more nor less information than needed to solve the problem.
    • Related to effectiveness in visualization.
  2. Naturalness Principle:
    • Representations should closely match the information being represented.
    • Similar to expressiveness in visualization design.
  3. Matching Principle:
    • Visual analytics must match the task to be performed.
    • Tailor visualizations to support specific tasks (e.g., comparison vs. localization).
  4. Apprehension Principle:
    • The content should be accurately and easily perceived.
    • Avoid overly complex or embellished visuals that hinder understanding.

Examples of Visual Analytics

Example 1: Interactive Clustering

  • Users interactively refine clusters in a dataset (e.g., U.S. states).
  • By moving data points between clusters, the system updates clustering results in real-time.
  • The system learns from user interactions to adjust weights and improve clustering.

Key Points:

  • Combines computational clustering (K-means) with user input.
  • Allows for active learning and refinement based on domain knowledge.
  • Enhances understanding of how clusters are formed.
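The interaction loop described above can be sketched as a weighted K-means whose feature weights are nudged when the user moves a data point to a different cluster. This is a minimal illustrative sketch, not the actual system's algorithm: the `refine_weights` heuristic (boost features on which the moved point sits closer to its new centroid) is an assumption for demonstration.

```python
import numpy as np

def assign(X, centroids, w):
    # Weighted Euclidean distance from each point to each centroid.
    d = np.sqrt((((X[:, None, :] - centroids[None, :, :]) ** 2) * w).sum(axis=2))
    return d.argmin(axis=1)

def update_centroids(X, labels, k):
    # Standard K-means centroid update.
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

def refine_weights(w, x, old_c, new_c, lr=0.5):
    # User moved point x from cluster old_c to new_c: boost features where
    # x is closer to the new centroid than to the old one, then renormalize
    # so the weights keep summing to the number of features.
    gain = (x - old_c) ** 2 - (x - new_c) ** 2  # positive = feature supports the move
    w = w * np.exp(lr * np.sign(gain))
    return w / w.sum() * len(w)
```

Re-running `assign` and `update_centroids` with the refined weights lets the system "learn" which features matter to the user's notion of similarity.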

Example 2: PivotSlice for Faceted Data Exploration

  • An interactive system for exploring faceted datasets (e.g., academic publications).
  • Users can:
    • Search and filter data based on keywords.
    • Visualize relationships between different facets (e.g., authors, conferences, keywords).
    • Analyze citation patterns and publication trends.

Key Points:

  • Provides efficient faceted browsing capabilities.
  • Enables flexible analytical functions to reveal explicit data relationships.
  • Supports tasks like identifying influential authors or trends over time.

Progressive Visual Analytics

Motivation

  • Traditional visual analytics can suffer from long computational times, disrupting the user’s analytic flow.

  • Human Time Constraints:

    • Task Completion: ~10 seconds (acceptable for command-line interactions).
    • Immediate Response: ~1 second (acceptable for indirect interactions).
    • Perceptual Update: ~100 milliseconds (required for direct interactions).

Definition


  • Progressive Visual Analytics (PVA):
    • Progressive Visual Analytics (PVA) is a type of Visual Analytics where the computational analysis produces intermediate results as approximations of the final result, so as to fulfill the human time constraints imposed by the visual-interactive analysis and thus maintain the user’s analytic flow.
    • PVA can be either an incremental process that subdivides the data, processes it chunk by chunk, and thus yields intermediate results of increasing quantity/completeness, or an iterative process that subdivides the computation, runs it step by step, and thus yields intermediate results of increasing quality/correctness.
  • It fulfills human time constraints by:
    • Providing early, meaningful partial results.
    • Continuously refining results as more data or computations are processed.

Types of Progressive Processing


  1. Incremental (Data Chunking):
    • Processes data in chunks.
    • Each iteration adds more data, increasing completeness.
  2. Iterative (Process Chunking):
    • Processes the full dataset but refines computations over iterations.
    • Each iteration improves the quality of the result.
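The two flavors can be illustrated with toy computations — an incremental mean over data chunks (growing completeness) versus an iteratively refined Newton approximation of a square root (growing quality). Both are illustrative stand-ins, not prescribed PVA algorithms:

```python
def incremental_mean(chunks):
    """Data chunking: each yield covers more of the data (growing completeness)."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
        yield total / count          # estimate over the data seen so far

def iterative_sqrt(a, steps=10):
    """Process chunking: each yield refines the same answer (growing quality)."""
    x = a                            # crude initial guess over the full input
    for _ in range(steps):
        x = 0.5 * (x + a / x)        # one Newton refinement step
        yield x
```

In both cases the consumer receives a stream of intermediate results it can visualize immediately, instead of blocking until the computation finishes.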

Progressive Visualization Timeline


  • T_response: First meaningful partial result (e.g., initial layout in force-directed graphs).
  • T_reliable: Trustworthy partial results that allow for basic observations.
  • T_stable: Last significant partial result before the final result.
  • T_complete: Final result is achieved.

Various Uses of the Progression

  • Overcoming bottlenecks
    • algorithmic complexity
    • big data
    • slow network connection
    • combinatorial explosion of search space
  • Resource-conscious VA:
    • find good-enough solutions (reduce time to decision, power consumption)
    • adjust to available screen space (reduce network load, cognitive load)

Roles in Progressive Visual Analytics


1. Observer

  • Interest: Final result and understanding how it was achieved.

  • Benefits from PVA:

    • stay informed about the state of the computation (aliveness)
    • estimate how long it will take until it is done (progress)
    • watch the algorithm working and see the final result unfold (provenance)
  • Expectations:

    • Steadily improving outputs leading to a predictable end.
  • Analogy: Watching a loading process with a fancy progress bar.

  • Requirements: Make the progression as stable as possible! (e.g., Pause/Play)

2. Searcher

  • Interest: Trustworthy intermediate results for solving specific problems within a large information space (data, parameter, or model space).

  • Benefits from PVA:

    • Fully interactive intermediate results allowing their visual exploration.
    • Cancel the running process, saving the system a costly exhaustive search (early termination).
  • Expectations:

    • Stepwise process that produces constantly updated results, which can be used in place of a final result for subsequent analytic operations.
  • Analogy: Using a search engine and not needing all results to make a decision (e.g., an internet product search).

  • Requirements: Make the progression as useful as possible as early as possible!

  • Example: Incremental querying of flight delay data to decide the best day to fly.

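The flight-delay example can be sketched as an incremental query over synthetic data. Everything here is hypothetical illustration — the records are generated, and `progressive_best_day` is an assumed helper, not the system from the lecture:

```python
import random
from collections import defaultdict

def progressive_best_day(rows, chunk_size=1000):
    """Yield the current best (lowest mean delay) weekday after each chunk."""
    total = defaultdict(float)
    count = defaultdict(int)
    for start in range(0, len(rows), chunk_size):
        for day, delay in rows[start:start + chunk_size]:
            total[day] += delay
            count[day] += 1
        yield min(total, key=lambda d: total[d] / count[d])

# Synthetic flight records: Tuesdays get the lowest average delay by design.
random.seed(0)
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
rows = [(d, random.gauss(5 if d == "Tue" else 15, 3))
        for d in days for _ in range(2000)]
random.shuffle(rows)

estimates = list(progressive_best_day(rows))
```

A Searcher would watch `estimates` and stop the query as soon as the answer stabilizes across a few consecutive chunks, saving the cost of scanning the rest of the data.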

3. Explorer

  • Interest: Understanding data and processes through interaction with the progression.

  • Benefits from PVA:

    • Adjust parameters or data at runtime: orient themselves, decide in which direction(s) to steer the “tour” through the data or parameter space, and swiftly adjust process settings.
    • Explore multiple scenarios by branching off alternatives and comparing their trajectories (multiverse analysis).
  • Expectations:

    • Malleable process that is highly interactive.
  • Analogy: Physics simulations or “what-if” analyses.

  • Requirements: Make the running progression as flexible/interactive as possible!
  • Example: Flood simulation where users can adjust sandbag placements and water levels to see different outcomes.


Implementing Progressive Visual Analytics

Interaction Techniques

  • Pause/Unpause:
    • Users can suspend the progression to inspect the current state and resume it afterwards.
  • Early Termination:
    • Users can stop the progression when satisfied with the current result.
    • Leaves behind “progressive guards” to alert if assumptions become invalid.
  • Steering:
    • Users adjust parameters or focus areas during progression.
    • The system reprioritizes computation based on user interaction.
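Steering and early termination can both be sketched with a Python generator: the consumer may `send()` a new priority function at any time to reprioritize the remaining work, or `close()` the generator to terminate early. This is an illustrative pattern, not a specific system's API:

```python
def progressive_scan(items, priority=None):
    """Process items one at a time, yielding the intermediate result after each.
    The consumer may send() a new priority function to steer what comes next."""
    pending = list(items)
    done = []
    while pending:
        if priority is not None:
            pending.sort(key=priority)       # reprioritize remaining work
        done.append(pending.pop(0))
        new_priority = yield list(done)      # hand out an intermediate result
        if new_priority is not None:
            priority = new_priority          # steering: adopt the new focus

gen = progressive_scan([5, 1, 9, 3, 7])
partial = next(gen)                  # first intermediate result: [5]
partial = gen.send(lambda x: -x)     # steer: largest remaining item first
gen.close()                          # early termination: stop the progression
```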

Visualization Strategies

  • One Data Item per Mark:
    • Scatter plots, node-link diagrams, and parallel coordinates.
    • New data items are added as they are processed.
  • Aggregated Visualizations:
    • Histograms, bar charts, and binned scatter plots.
    • Visualizations update aggregate properties (e.g., bar heights) without adding new marks.
  • Combination Approaches:

    • Visual Sedimentation:
      • Combines individual data items with aggregate views.
      • Data items accumulate visually, resembling sediment layers.
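The aggregated strategy can be sketched as a progressive histogram: the number of marks (bars) is fixed up front, and each arriving chunk only updates their heights. A minimal sketch with an assumed `progressive_histogram` helper:

```python
def progressive_histogram(chunks, n_bins, lo, hi):
    """Update fixed bins chunk by chunk; the marks never change in number,
    only their heights grow as more data streams in."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for chunk in chunks:
        for x in chunk:
            i = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into last bin
            counts[i] += 1
        yield list(counts)           # redraw bar heights only

hist = list(progressive_histogram([[0.1, 0.9], [0.5]], n_bins=2, lo=0.0, hi=1.0))
```

Because only aggregate properties change, the visualization stays stable between updates — exactly what the Observer role asks for.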

Software Architecture


  • Multi-threading Architecture:

    • Separates event handling, visualization, and computation into different threads.
    • Allows for responsiveness and interaction during computation.
  • Layered Processing:

    • Semantic Layers:

      • Divide visualization elements (axes, data points, labels) into layers.
      • Process and render layers progressively.
    • Incremental Layers:

      • Process data in increasing detail or quantity.
      • Early layers provide a rough overview; later layers refine the visualization.
    • Level-of-Detail Layers:

      • Adjust the level of detail based on zoom level or screen space.
      • Coarser details when zoomed out; finer details when zoomed in.
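A minimal sketch of the multi-threading idea: a worker thread pushes intermediate results through a queue to a consumer loop standing in for the visualization thread. Python's `threading` and `queue` modules are used here as illustrative stand-ins for a real UI event loop:

```python
import queue
import threading
import time

results = queue.Queue()

def compute(data, chunk=2):
    """Computation thread: emit a partial sum after every chunk."""
    total = 0
    for i in range(0, len(data), chunk):
        total += sum(data[i:i + chunk])
        results.put(total)           # hand the intermediate result to the UI thread
        time.sleep(0.01)             # simulate expensive work
    results.put(None)                # sentinel: computation finished

worker = threading.Thread(target=compute, args=([1, 2, 3, 4, 5],))
worker.start()

snapshots = []
while (r := results.get()) is not None:
    snapshots.append(r)              # the UI thread would redraw here
worker.join()
```

The queue decouples the two threads, so the event-handling side stays responsive while the computation runs.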

Transient Visual Analytics

Concept

  • Addresses limitations when data volume exceeds memory or screen space.
  • Unlike PVA, which accumulates data towards completion, transient visual analytics continuously updates the view by:
    • Adding relevant data.
    • Removing less relevant data.
  • The view remains intermediate but adapts to the user’s focus and interactions.

PVA vs. TVA


Advantages

  • Always presents data pertinent to the current analytic task.
  • Supports exploratory analysis where user interests evolve.
  • Prevents overcrowding of the visualization.

Example

  • Dynamic Scatter Plot:

    • As the user zooms or pans, data outside the view is discarded.
    • New data relevant to the current view is loaded and displayed.
    • Ensures the visualization remains clear and responsive.
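The dynamic scatter plot can be sketched as a viewport filter with a fixed display budget — panning or zooming drops out-of-view points and admits newly relevant ones. The `transient_view` helper is an assumed name for illustration:

```python
def transient_view(points, viewport, budget=100):
    """Keep only points inside the current viewport, up to a fixed budget;
    the view stays intermediate but always matches the user's focus."""
    (x0, x1), (y0, y1) = viewport
    visible = [(x, y) for x, y in points if x0 <= x <= x1 and y0 <= y <= y1]
    return visible[:budget]          # never exceed the screen/memory budget

visible = transient_view([(0, 0), (5, 5), (2, 2)], viewport=((0, 3), (0, 3)))
```

Calling this on every pan/zoom keeps memory and mark count bounded, unlike a PVA run that accumulates toward completeness.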

Conclusion

  • Visual Analytics and Progressive Visual Analytics are crucial for handling large and complex datasets interactively.
  • Understanding the different roles (Observer, Searcher, Explorer) helps tailor systems to user needs.
  • Implementing PVA requires careful consideration of interaction design, visualization techniques, and software architecture.
  • Transient Visual Analytics represents a step forward, providing adaptive views that align with the user’s analytic workflow.

Note: These lecture notes are based on the professor’s presentation and include key examples and principles discussed during the lecture.


Quick Recap Quiz

The professor conducts a quiz using Menti (code provided during the lecture).

Question 1

When pre-processing text, turning the word “worst” into “bad” is done through:

  • Options:

    • Stemming
    • Normalization
    • Lemmatization
    • Noise removal

Answer: Lemmatization

Explanation:

  • Lemmatization reduces words to their base or dictionary form (lemma).

    • For example, “worst” becomes “bad”.
  • Stemming simply chops off word endings to reduce them to a stem.

    • It cannot turn “worst” into “bad”.
  • Normalization handles non-standard spellings or elongated words.

    • Example: “You’re so goooood” becomes “You’re so good”.
  • Noise Removal eliminates punctuation, smileys, and non-character tokens.

Question 2

The word cloud is a direct visual representation of:

  • Options:

    • The bag of words
    • The n-gram vector
    • The inverse document frequency
    • The HyperX Logominer

Answer: The bag of words

Explanation:

  • A word cloud visually represents the frequency of words in a bag of words model.

    • Words are displayed in varying sizes based on frequency.
  • N-gram vectors consider the context of words (sequences of n words).

  • Inverse Document Frequency (IDF) is used in tf-idf to upweight terms that are frequent in a document but rare across the rest of the corpus.

  • Hapax Legomena (not “HyperX Logominer”) refers to words that occur only once, used in literature fingerprinting.

Question 3

Which of these is *not* an approach to simplify line charts into sparklines?

  • Options:

    • Extracting perceptually important points (PIPs)
    • Sampling
    • Piecewise Aggregate Approximation (PAA)
    • Clustering

Answer: Clustering

Explanation:

  • Sampling: Selects equidistant points from the line chart.
  • Piecewise Aggregate Approximation (PAA): Divides the chart into segments and takes the average in each.
  • Perceptually Important Points (PIPs): Selects points based on visual significance by recursively adding points with the most significant deviation.
  • Clustering is not used for simplifying line charts into sparklines.
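Of the three valid simplification approaches, PAA is the easiest to make concrete. A minimal sketch (equal-width segments; the function name is ours, not a standard API):

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: split the series into equal-width
    segments and represent each one by its mean value."""
    n = len(series)
    out = []
    for s in range(n_segments):
        lo = s * n // n_segments         # segment boundaries via integer division
        hi = (s + 1) * n // n_segments
        seg = series[lo:hi]
        out.append(sum(seg) / len(seg))  # one average per segment
    return out
```

The resulting short sequence of means is what gets drawn as the sparkline.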

Question 4

Which of the following scenarios can be addressed by word-sized visualizations?

  • Options:

    • Providing context for a textual statement
    • Providing a color scale for a chart
    • Providing textual labels on a map
    • Providing a word cloud for quantitative data

Answer: Providing context for a textual statement

Explanation:

  • Word-sized visualizations (e.g., sparklines) are tiny charts embedded within text to provide additional context.

    • Example: Showing a stock price trend alongside the current price.
  • They are not used for color scales, map labels, or word clouds for quantitative data.

Question 5

What is literature fingerprinting?

  • Options:

    • A pre-processing technique for text
    • Pixel-based text visualization
    • An improved word cloud algorithm
    • A randomized greedy layout

Answer: Pixel-based text visualization

Explanation:

  • Literature fingerprinting uses pixel-based visualization to represent large texts.

    • Each section or word is represented as a pixel, allowing visualization of patterns in large documents.
  • Randomized Greedy Layout is associated with word cloud placement algorithms.

This post is licensed under CC BY 4.0 by the author.