DAVI Visual Analytics - redo
1. Introduction to Visual Analytics
- Definition and Origin
- Visual analytics (VA) combines automated (computational) analysis techniques with interactive visualizations to support understanding, reasoning, and decision-making on large and complex data sets.
- Jim Thomas (often credited as a founder of the field) defined VA as "detecting the expected and discovering the unexpected" by merging data mining/AI with human cognition and decision sciences.
- Why Visual Analytics?
- Human vs. Computer Capabilities
- Computers excel at: data storage, numerical calculation, data mining, AI.
- Humans excel at: cognition, creativity, common-sense reasoning, contextual knowledge.
- Visualization as Communication Channel
- Visual interfaces allow two-way communication between human and machine:
- Machine outputs partial or complete results visually.
- Human observes, interprets, applies domain knowledge, and interacts.
- Machine updates computations/outputs based on interactions.
- Examples of Visual Analytics
- Interactive Clustering
- K-means example: the user drags states between clusters, actively teaching the system what "Midwestern States" means. Real-time re-clustering plus an updated visual layout demonstrates VA in action (see the sketch after this list).
- Document Retrieval & Exploration (Pivot Slice)
- A system that visually explores metadata (authors, venues, keywords) in scholarly publications. Users drag and drop keywords, conferences, or authors to form new queries without typing raw SQL. Interactive updates show re-groupings and citation patterns, again illustrating VA's blend of computation and direct user interaction.
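A minimal sketch of the interactive-clustering idea in plain NumPy. The helper name `kmeans_with_pins` and the pinning scheme are illustrative assumptions, not the original system's API: ordinary Lloyd's k-means runs as usual, except that user-pinned assignments (a stand-in for dragging a state into a cluster) are held fixed while everything else re-clusters around them.

```python
import numpy as np

def kmeans_with_pins(points, k, pinned=None, iters=20, seed=0):
    """Lloyd's k-means where `pinned` maps point index -> cluster id.

    Pinned assignments stand in for the user dragging an item into a
    cluster; they are held fixed while all other points re-cluster freely.
    """
    rng = np.random.default_rng(seed)
    pinned = pinned or {}
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center...
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ...except user-pinned points, which keep their forced cluster.
        for idx, cluster in pinned.items():
            labels[idx] = cluster
        # Update step: recompute centers from the (partly user-taught) labels.
        for c in range(k):
            if (labels == c).any():
                centers[c] = points[labels == c].mean(axis=0)
    return labels, centers

# The user drags point 7 into cluster 2; the system re-clusters in real time.
pts = np.random.default_rng(1).normal(size=(50, 2))
labels, centers = kmeans_with_pins(pts, k=3, pinned={7: 2})
```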
- Guiding Principles (Parallels to General Visualization Principles)
- Appropriateness Principle
- Provide neither more nor less information than needed for the problem. Mirrors the effectiveness principle (no information overload / no loss of critical info).
- Naturalness Principle
- The representation should "match" the data. Corresponds to expressiveness: use appropriate channels (size, position, color) that align with the data's scale (quantitative, categorical, etc.).
- Matching Principle
- Align the visualization with the task at hand. Similar to task-effectiveness in classic visualization guidelines.
- Apprehension Principle
- The visualization must be accurately and easily perceived (avoid chart junk, ensure clarity).
2. Progressive Visual Analytics
- Motivation
- Traditional VA can stall if computations on large data or complex algorithms take too long. Users must wait for a "final result," breaking their analytic flow.
- Progressive Visual Analytics (PVA) addresses this by producing meaningful intermediate results (i.e., partial approximations) quickly, so the user can continue exploring without waiting for everything to finish.
- Fundamental Idea
- Human Time Constraints
- Empirical studies show:
- <100 ms needed for direct interaction feedback (e.g., moving a lens).
- <1 s for indirect interaction feedback (slider updates, query parameters).
- <10 s for command-line or "bigger operation" style tasks.
- Delays beyond these thresholds harm analytic flow, reduce data coverage, and lower insight discovery rates.
- Progressive Computation
- Return draft/partial results quickly (e.g., within ~100 ms) so the user sees something.
- Keep refining in additional increments or iterations (seconds, tens of seconds).
- Potentially run all the way to a final "complete" result, unless the user terminates early because they have seen enough (see the sketch after this list).
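As a concrete illustration, here is a minimal, hypothetical sketch of progressive computation in Python: a generator streams a running mean, yielding a drawable draft after every chunk, so the first estimate arrives almost immediately and the user may stop whenever the answer is good enough. The names and chunk size are assumptions for this example.

```python
import numpy as np

def progressive_mean(data, chunk=10_000):
    """Incremental progression: yield (fraction_seen, running_mean) after
    each chunk, so a first draft arrives quickly and every later yield
    refines the estimate."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk):
        part = data[start:start + chunk]
        total += part.sum()
        count += len(part)
        yield count / len(data), total / count

data = np.random.default_rng(0).exponential(size=1_000_000)
for frac, est in progressive_mean(data):
    print(f"{frac:6.1%} of data seen, mean ~ {est:.4f}")
    if frac >= 0.3:   # early termination: the user has seen enough
        break
```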
- Two Principal Modes of Progression
- Incremental ("Data Chunking")
- Gradually include more data (e.g., 10%, 20%, ..., 100%). Quality per item is similar, but the quantity of data grows with each chunk (as in the running-mean sketch above).
- Iterative ("Process Chunking")
- Use the entire dataset from the start, but refine the algorithmic iterations step by step. Early iterations produce crude partial layouts/results; further iterations refine quality (see the sketch after this list).
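To contrast with the data-chunking generator above, a minimal process-chunking sketch: k-means sees all points from iteration one and yields a drawable partial clustering after every Lloyd iteration. The generator structure is an illustrative assumption.

```python
import numpy as np

def kmeans_progressive(points, k, iters=25, seed=0):
    """Process chunking: all data from the start; yield after every Lloyd
    iteration so early (crude) layouts can be drawn immediately."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for i in range(iters):
        # Assignment step on the *full* dataset.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        for c in range(k):
            if (labels == c).any():
                centers[c] = points[labels == c].mean(axis=0)
        yield i + 1, labels, centers  # a drawable partial result

pts = np.random.default_rng(1).normal(size=(2_000, 2))
for it, labels, centers in kmeans_progressive(pts, k=4):
    pass  # each yield would update the scatterplot's colors and centers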
- Stages of Progressive Results
- $t_\text{response}$: first meaningful partial result (~100 ms). Basic shape, a rough draft of the data so the user sees "something."
- $t_\text{reliable}$: mature partial result. Enough data/iterations to spot patterns or stable clusters.
- $t_\text{stable}$: last significant partial result. No major changes in the visual structure; further computation only fine-tunes.
- $t_\text{complete}$: final output if we let it run to the end. In practice, often not needed or identical to the stable result.
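One way to operationalize $t_\text{stable}$ from the stages above, as a hedged sketch: treat a progression as a stream of numeric estimates and stop once successive partial results change by less than a tolerance. The helper name and the scalar-estimate simplification are assumptions for illustration.

```python
def run_until_stable(progression, eps=1e-3):
    """Consume (fraction, estimate) pairs and stop at t_stable: the point
    where successive partial results barely change anymore."""
    prev = frac = est = None
    for frac, est in progression:
        if prev is not None and abs(est - prev) < eps:
            return frac, est  # stable: further computation only fine-tunes
        prev = est
    return frac, est          # t_complete: the progression ran to the end

# Toy progression converging toward 1.0; stability is detected well before 100%.
toy = ((i / 20, 1.0 - 0.5 ** i) for i in range(1, 21))
print(run_until_stable(toy))
```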
3. Usage Scenarios ("Roles") in Progressive Visual Analytics
- Observer
- Wants the final result but appreciates "watching it form" to build trust and to confirm the computation is still alive (vs. a black box).
- Typical analogy: A fancy loading bar.
- Key interaction: Pause/Play. They might pause at an intermediate stage to examine how the layout (or partial computation) has progressed, then resume.
- Searcher
- Seeks a trustworthy intermediate result "good enough" to answer a simpler question (e.g., "Which day should I fly to avoid delays?").
- Analogy: search engines (Google, Amazon). We rarely scroll to page 20 if pages 1-2 already suffice.
- They rely on quick partial results to prune or focus the search space, and can terminate early once they have enough information.
- Explorer
- Less about final or partial results, more about understanding the process itself, often performing what-if scenarios or parameter sensitivity analyses.
- Analogy: physics or flood simulations, branching into multiple scenarios. E.g., "Where should sandbags be placed during a possible flood?" Then interactively tweak water flow, dam placement, etc.
- Needs maximum flexibility to parametrize and steer the algorithm on-the-fly.
- May "branch off" parallel progressions ("multiverse analysis") and kill the uninteresting ones.
4. Interaction Design in Progressive VA
- Pause / Resume / Jump
- Pause the visualization update, but often let the computation continue behind the scenes. When resumed, the view "jumps" to the latest refined state (a minimal sketch follows this list).
- A potential "alert overlay" (heatmap) can indicate regions of the paused screen that are now out of date (less trustworthy).
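A hedged sketch of that pause semantics; `run`, `is_paused`, and `draw` are hypothetical names. The point is only that pausing suppresses drawing while the progression keeps advancing, so resuming jumps straight to the newest state.

```python
def run(progression, is_paused, draw):
    """Pause freezes the view, not the computation: partial results keep
    arriving; paused ones are skipped, so the next unpaused frame 'jumps'
    to the most refined state available."""
    latest = shown = None
    for step, latest in enumerate(progression):
        if not is_paused(step):
            draw(latest)
            shown = latest
    if latest is not None and shown is not latest:
        draw(latest)  # progression ended while paused: jump to newest state

# Toy demo: the user pauses during steps 3-5 of a 9-step progression.
run(range(9), is_paused=lambda step: 3 <= step <= 5, draw=print)
```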
- Early Termination & Progressive Guards
- Early Termination: "I have enough insight; stop refining." Computation may still run in the background to detect whether your assumptions were wrong.
- Progressive Guards: "If a future chunk violates assumption X, alert me." This is akin to optimistic execution in CPU pipelines (see the sketch after this list).
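A minimal sketch of a progressive guard under illustrative assumptions (the helper names and the running-mean scenario are invented for this example): the progression keeps consuming chunks after the user walks away, and fires an alert the moment a later chunk breaks the assumption the early result suggested.

```python
import numpy as np

def guarded_progression(chunks, estimate, assumption, on_alert):
    """Progressive guard: keep refining in the background after the user
    has 'terminated early', and raise an alert the moment a later chunk
    violates the assumption the user walked away with."""
    state = None
    for chunk in chunks:
        state = estimate(state, chunk)
        if not assumption(state):
            on_alert(state)   # e.g., flag the earlier view as out of date
            return state
    return state

# Guard the assumption "the mean stays below 0.5" against drifting data.
rng = np.random.default_rng(0)
chunks = (rng.uniform(0, i / 5, 1_000) for i in range(1, 11))

def running_sum(state, chunk):
    s, n = state or (0.0, 0)
    return s + chunk.sum(), n + len(chunk)

guarded_progression(
    chunks,
    estimate=running_sum,
    assumption=lambda st: st[0] / st[1] < 0.5,
    on_alert=lambda st: print(f"guard fired: mean drifted to {st[0] / st[1]:.2f}"),
)
```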
- Steering
- Biased Sampling: once the user zooms into a region, the system prioritizes sampling more data there and reduces sampling elsewhere.
- The system dynamically re-weights or re-focuses data ingestion based on user interactions (sketched below).
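A hedged NumPy sketch of steering by biased sampling; `biased_sample`, the viewport format, and the boost factor are assumptions chosen for illustration: points inside the user's current viewport become far more likely to be included in the next chunk.

```python
import numpy as np

def biased_sample(points, viewport, n, boost=10.0, rng=None):
    """Steering by biased sampling: points inside the user's current
    viewport are `boost` times more likely to be drawn in the next chunk."""
    rng = rng or np.random.default_rng()
    (x0, x1), (y0, y1) = viewport
    inside = ((points[:, 0] >= x0) & (points[:, 0] <= x1) &
              (points[:, 1] >= y0) & (points[:, 1] <= y1))
    weights = np.where(inside, boost, 1.0)
    weights /= weights.sum()
    idx = rng.choice(len(points), size=n, replace=False, p=weights)
    return points[idx]

# The user zooms into the region x, y in [0.2, 0.6]; sampling follows them.
pts = np.random.default_rng(0).uniform(-1, 1, size=(100_000, 2))
chunk = biased_sample(pts, viewport=((0.2, 0.6), (0.2, 0.6)), n=5_000)
```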
5. Visualization Design in Progressive VA
Simple "One Item = One Mark"
- Common with scatterplots, parallel coordinates, etc. As new data arrive, new points or polylines appear. Straightforward incremental updates (add or remove marks).
Aggregation-Based
- Bar charts, heatmaps, binned scatterplots. The number of visual marks stays fixed; new data only update the heights, colors, or counts (see the sketch below).
- Excellent for overviews, but individual outliers can get lost.
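A minimal sketch of the aggregation-based pattern: a fixed set of bins whose counts grow as chunks arrive, so redrawing only means updating bar heights. Names and data are illustrative.

```python
import numpy as np

def progressive_histogram(chunks, edges):
    """Aggregation-based progressive view: a fixed set of bins whose counts
    grow as chunks arrive; the marks never change, only their heights."""
    counts = np.zeros(len(edges) - 1, dtype=int)
    for chunk in chunks:
        counts += np.histogram(chunk, bins=edges)[0]
        yield counts.copy()  # redraw bar heights from this snapshot

edges = np.linspace(0, 1, 21)  # 20 fixed bars
rng = np.random.default_rng(0)
for snapshot in progressive_histogram(
        (rng.beta(2, 5, 10_000) for _ in range(5)), edges):
    pass  # each snapshot would update the bar chart in place
```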
Hybrid Approaches
- Visual Sedimentation: new data "drops in" like sediment. Maintains a sense of "live updates" while still aggregating.
Layering Architecture
Break down rendering into conceptual layers, each computed in sequence or in parallel:
- Core background or axes
- Main data items (rough)
- More refined data items or sub-clusters
- Additional details like labels (often computationally expensive)
Each layer can be independently invalidated and re-rendered upon user interaction, avoiding re-computing everything from scratch.
Multi-Threading & Event Handling
- Typically, one event-handling thread + separate visualization (or data-processing) threads per layer or view.
- Ensures that long-running computations never block interactive responsiveness (a minimal threading sketch follows).
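A hedged Python sketch of the one-event-thread-plus-worker pattern; all names are illustrative. The worker publishes each partial result through a queue and never touches the view; the event side drains only the newest snapshot, so a long computation cannot block interaction.

```python
import queue
import threading
import time

def worker(out: queue.Queue, stop: threading.Event):
    """Long-running computation on its own thread: publish each partial
    result through the queue; never touch the UI directly."""
    estimate = 0.0
    for i in range(1, 101):
        if stop.is_set():
            return                # user cancelled: abandon the computation
        time.sleep(0.05)          # stand-in for one refinement step
        estimate += 1.0 / i
        out.put((i, estimate))    # hand the latest partial result to the UI

results, stop = queue.Queue(), threading.Event()
threading.Thread(target=worker, args=(results, stop), daemon=True).start()

# Event-handling side: stays responsive, drains only the newest result.
for _ in range(10):
    time.sleep(0.2)
    latest = None
    while not results.empty():
        latest = results.get_nowait()
    if latest:
        print(f"iteration {latest[0]}: partial estimate {latest[1]:.3f}")
stop.set()  # the user closed the view; cancel the background computation
```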
6. Transient Visual Analytics (Next Frontier)
- Limitation of Standard Progressive VA
- Progressive VA accumulates more and more data until complete (or until you stop).
- But for very large data, we may run out of screen space or memory.
- Key Idea
- Transient Visual Analytics also removes data that is no longer relevant to your current focus.
- The view never truly completes; it remains always intermediate, always dynamically adapting to the user's shifting interests.
- Think of the screen as a cache: newly relevant data arrive, while older or "less relevant" items get evicted (sketched at the end of this section).
- Benefits
- Supports exploratory workflows where you do not know in advance where you'll zoom or what patterns you'll chase.
- You can keep focusing (zooming/panning) on new sub-regions while older data vanish from the display.
- Example
- Dynamic Scatterplot: as you zoom into a cluster, data outside that cluster are dropped (freeing space). Zooming back out re-fetches those points if needed.
- The result is a forever interactive, forever intermediate visualization that stays uncluttered, focusing only on the user's current line of inquiry.
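A minimal sketch of the screen-as-cache idea, with invented names (`ScreenCache`, tile keys, a `fetch` callback): least-recently-relevant tiles are evicted as the focus moves, and zooming back out simply re-fetches on demand.

```python
from collections import OrderedDict

class ScreenCache:
    """Treat the display as an LRU cache over data tiles: focusing on a
    region fetches its tiles; least-recently-relevant tiles are evicted."""
    def __init__(self, capacity, fetch):
        self.tiles = OrderedDict()
        self.capacity, self.fetch = capacity, fetch

    def focus(self, region):
        for key in region:                   # e.g., tile coordinates in view
            if key in self.tiles:
                self.tiles.move_to_end(key)  # still relevant: keep it fresh
            else:
                self.tiles[key] = self.fetch(key)
                if len(self.tiles) > self.capacity:
                    self.tiles.popitem(last=False)  # evict least relevant
        return [self.tiles[k] for k in region]

# Zooming around: older tiles vanish, and re-focusing re-fetches them.
view = ScreenCache(capacity=4, fetch=lambda key: f"points@{key}")
view.focus([(0, 0), (0, 1)])
view.focus([(5, 5), (5, 6), (6, 5)])   # tile (0, 0) gets evicted
view.focus([(0, 0)])                   # zoom back out: re-fetched on demand
```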
7. Closing Notes
- Where We Stand
- Visual Analytics as a domain has about two decades of research maturity.
- Progressive Visual Analytics is a newer "branch" actively studied to handle big data and complex algorithms under stringent time constraints.
- Transient Visual Analytics is an emerging area pushing beyond "always adding data" to "also dropping data" to better accommodate memory, screen space, and shifting exploration paths.
- Practical Takeaways
- Designing VA Systems requires carefully balancing computational power and human cognition: use incremental or iterative partial results, offer interactive controls (pause, early termination), and choose visualization techniques (one-mark-per-item vs. aggregated) that best fit the task and user role.
- Real-World Scenarios (flood rescue, cluster definition, search and rescue tasks, large-scale text or citation analyses) benefit from progressive or transient strategies to keep analysis fluid and decision-focused.
In Summary
This chapter traces the lineage from basic VA principles through the motivation, core mechanisms, and usage scenarios of Progressive Visual Analytics, culminating in the emerging concept of Transient Visual Analytics. The overarching goal is to allow analysts to act, react, and steer data exploration seamlessly, even when the underlying data or algorithms are too large or too slow to process in one shot.
This post is licensed under CC BY 4.0 by the author.