DAVI Visualization Design

Posted Nov 11, 2024 Updated Dec 2, 2024

By Wei Xiong

13 min read

DESIGN PROCESSES & DATA/TASK ABSTRACTION

Overview

WHAT IS DATA?

Information is higher level abstractions of that data that I can extract from it.

eg. So for example distribution over data to see. Are there trends. Are there clusters. Are there outliers. That would be information that I derive from a lot of observed or measured data that I have gathered before relations between data.

On top of information since knowledge. Knowledge is then taking these distributions and relations and builds them into functional dependencies that you can work with models, for example, simulation models, statistical models that you can use to explain.

eg. The distributions and relations you found in the underlying data parameters. Influencing factors, controls, whatever it is. Limitations, boundary conditions. Yeah. So this would all be knowledge that’s been derived from the information.

Wisdom: this is where you actually use these models, this knowledge that you have to make predictions, to get something actionable, make prognoses.

Know What: We would just want to describe the data. Oh it’s it’s distributed in the following way. Uh there’s a correlation and such and such way. And there’s no interpretation yet. Just kind of trying to find out the patterns that are apparent in the data, the phenomena in the data.
Know Why: I have these observations I want to know why and when do they happen. And I want to build my models based on that causal knowledge.
Know How: I want to take these models and use them to help me make better decisions in real life.

VISUALIZING DATA VS. INFORMATION

The Data Space

Data context (what kind of data we want) is also what is sometimes called the data domain, or the reference space, or the independent variables.

These are the variables that give you the frame of reference for any measurement observation being made.

if I take a temperature measurement. That temperature that I measure that would be data content.
But where I measure it when I measure it. For how long I measure. That would be part of the data context, because that allows me to relate different measurements with each other.
- geospatial or spatial, very often temporal, but it can be other aspects.
- another eg. CPR number if I have patient data in a clinical data set, uh then my independent variable, my data context would be the CPR number. That helps me to tell, uh, the different observations apart.

Data content. The data content is what’s being observed, what’s being measured. So times call them as reference space or dependent variables(因为是被data context所决定的–)

DATA CONTEXT

THE DATA CONTEXT – EXTENT

One important property of the data context is the extent of that context, because that extent will differ or plays a major role in.

the extent of the data context determines the validity of measurements, and this can vary across space, affecting how we interpret and visualize the data.

summary by AI

Extent of Data Context: The extent refers to the applicability and influence area of the measured data. This extent can vary and plays a key role depending on the context.

Pointwise Extent: Some measurements are only valid at specific points, and beyond those points, the results may no longer be accurate. For example, a temperature measured at a particular point is only valid within a small surrounding area.

Narrow and Continuous Measurements: For instance, temperature measurements can remain valid over a certain distance (like a few meters), depending on the situation. If the temperature gradually changes or stays constant across a space, it could be considered “narrow continuous,” meaning the data is continuously valid within a certain range.

Broad Extent Measurements: For example, measuring temperature in different offices. Each office might have different temperatures, and the validity of the measurement ends at the boundary of the room. This scenario is common in indoor environments, where each room has its own context of measurement validity.

Incomplete Data: In practice, you rarely have continuous data but rather discrete readings taken from sensors placed at specific points. Based on these discrete readings, you infer or extrapolate the situation over a broader area.

THE DATA CONTEXT – INTERPOLATION

A broad sweep of temperatures that gives you a whole field of values continuously that you can use

Parameters: K and P

增加 k 会使插值更平滑，因为考虑的点更多。

增加 p 会强调局部数据，使插值更加依赖于附近的点，可能导致数据呈现出更为尖锐的变化

THE DATA CONTEXT – TOPOLOGY

We don’t only have continuous data. So, there is a discretization(离散化). We can also use the graph topology to build the data context.

DATA CONTENT

THE DATA CONTENT – STRUCTURE

And that’s the most common case. We can have a scalar data structure. That means for each observation we only have one value one magnitude.

eg. Scalar data structure.

eg. Vector data: Values making up an observation. We often have that for wind speed and wind direction. One without the other doesn’t really make a lot of sense.

eg. Tensor data(张量数据): usually on a grid, they measure, in this case the flow in eight different directions. You get this three dimensional way of seeing how this dissipates(涣散) and flows in, in all kinds of directions.

WHAT TO GAIN FROM DATA ABSTRACTION?

the moment you actually break it down into quantitative, categorical, ordinal, uh, and so forth, you will see, well, actually I could reuse that diagram and show this and that would also match the other customers. And yes, we already mentioned it helps you to build visualizations that are actually expressive.

THE VISUALIZATION PROBLEM: TASK ABSTRACTION

Example:

We like to distinguish between unipolar and bipolar tasks.

Bipolar tasks are tasks that have a natural opposite.

IDENTIFY TASK VS. LOCATE TASK

Identification means to find the value of a particular data point. I can identify the value to that data point and then gives.

Localization where you want to find data points of a particular value. So it’s the other way around. You have a particular value given. And then you want to find all the data points that exhibit this value.

Summary:

One is looking at the data point or the mark in your visualization and finding out the value to it. 根据点找值
The other one is taking a value and finding all the marks in your visualization that exhibit similar values. 根据值找点

Task Taxonomies

THE TASK FRAMEWORK BY ANDRIENKO

Identify Task: 根据点（给定的那一天）找值（苹果公司的股价是多少）

Locate Task: 根据值（给定苹果公司的股价是200）找点（哪一天是200）

R = Data Context

C = Data Content

More Task

Comparison Task and Inverse Comparison

S是什么：To find out the relation between them. And there is more relation seeking tasks. So they couldn’t call the relations R as it would otherwise be natural.

S2中的关系最为复杂，要找出同时满足所有的数据项

Synoptic and Elementary Task

Elementary tasks: If they are asking for individual data items that is water and drink or undrink.

Synoptic tasks: **If we are working on **sets of data items.

Task1: It’s no longer asking for an individual data item, but it’s asking for a trend, and it’s looking for this trend in a set of data items.

Summary

WHAT TO GAIN FROM TASK ABSTRACTION?

You can map that in your internal way to an abstract task and with that, to a visualization that supports that task. Also use it again for generalizing not only the problem but also the solution.

VISUALIZATION DESIGN: DESIGN ACTIVITY FRAMEWORK

VISUALIZATION DESIGN: 5 Design sheet method

1 - The Brainstorm Sheet

2 - The Alternative Designs

3 - The Design Realization

eg. pdf 37 - 38

four guiding

EVALUATION OF VISUALIZATION: MUNZNER’S NESTED MODEL

The beauty of this approach is that it connects directly back to our steps from the design activity framework.

The understand step is all about taking the domain situation and turning it into a data and task abstraction.

Critique

case - 1

“nominalize”（正常化）是指将数据转换为相对值或百分比形式，以便更容易进行比较，而不是仅仅使用绝对值。

Yeah. So when I say numbers over geospatial areas should always be normalized.

but opposite example:

if you have data given, let’s say, uh, uh, Covid cases. If these were Covid cases and you want to compare right, then we’d say, okay, you need to normalize by the population density because, 1.7 million Covid cases means something very different than the US, then it means in Cuba, right

Absolute Numbers Are Crucial for Resource Allocation(Some time we don’t need the nominlization)

When managing a pandemic, absolute case numbers are critical for determining the necessary resources:
- For example, a city with 1.7 million active cases needs significantly more ICU beds, ventilators, and healthcare workers than a smaller city with the same per capita infection rate.
- Population-normalized rates may show that a smaller country like Cuba has a higher rate of infection per 100,000 people, but this doesn’t help assess the immediate need for medical supplies in a country like the U.S., where the absolute number of cases overwhelms the healthcare system.

case1 - critique

图表类型错误或使用不当：

图表使用的可能是平行集（Parallel Sets），这种图表通常用于显示不同类别之间的交叉相关性。
如果数据更适合简单展示总数的分解，那么堆叠柱状图可能是更好的选择。因为它能清晰地展示各部分对总体的贡献，而不强调类别间的交互影响。

灾难与冲突的分类问题：

图表将灾难和冲突视为完全分离的两个类别，这在现实中可能并不总是清晰划分。例如，一个地区可能同时发生冲突和自然灾害。
武装和社区冲突被视为不相交的类别，但实际上，这两种冲突形式在实际情况中可能互有交叉。

顶级类别的不一致性：

数据显示了大多数类别的前五名和其他，但在冲突类型中，只显示了前四名，缺乏一致性，这可能会误导信息的接收者。

冲突数据的不完整性：

关于冲突的数据通常不完整或不准确，因为在冲突地区收集准确的数据非常困难。这种数据的不确定性应该在图表设计中得到体现，确保信息的接收者理解数据可能存在的局限性。

Improve

简洁的数据展示：

这些图表使用简单的堆叠柱状图来直接展示每个类别的数据。这种方式容易理解，并直观地显示了各个国家或原因导致的流离失所人数。

清晰的类别区分：

通过将“灾难”和“冲突”分开，并且各自细分原因和国家，观众可以清楚地看到哪些国家和哪些原因导致了最多的流离失所。这种布局避免了之前图表中的混淆，特别是在灾难与冲突数据的交叉分析方面。

统一的表示方法：

每个图表都清楚地标出了各自的顶部因素，如前五个最影响大的国家和原因，以及“其他”类别，这有助于比较和对比不同类别。
图表之间的一致性（如颜色和设计）提升了整体的可读性和专业性。

增强的数据对比：

将数据按国家和原因分开展示，使得对比更为直观，观众可以立即识别出造成最大影响的因素和地区。

避免误导：

每个图表都集中在一个特定的数据集上，避免了过于复杂或可能误导的数据表达。观众不需要解读复杂的交叉关联或隐藏的数据层次，从而减少了解释错误的风险。

case - 2

Data Visualization, Lecture

This post is licensed under CC BY 4.0 by the author.