Post

DAVI Basic Charts

DAVI Basic Charts

Conception

tick mark(刻度线)是指坐标轴上的小标记或线条

arsenal: 武器库

benchmarks:基准

Scatter Plot

It is used for two attributes, showing the strongest channel.

If you want to see a trend, use that chart. If you want to see outliers, use another chart. If you want to see clusters, use yet another chart. Scatter plots can do all of this at once.

Scatter Plot Image

Does a scatterplot make sense?

image-20240910165329522

we have not a huge spread here on the cylinders angle because it’s a discrete attribute.

use box whiskers show actual distribution of the dots

image-20240910165612225

zoom level

it determine the impression of data on the reader.

image-20240910165754706

The moment you zoom in. Well, yeah, there is still a correlation, but it’s not as strong anymore as you would read it.

zoom out -> cluster

don’t zoom out too much to show that only a strong correlation can be read off, when maybe there’s actually quite some deviation among the dots.

how to place the tick marks(刻度线) on the axes?

  1. a lot of whitespace. hard to read off
  2. no way to gauge x-axis
  3. offset (偏移量)is strange -> 9
  4. best

image-20240910171251957

Should you add gridlines in the background

gridline are help line -> guide us in reading values of that are further away from the axis.

rule of thumb(经验):

  • introduce the gridline to find the position of a point as accurately as possible.
  • If you want to support a reading task, an identification task is not important

Quadrants or average lines 象限或平均线 ?

image-20240910172319118

  • crowded chart by subdividing it in these four quadrants so it chunks the chart to a technique that is well known from,
  • people who who investigate memory and how to remember things better.

chunking(定量分块):they chunk it up into smaller strings and learn them individually.

trend line

image-20240910173712106

looking for the trend and then maybe looking for how the different dots deviate from that trend

Why would we want to have little rectangles or squares like they are used here?

image-20240910173827400

  1. If there is a secondary attribute. A categorical one, you can, um, uh, use the shape to show to which category each dot belongs.
  2. 为什么是矩形而不是圆形:当测量温度的时候,通常会给你一个20度上下浮动的数值,可能是20加减0.1,此时这个矩形可以表示一段范围。

Overlap

ADD TRANSPARENCY

image-20240910175429182

ADD JITTERING

image-20240910175510554

add very small random offset to the position. dont overdo it.

ADD BINNING

image-20240910175904105

Splatter plot

image-20240910181241135

more continuous

they can easily intersect

the sparser regions where there are the outliers, they just leave the outliers as they are and show them as individual points.

Heat map

image-20240910184405801

need to normalize by the population density.

  • Why is that? Because otherwise each heatmap you create will look the same.
  • Simply because there are more people in the big cities on the coasts in the US.

image-20240910184731789

everything will look the same, because of population density.

Line chart

First question to ask yourself is always does it make sense?

image-20240910185412511

image-20240910185423841

Trend line

  • What you want to go for is to choose the aspect ratio so that you have if you are connecting the first or you are plotting in the trend line
  • let’s put it like that. Then that trend line sits at around 45 degree angle.

image-20240910185647733

gridline

same as Scatter plot

special line chart

image-20240910190117201

  1. like a linear connection which amounts to a linear interpolation between the two measurement values you are connecting with.
    • 一个在值 A 处,另一个在值 B 处。如果您想估算这两个点之间的值,线性插值法假定这两个点之间的值呈直线变化。例如,如果您要从 A=1 过渡到 B=3,那么假设间距相等,在这些点之间进行线性插值,就会得到 1.5、2、2.5 等值。
  2. Sometimes you also have this form of a chart step chart where basically one value is true until the next value comes in.
    • 每个值之间不是平滑过渡,而是在到达下一个值之前保持不变。假设数据点分别位于 1、2 和 3。在阶梯图中,数值会停留在 1,直到到达 2,然后立即 “阶梯 ”到 2,停留在那里直到到达 3。没有平滑的变化,只是从一个值跳到下一个值。

spaghetti plot fluctation

image-20240910190540949

because the lines that have a lot of fluctuation, they are going to cover a lot of pixels. If I had drawn this smoother line here. First, they would completely over plot it. But the smoother line does not over plot as much from the wild up and down lines being shown.

noisy data

image-20240910191048058

just tell them the truth

envelop

image-20240910191616046

如果line chart中噪声很大,那么可以直接展示每个节点的最大值和最小值,形成一个envelop来包围住它。

Scatter plot or Line chart?

So we usually say. The less noise and the clearer the trend, the more line charts are recommended.

image-20240910192803424

connected scatterplot

image-20240910193146331

两个定量变量: 这幅图追踪了每年的两个变量–油价和人均行驶里程。图中的每个点代表特定年份的一对数值。

有连接点的散点图: 这不是一个静态的散点图,而是按时间顺序(在一段时间内观察到的顺序)连接各点。这让人感觉到两个变量之间的关系随时间的轨迹或演变。

随时间演变: 关键在于,通过不同时期的轨迹(例如从 2000 年到 2008 年),您可以看到油价和行驶里程之间的关系随着时间的推移发生了怎样的变化。

Area chart

a quantity that stacks up.

for example not use it for temperatures and we would not use it for uh grades uh, that you earn in a class or something.

Simply because they are not amounts

ordinary and stacked

image-20240910194058030

Exmaple:

Ordinary area chart:

  • three different classrooms. Then I measure the number of seats in each classroom. The percentage of seats that are taken or something like that. Then I would use the ordinary area chart. Showing me. If I look at the top line, which we call the silhouette. That top line always gives me the maximum at any point in time.

Stacked area chart:

  • **sum up the number of students in each of the three lecture halls ** to see, okay, how many students were over time in all three lecture halls. And, uh, that I would then show as such a stacked area chart.

COGNITIVE DISTORTION

image-20240910194859585

same height of each line

image-20240910195033768

image-20240910195209956

cognitive illusion

that’s why we like our layers in the stacked area chart to be as horizontal as possible.

THE BAR CHART

区别在于条形图(bar chart)通常是水平的,而柱形图(column chart)是垂直的。

We usually call them column charts.

The ones that are horizontal these are called bar charts or the nitpicker among you.

just call them bar chart.

is it make sense?

image-20240910200402621

decide by the number of bar they have

more bars you have, the more you get into the area of probably using a line chart or something like this.

divide the chart up

image-20240910200533730

because of these two large values and we are not allowed to cut the y axis, they squish these small bars into.

And then you use perspective distortion to indicate that you are now moving onto a different scale.

Which aspect ratio to choose

Which aspect ratio to choose? >= 10 : 1

image-20240910200810180

How far to space the bars? 50%-75%

image-20240910200856064

order

  • ordered them once alphabetically here, and I ordered them by value.Here. Both can be used depending on the reading task that you want to support.

image-20240910201005353

GROUPED AND STACKED BAR CHARTS

image-20240910201205616

THE HISTOGRAM

We have a continuous quantitative. Variable on the x axis.

the GAP is not something you can buy go by when I show you a visualization to determine whether this is a histogram or not.

THE HISTOGRAM – BIN SIZES & OFFSET

one discerning feature of bar chart vs histogram:

what’s on the x axis is it categorical variable or is it a quantitative variable.

image-20240910201642483

虚线代表平均值

That’s the offset of where do you want your bins to start and to end.

image-20240910202627724

critiquing point: THE HISTOGRAM – BIN SIZES

image-20240910202722346

POLAR / RADIAL / CIRCULAR VARIANTS

let’s say a wind direction in degrees.

image-20240910204202696

image-20240910204342925

image-20240910204419586

VIS CRITIQUE #1

example 1

image-20240910211423530

根据教授的讨论,我将从表现力(expressiveness)、有效性(effectiveness)和效率(efficiency)三个方面对原始图表(图1)进行批评,然后根据图2提出改进建议。

image-20240910212502767

表现力 (Expressiveness):

  • a) 数据混淆:标题和副标题之间存在矛盾。标题提到”Gun deaths in Florida”(佛罗里达州枪支死亡人数),而副标题则说”Number of murders committed using firearms”(使用枪支犯下的谋杀案数量)。这两个概念并不相同,可能导致读者误解数据的实际含义。
  • b) 不恰当的插值:使用面积图和连接数据点的线条暗示了年度之间的线性插值,这对于年度数据来说是不合适的。每年的死亡数字是独立的,不应该用线性方式连接。
  • c) 颠倒的数据映射:图表从上到下绘制,与传统的从下到上增长的方式相反,可能导致读者误读数据。

有效性 (Effectiveness):

  • a) 误导性的白色空间:由于面积图的颠倒使用,读者可能会错误地将白色空间理解为数据,而不是红色区域。
  • b) 标题位置:x轴位于底部,与负向数据的展示不符,增加了理解难度。
  • c) 颜色选择:虽然红色在语义上与死亡相关,但大面积使用可能引起不必要的情绪反应。

效率 (Efficiency):

  • a) 创建简单:虽然这种图表相对容易创建,但可读性较差。
  • b) 解读困难:由于上述问题,读者需要花费更多时间和精力来正确解读数据。

改进建议(基于图2中的伊拉克死亡人数图表):

  1. 使用条形图代替面积图,避免不必要的数据插值
  2. x轴移至顶部,符合负向数据的直观展示方式。
  3. 使用圆角设计条形图顶部,进一步降低误读白色空间为数据的可能性。
  4. 保持红色主题,但考虑使用不同深浅的红色来减少视觉冲击。
  5. 明确区分并澄清”gun deaths”和”murders committed using firearms”的区别,选择一个准确的描述并在整个图表中保持一致。
  6. 保留数据点标记,但移除连接线,以避免年度之间的插值误解。
  7. 考虑添加更多上下文信息,如图2中展示的其他相关数据或解释性文本。
  8. 优化标题和说明文字的位置,使其更加醒目且易于理解。

通过这些改进,图表将更准确地表达数据,提高可读性,并减少误解的可能性,同时保持其引人注目的视觉效果。

image-20240910211515614

example 2

image-20240910212220885

从表现力(expressiveness)、有效性(effectiveness)和效率(efficiency)三个方面对原始图表(图1)进行批评,然后根据图2提出改进建议。

表现力 (Expressiveness):

  • a) 时间顺序错误:图表将时间数据作为分类数据处理,按COVID-19病例数从高到低排序,而非按时间顺序排列。这违背了时间数据的本质,可能误导读者认为病例在减少。
  • b) 缺乏数据归一化:图表展示的是原始病例数,没有考虑各县人口差异,无法进行公平比较。
  • c) 组内顺序不一致:在每个时间点,各县的顺序不一致,增加了跨时间比较的难度。

有效性 (Effectiveness):

  • a) 比较任务:柱状图适合比较任务,但由于时间顺序和归一化问题,难以进行有意义的比较。
  • b) 标签处理:使用了错开的水平标签,避免了倾斜标签难以阅读的问题,这是一个优点。
  • c) 对比度:柱子周围使用了浅色边框,增加了与背景的对比度,提高了可读性。

效率 (Efficiency):

图表创建简单,没有明显的效率问题。但由于存在的表现力和有效性问题,读者需要更多时间来正确解读数据。

改进建议(基于图2):

  1. 恢复时间顺序:将x轴改为时间轴,按照日期顺序排列数据。
  2. 分组展示:按县分组展示数据,每个县使用单独的小图表。
  3. 保持一致的y轴范围:确保所有小图表使用相同的y轴范围(0-200),便于比较。
  4. 添加上下文信息:在图表上方添加标题和简要说明,解释数据表示的内容和时间范围。
  5. 考虑数据归一化:如果可能,应考虑按人口比例归一化数据。
  6. 改善颜色方案:使用更易区分的颜色来代表不同的县。
  7. 添加网格线:增加水平网格线,便于读者准确读取数值。
  8. 保持一致的柱子顺序:在每个时间点保持相同的县顺序,便于跨时间比较。
  9. 考虑添加数据表:在图表下方添加一个数据表,显示具体的死亡和住院人数。
  10. 优化图表间距:确保每个小图表有足够的空间,避免柱子过于密集。

通过这些改进,图表将更准确地表达数据,提高可读性,并为读者提供更全面的信息,同时保持了原始数据的完整性。

image-20240910212351027

image-20240910212438140

This post is licensed under CC BY 4.0 by the author.