DAVI Visualizing Geospatial Data - redo
Lecture Notes
Below is a comprehensive set of notes based on the professor's lecture.
All key points, examples, and explanations have been included in detail.
1. Announcements
- Project Presentations Sign-up
- The sign-up for project presentations is now open.
- Students are invited to present their project in the upcoming TA sessions (either next week or the week after).
- Purpose:
- Showcase progress
- Receive feedback from the professor and peers
- Exam Sign-up (Phase 2)
- Announcement on Brightspace will follow.
- If you have any scheduling conflicts or other obligations during the exam period, send an email to the professor.
- You will receive the sign-up link earlier to secure a suitable time slot.
- Office Hour Cancellation
- No office hour today because the professor has another lecture to give (the "Pre-Talent Trek" for bachelor students).
- This is a one-time event; next week's office hours will return to normal.
- Today's Short Breaks
- The professor needs to lecture for 5 hours today and must travel to another campus building after class.
- He requests shorter breaks (10 minutes each) so the class can finish 10 minutes earlier, allowing him travel time.
- Plan for Today
- Organizational matters are settled.
- Before moving into the main topic, Geo Visualization, there will be a quick recap quiz on temporal data visualization (as is done most Mondays).
2. Recap Quiz (Time-Oriented Data)
The professor used Menti (shown on the screen) to pose several multiple-choice questions. Below are the questions, answers, and the professor's elaborations.
Q1. Horizon graphs are used to show:
- Options:
- Multiple events
- Multiple time intervals
- Multiple time granularity
- Multiple time series
- Correct Answer: Multiple time series
- Explanation & Examples:
- Horizon graphs are particularly good when you have many time series, like stock prices over time or weather data (e.g., temperature for multiple cities).
- For multiple time granularity, a calendar visualization is more suitable.
- For multiple time intervals, you'd typically use triangular models or Gantt charts.
- For multiple events, the professor showed an example called the time map (e.g., analyzing distances between consecutive Twitter events).
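To make the horizon-graph idea more concrete, here is a minimal sketch of the band-folding step that lets many series share little vertical space: each value is sliced into fixed-height bands that would be drawn on top of each other, typically in darker shades for higher bands. The function name, the band height, and the number of bands are illustrative assumptions, not something taken from the lecture.

```python
def horizon_bands(value, band_height, n_bands):
    """Slice one value into stacked bands for a horizon graph.

    Returns the sign of the value and, per band, how much of that band
    is filled (between 0 and band_height). Negative values are mirrored
    and would usually be drawn in a different hue.
    """
    magnitude = abs(value)
    layers = [
        min(max(magnitude - i * band_height, 0.0), band_height)
        for i in range(n_bands)
    ]
    sign = "negative" if value < 0 else "positive"
    return sign, layers


# Hypothetical example: a value of 2.5 with three bands of height 1.0.
sign, layers = horizon_bands(2.5, 1.0, 3)
print(sign, layers)
```

Because every band is drawn in the same strip, the chart height needed per series is only one band height, which is why dozens of series can be stacked on one screen.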
Q2. Which one is not a time arrangement?
- Options:
- Logarithmic time
- Cyclic time
- Linear time
- Branching time
- Correct Answer: Logarithmic time
- Explanation & Examples:
- Linear time: The standard timeline concept, e.g., from the earliest to the latest date.
- Cyclic time: Illustrated by a clock face or repeated seasonal patterns.
- Branching time: Familiar to computer scientists from version control (e.g., Git branching).
- "Logarithmic time" was simply made up; it's not a standard conceptual arrangement for time in data visualization.
Q3. Identification of the chart (showing intervals as points in a triangular shape):
- Options:
- Ternary plot
- Triangular model
- Population pyramid
- Tri-linear plot
- Correct Answer: Triangular model
- Explanation & Examples:
- Triangular model (or triangle chart for time intervals) represents intervals as points instead of lines (like the Gantt chart does).
- A ternary plot is a three-variable scatterplot (often summing to 1 or 100%).
- A population pyramid is used to display population by age brackets (often split by gender, with symmetrical bars around a central axis).
- A trilinear plot is essentially another name for a ternary plot (Piper diagrams are built from such plots), and it is not what was shown here.
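The interval-to-point mapping behind the triangular model can be sketched as follows. This assumes the common convention in which the horizontal position is the interval's midpoint and the vertical position is proportional to its duration; the function name and the sample intervals are hypothetical.

```python
def interval_to_point(start, end):
    """Map a time interval to a single point (midpoint, duration).

    Assumed convention: x = midpoint of the interval,
    y = duration of the interval (longer intervals sit higher).
    """
    if end < start:
        raise ValueError("interval must not end before it starts")
    midpoint = (start + end) / 2
    duration = end - start
    return midpoint, duration


# Hypothetical intervals, given as (start, end) in arbitrary time units.
intervals = [(0, 10), (2, 6), (5, 5)]
points = [interval_to_point(s, e) for s, e in intervals]
print(points)
```

A zero-length interval lands on the baseline, while long intervals rise toward the apex of the triangle, which is what makes many overlapping intervals easier to compare than in a Gantt chart.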
Q4. Which is not a principal way of mapping time in visualization?
- Options:
- Mapping time to space
- Mapping time to pixel
- Mapping time to time
- Using small multiples
- Correct Answer: Mapping time to pixel
- Explanation & Examples:
- There are two main principles for temporal data in visualization:
- Mapping time to display space: e.g., small multiples (a small chart for each time slice).
- Mapping time to display time: e.g., animation (the data changes as time progresses).
- "Time to pixel" is not a standard principal approach.
Q5. Spiral plots are used to identify:
- Options:
- Sequences
- Trends
- Periodicities
- Intervals
- Correct Answer: Periodicities
- Explanation & Examples:
- Spiral plots allow you to see repeating or cyclic behavior, e.g., monthly or seasonal patterns.
- By adjusting the rotation so that one turn of the spiral equals a specific period (e.g., 4 weeks), repetitive cycles in the data become visible.
- Sequences are often spotted by arc diagrams, which connect repeated patterns or events in the timeline.
- Intervals might use Gantt charts or triangular models.
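The effect of choosing the period in a spiral plot can be sketched with a small coordinate mapping. This is only an illustration under assumed parameter names (`base_radius`, `ring_spacing`): one full turn corresponds to one period, so observations exactly one period apart share the same angle and line up along a ray.

```python
import math


def spiral_position(t, period, base_radius=1.0, ring_spacing=1.0):
    """Return (angle, radius) of time t on a spiral with the given period.

    The angle wraps around once per period; the radius grows steadily,
    so each new period is drawn one ring further out.
    """
    turns = t / period
    angle = 2 * math.pi * (turns % 1.0)
    radius = base_radius + ring_spacing * turns
    return angle, radius


# Hypothetical daily data with an assumed 28-day cycle:
# day 3 and day 31 end up at the same angle, one ring apart.
print(spiral_position(3, 28))
print(spiral_position(31, 28))
```

If the chosen period does not match the real cycle in the data, the repeated values drift around the spiral instead of aligning, which is why adjusting the period interactively helps to find periodicities.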
3. Transition to Geo Visualization
3.1 Why Geo Visualization?
- Historic Origins: Cartography is considered one of the earliest forms of data visualization.
- The concept of marks (points, lines, areas) originates from cartography:
- 0D = Points (e.g., cities)
- 1D = Lines (e.g., streets, rivers)
- 2D = Areas (e.g., countries, oceans)
- Many fundamental visualization theory concepts (marks, aesthetic channels, etc.) were inherited from thematic cartography.
- Misrepresentations can occur if maps are generated without understanding projections or the underlying geospatial context.
- E.g., the classic distortions from the Mercator projection (Greenland or Russia appearing huge compared to Africa).
Example: Election Maps
- Maps of U.S. presidential elections often show large red areas vs. small blue areas, which is misleading because area ≠ population.
- The modifiable areal unit problem: When you color large low-population counties the same as small high-population counties, it can create a skewed visual impression.
- Cartograms can distort areas proportionally to data (e.g., population), providing a more "truthful" representation of certain data aspects, but sacrificing strict geospatial fidelity.
4. Spatial Statistics (Brief Overview)
When dealing with latitude/longitude or any (x,y) coordinates, certain descriptive statistics change:
- Spatial Mean
- Compute the mean of x-coordinates and the mean of y-coordinates separately.
- Weighted means are also possible if points have varying importance.
- Spatial Median
- An optimization problem: find the point (among existing points) that minimizes the sum of distances to all other points.
- Distances
- Euclidean distance ("as the crow flies") vs. Manhattan distance (city grid constraints).
- Standard Distance
- Analogous to standard deviation in 2D. Produces a standard distance circle around the mean location.
- Standard Deviation Ellipses
- More nuanced than a circle. Shows directional spread (points might be more spread in one direction vs. another).
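The statistics above can be sketched in a few lines of plain Python. This is a minimal illustration for planar (x, y) coordinates, not a definitive implementation: the function names are made up, the spatial median follows the notes' definition (search among the existing points), and the standard distance divides by n, although other conventions exist.

```python
import math


def spatial_mean(points, weights=None):
    """Mean of x and mean of y, optionally weighted per point."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    mean_x = sum(w * x for (x, _), w in zip(points, weights)) / total
    mean_y = sum(w * y for (_, y), w in zip(points, weights)) / total
    return mean_x, mean_y


def euclidean(p, q):
    """Straight-line distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Distance along an axis-aligned grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def spatial_median(points, distance=euclidean):
    """Existing point with the smallest summed distance to all points."""
    return min(points, key=lambda c: sum(distance(c, p) for p in points))


def standard_distance(points):
    """Root mean squared distance of the points from the spatial mean."""
    mean_x, mean_y = spatial_mean(points)
    squared = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


# Hypothetical points: four clustered locations and one outlier.
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10)]
print(spatial_mean(pts), spatial_median(pts), standard_distance(pts))
```

In this example the outlier pulls the spatial mean away from the cluster, while the spatial median stays on a cluster point, mirroring the behavior of mean and median in one dimension.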
5. Map Projections
5.1 Overview
- A globe is 3D, but maps are 2D, so projections inevitably cause distortion.
- Mercator Projection:
- Uses a cylindrical "toilet paper roll" concept around the Earth.
- Preserves angles (useful for navigation), but grossly distorts area (Greenland, Russia, etc.).
- Web Mercator is used by many web-based mapping services (e.g., Google Maps) because it preserves local angles and shapes (e.g., streets that cross at right angles still do so on the map).
- Lambert Equal-Area Projection:
- Preserves area (called an "equal-area" projection).
- Distorts shapes/angles (circles in the Tissot's indicatrix become ellipses).
- Azimuthal (e.g., Equidistant) Projection:
- Preserves distance from a central point, but not other distances or areas.
- Compromise Projections (e.g., Robinson Projection):
- Try to minimize all distortions (area, angle, distance) but cannot completely preserve any single property perfectly.
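As a worked example of the cylindrical idea, the standard spherical Mercator formulas can be sketched as below. The Earth radius constant and the function name are illustrative assumptions; web mapping services use their own variants and parameters.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used here only for illustration


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_KM):
    """Project longitude/latitude (degrees) with the spherical Mercator formulas.

    x grows linearly with longitude; y stretches more and more toward the
    poles, which is where the area distortion comes from.
    """
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# The same 10-degree step in latitude covers far more map height near the poles.
low = mercator(0, 10)[1] - mercator(0, 0)[1]
high = mercator(0, 80)[1] - mercator(0, 70)[1]
print(low, high)
```

The formula is undefined at exactly 90 degrees latitude, which is why Mercator maps are always cut off before the poles.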
5.2 Tissot's Indicatrix
- Placing identical circles on the globe's surface and seeing how they become distorted in the 2D projection.
- If circles remain circles (same shape), angles are preserved.
- If the circles change size, area is not preserved.
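The two checks above can be expressed numerically through the local scale factors of a projection: h along the meridian and k along the parallel. The sketch below compares the spherical Mercator projection with the Lambert cylindrical equal-area projection; choosing that particular Lambert variant is an assumption, as the notes only say "Lambert". The function names are illustrative.

```python
import math


def mercator_scale_factors(lat_deg):
    """(h, k) for spherical Mercator: both equal 1 / cos(latitude)."""
    sec = 1.0 / math.cos(math.radians(lat_deg))
    return sec, sec


def lambert_cylindrical_scale_factors(lat_deg):
    """(h, k) for Lambert cylindrical equal-area (standard parallel at the equator)."""
    cos_lat = math.cos(math.radians(lat_deg))
    return cos_lat, 1.0 / cos_lat


def indicatrix(h, k):
    """Summarize how a small circle is distorted at one location."""
    return {
        "area_scale": h * k,                  # 1.0 means area is preserved
        "axis_ratio": max(h, k) / min(h, k),  # 1.0 means the circle stays a circle
    }


for name, factors in [
    ("Mercator", mercator_scale_factors(60)),
    ("Lambert cylindrical", lambert_cylindrical_scale_factors(60)),
]:
    print(name, indicatrix(*factors))
```

At 60 degrees latitude, Mercator keeps the circle round but inflates its area, whereas the equal-area projection keeps the area and flattens the circle into an ellipse.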
6. Putting Data into the Map
Different ways to superimpose data on an existing map:
- Dot Maps
- Each data point (event or location) is plotted as a dot at its (latitude, longitude).
- If points overlap heavily, the dot map can be converted to a heatmap (showing density).
- Flow Maps
- Lines or arrows to represent movement (e.g., flights between cities).
- Bundling can reduce clutter by merging similar paths into "highways."
- Choropleth Maps
- Data is aggregated by predefined areas (e.g., states, counties).
- Common for showing election results per state (because the data is inherently "per state").
- Isopleth (Isoline) Maps
- Areas (and boundary lines) are determined by the data itself (e.g., temperature or rainfall contours).
- Political borders are only background references.
- Chorochromatic Maps (for categorical data)
- Similar concept to isopleth but for categories (e.g., soil types, land cover types).
- Uses distinct color hues or semantic color mappings (forest = green, desert = yellow, etc.).
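The step from an overplotted dot map to a density heatmap can be sketched with simple grid binning. This is a simplified stand-in: real heatmaps often use kernel density estimation instead, and the cell size and sample coordinates below are hypothetical.

```python
import math
from collections import Counter


def bin_points(points, cell_size):
    """Count how many (lon, lat) points fall into each square grid cell.

    Cells are identified by integer (column, row) indices; the counts can
    then be mapped to a sequential color scale to draw the heatmap.
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical event locations in degrees, binned into 1-degree cells.
events = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (-0.1, 0.0)]
print(bin_points(events, 1.0))
```

The same counting logic, with polygons such as states in place of grid cells, yields the per-region totals that a choropleth map displays.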
Multivariate Data on Maps
- Glyphs: e.g., star glyphs or radial glyphs placed at each region.
- Small Multiples: e.g., multiple maps side by side, each focusing on one variable or dimension.
- Embedding the Map in a Chart: e.g., a scatter plot that uses each state's shape in place of a standard point/dot.
7. Adjusting the Map to the Data
7.1 Cartograms
- Distort the actual map areas so that each region's area becomes proportional to a data variable (e.g., population).
- Contiguous Cartograms: Attempt to retain adjacency and shape, though often heavily distorted.
- Dorling Cartograms: Replace each region with a circle, sized by the data.
- Demers Cartograms: Replace each region with a square, sized by the data.
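For Dorling and Demers cartograms, the key sizing rule is that the area of the symbol, not its radius or side length, must be proportional to the data value. A minimal sketch of that rule follows; the function names and the `area_per_unit` parameter are illustrative assumptions, and the layout step that keeps symbols near their true location without overlap is not shown.

```python
import math


def dorling_radius(value, area_per_unit=1.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    return math.sqrt(value * area_per_unit / math.pi)


def demers_side(value, area_per_unit=1.0):
    """Side length of a square whose area equals value * area_per_unit."""
    return math.sqrt(value * area_per_unit)


# Hypothetical populations (in millions) for two regions.
for name, population in [("Region A", 25), ("Region B", 100)]:
    print(name, dorling_radius(population), demers_side(population))
```

A region with four times the population gets a circle with only twice the radius; scaling the radius linearly instead would exaggerate large values.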
Example
- The 2012 US Presidential Election map is misleading when drawn with standard geography (lots of red area, but fewer people living in it).
- A population-proportional cartogram is more "truthful" about how many people are in each region.
7.2 Grid or Unit Cartograms (Mosaic Maps)
- Each region is given one or more cells in a grid, typically representing a fixed quantity (e.g., each cell = one electoral vote).
- Also used in climate reports so small countries are as visually prominent as large ones.
7.3 Metro Maps
- Schematic versions of real subway lines.
- True geographic distance is sacrificed for clarity:
- Straight or smoothly curved lines
- Uniform or near-uniform spacing between stations
- Task: Understand route transfers and count number of stops more easily, not literal geographic distances.
- Popular layouts:
- Hexalinear (lines restricted to 6 directions)
- Octolinear (8 directions)
- Curvilinear or "circle" design
7.4 Map Simplifications (Map Schematic Layouts)
- Simplify outlines of regions (e.g., coastline details) to reduce visual complexity.
- Polygon point-removal and smooth arcs yield puzzle-like schematic boundaries.
- Good for broad overviews or when precise boundary shape is unimportant.
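One widely used point-removal technique is the Ramer-Douglas-Peucker algorithm, sketched below. Whether the lecture's examples used this particular algorithm is an assumption; it is shown only to illustrate how a tolerance controls the amount of detail that is dropped. This variant measures the distance to the line segment rather than to the infinite line.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop points that deviate less than tolerance from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical coastline fragment: a small wiggle and one large headland.
coast = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(simplify(coast, 0.1))
print(simplify(coast, 5.0))
```

A small tolerance removes only the wiggle, while a large tolerance collapses the whole fragment to a straight line, which matches the trade-off between overview and boundary precision described above.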
8. Combining Space and Time
8.1 Space-Time Cube
- 3D representation:
- Horizontal plane = space (the map)
- Vertical axis = time
- A moving path becomes a âtrajectoryâ that rises over time.
- Often accompanied by a 2D projection of that trajectory onto the map for direct spatial reference.
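The data structure behind a space-time cube is simple enough to sketch: time-ordered samples become a 3D polyline, and dropping the time coordinate gives the 2D footprint on the map. The function name and the (timestamp, x, y) sample layout are assumptions made for this illustration.

```python
def space_time_path(samples):
    """Turn (timestamp, x, y) samples into a 3D path and its 2D footprint.

    The 3D path uses time as the vertical axis; the footprint is the same
    path projected onto the base map.
    """
    ordered = sorted(samples)  # sorts by timestamp first
    cube_path = [(x, y, t) for t, x, y in ordered]
    footprint = [(x, y) for t, x, y in ordered]
    return cube_path, footprint


# Hypothetical GPS samples, deliberately out of order.
samples = [(2, 1.0, 1.0), (0, 0.0, 0.0), (1, 0.0, 1.0)]
cube_path, footprint = space_time_path(samples)
print(cube_path)
print(footprint)
```

A stationary object shows up as a vertical segment in the cube (time passes, position does not change) but collapses to a single point in the footprint.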
8.2 Travel-Time (Isochrone) Maps
- Show how long it takes to travel from a specific point to all other locations.
- Isochrones: Contours of equal travel time.
- Example: from central London, color-coded rings reveal equal-travel-time zones.
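One way to obtain the data for an isochrone map is a shortest-path search over a travel network, followed by grouping the resulting travel times into bands. The sketch below uses Dijkstra's algorithm on a tiny hypothetical network; the station names, travel times, and the 10-minute band width are all made up for illustration.

```python
import heapq


def travel_times(graph, origin):
    """Shortest travel time from origin to every reachable node (Dijkstra).

    graph maps each node to a list of (neighbor, minutes) pairs.
    """
    times = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > times.get(node, float("inf")):
            continue  # outdated queue entry
        for neighbor, minutes in graph.get(node, []):
            candidate = elapsed + minutes
            if candidate < times.get(neighbor, float("inf")):
                times[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return times


def isochrone_band(minutes, band_width=10):
    """Index of the travel-time band (0 = closest ring)."""
    return int(minutes // band_width)


# Hypothetical network with travel times in minutes.
network = {
    "A": [("B", 5), ("C", 20)],
    "B": [("C", 8), ("D", 12)],
    "C": [("D", 3)],
    "D": [],
}
times = travel_times(network, "A")
print({node: isochrone_band(t) for node, t in times.items()})
```

Nodes that share a band index would receive the same color, and the boundaries between bands are the isochrones.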
9. Beyond Geo-Space
- Any 2D "field" can be treated like a map (e.g., basketball courts, indoor spaces, game boards):
- Dot maps, flow maps, heatmaps, etc., all apply if you have (x, y) coordinates.
- Examples:
- Chessboard analysis
- Counter-Strike maps for player positions
- Building evacuation simulations
10. Visualization Critiques
In the second half of the lecture, the professor presented two critique examples.
10.1 Critique Example 1 (A Unit Cartogram with Beer Consumption)
Chart Description
- A unit cartogram (grid-based) combining US states and EU countries as squares of equal size.
- Color-coded by average beer consumption per person per week.
- Some squares labeled with consumption in liters and population.
Major Observations & Critiques
- Color Scale Mismatch
- Data is ordinal or numeric (liters/week), but the color scheme used looks more like a categorical or diverging set.
- A sequential scale would be more intuitive (lowest consumption = light color, highest = dark color).
- Placement of Regions
- Sweden's square is oddly placed, and large gaps (e.g., Switzerland, Norway) are unmarked.
- This makes it difficult to identify some countries quickly and disrupts the overall topology.
- Comparing Different Drinking Ages / Populations
- US vs. EU countries can be questionable because:
- Different population age structures
- Different legal drinking ages
- The comparison might be misleading if the total population is used as the denominator.
- Trade-Off
- By giving each region an equal-size grid cell (for clarity in labeling), the actual geographic or population size is not represented.
- This is acceptable if the design intention is to give every region an equal visual "voice," but the map is no longer geographically "accurate."
- Legend & Labels
- The breakpoints (e.g., <1 liter, 1–1.5 liters, 1.5–2 liters, >2 liters) are somewhat arbitrarily chosen and might skip nuance.
- Not colorblind safe (reds, greens, etc. are used).
Positives
- Easy to fit text labels in each region (no tiny, unreadable states).
- Straightforward "by region" comparison once you decipher the color legend.
10.2 Critique Example 2 (Spaghetti Line Chart of 2012 Republican Nomination Polls)
Chart Description
- Multiple polling lines (Romney, Gingrich, Santorum, Paul, Perry, Bachmann, Huntsman, Cain, etc.) from June 2011 to January 2012.
- Each poll is plotted as a point in time, connected by lines for each candidate.
- Some lines end abruptly (candidates dropped out of the race).
Major Observations & Critiques
- Distorted Time Axis
- The x-axis spacing corresponds to poll frequency, not calendar dates.
- Months with many polls look stretched; months with few polls look compressed.
- Better approach:
- Plot points on actual calendar intervals and interpolate if necessary.
- Color Issues
- Too many lines, too many colors. The color scheme is not colorblind safe.
- One line is not labeled at all (yellow line for a dropped-out candidate).
- Possible fix: use a colorblind-safe palette (e.g., ColorBrewer's "Set2" or "Dark2").
- Spaghetti Plot Complexity
- Hard to follow many lines at once.
- Alternative: small multiples or highlight only the still-running candidates in bright colors, others in gray.
- Labeling
- Inconsistent naming or repeated naming. Some lines end with no direct explanation (besides candidate dropout).
- Could add text boxes noting "Candidate X dropped out on date Y."
- Possible Alternative
- A stacked area chart showing share of total 100%.
- This reveals how each candidate's fraction changed over time (highlighting shifts more obviously).
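The fix suggested for the distorted time axis can be sketched as follows: instead of placing polls at equally spaced positions (one step per poll), each poll is placed according to the number of calendar days since the first poll. The poll dates below are hypothetical and only serve to show the difference between the two x-axis mappings.

```python
from datetime import date


def index_positions(poll_dates):
    """x positions as in the critiqued chart: one equal step per poll."""
    return list(range(len(poll_dates)))


def calendar_positions(poll_dates):
    """x positions proportional to elapsed calendar days since the first poll."""
    start = min(poll_dates)
    return [(d - start).days for d in poll_dates]


# Hypothetical poll dates: two polls close together, then a long gap.
polls = [date(2011, 6, 1), date(2011, 6, 3), date(2011, 9, 1)]
print(index_positions(polls))
print(calendar_positions(polls))
```

With calendar positions, the long gap between the second and third poll becomes visible on the axis instead of being compressed into a single step.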
Positives
- Shows immediate trends over time (who's rising, who's falling).
- Lets you see mirrored movements: one candidate's decline can visually match another's rise.
11. Closing
- The professor thanked everyone for adjusting to shorter breaks.
- Next lecture: Graph Visualization (an important and more specialized topic).
- The professor recommended reviewing map-based references, specialized cartography tools, and color guidelines (e.g., from ColorBrewer).
Additional References
- ColorBrewer for colorblind-safe palettes: https://colorbrewer2.org
- Grid/Unit Cartograms explanation: Cartograms – Introduction and Examples (the professor showed a sample link in slides).
- Michael Gleicher: Paper on juxtaposition vs. superimposition for data comparison.
- "Avoiding Spaghetti Plots" articles and blog posts for better multi-line charts.
End of Lecture
- Date: Same day as the lecture (no office hour due to scheduling conflict).
- Note: Next session includes an industry guest speaker and a chance to do further visualization critiques (with chocolate prizes).
This post is licensed under CC BY 4.0 by the author.