SOS9028 – Data visualization
Course content
This course is for anyone who wants to learn how to produce, refine, and present effective visualizations generated from datasets, summary tables, or the output of statistical models.
The effective use of graphs and charts is an important way to explore data for yourself and to communicate your ideas and results to others.
Being able to produce effective plots from data is also the best way to develop an eye for reading and understanding visualizations made by others, whether presented in academia, business, policy, or the media.
This seminar provides an intensive, hands-on introduction to the principles and practice of data visualization. We will begin with an overview of some basic principles. We will focus not just on the aesthetic aspects of good plots, but also on how their effectiveness is rooted in the way we perceive properties like length, absolute and relative size, orientation, shape, and color. Students will learn how to produce and refine plots using ggplot2, a powerful, versatile, and widely used visualization library for R. It implements a "grammar of graphics" that gives us a coherent way to produce visualizations by expressing relationships between the attributes of data and their graphical representation.
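As a rough illustration of what "expressing relationships between the attributes of data and their graphical representation" can look like in practice, here is a minimal sketch. It assumes the ggplot2 package is installed and uses the `mpg` dataset that ships with it purely as a stand-in; the choice of variables is ours and is not taken from the course materials.

```r
library(ggplot2)

# mpg is a small example dataset bundled with ggplot2 (an assumption made
# for illustration; the course may use other data).
# aes() declares the mapping: engine displacement goes to the x position,
# highway mileage to the y position, and vehicle class to color.
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()  # the geom decides how the mapped data are drawn

print(p)
```

Changing the call to `aes()` changes which data attributes drive which visual properties, while changing the geom changes the kind of mark that is drawn.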
Through a series of worked examples and exercises, students will learn how to build plots piece by piece, beginning with summaries of single variables and moving on to more complex graphics. Topics covered include plotting continuous and categorical variables; layering information on graphics; faceting grouped data to produce effective "small multiple" plots; transforming data to easily produce visual summaries on the graph such as trend lines, linear fits, error ranges, and boxplots; and creating maps, together with simpler alternatives to maps for country- or state-level data. We will also cover cases where we are not working directly with a dataset but rather with estimates from a statistical model. Using these tools, we will then explore the practical process of refining plots to accomplish common tasks such as highlighting key features of the data, labeling particular points, annotating plots, and changing their overall appearance. Finally, we will examine some strategies for presenting graphical results in different formats (such as in print, online, or in slides) and to different sorts of audiences.
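To give a flavor of building a plot piece by piece, the sketch below layers a linear fit with an error range over raw points, facets the result into small multiples, and then adjusts labels and theme. It again assumes ggplot2 and its bundled `mpg` data; the variables, labels, and theme are illustrative choices, not course material.

```r
library(ggplot2)

# Start from the data and the position mappings.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Layer 1: the raw observations, slightly transparent.
  geom_point(alpha = 0.5) +
  # Layer 2: a linear fit with its standard-error band, computed by ggplot2.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  # One panel per drive type ("small multiples").
  facet_wrap(~ drv) +
  # Refinement: readable axis labels and a plainer overall appearance.
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon") +
  theme_minimal()

print(p)
```

Each `+` adds one piece, so the plot can be inspected after every step and refined incrementally.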
The course is taught by Kieran Healy, Associate Professor in Sociology and the Kenan Institute for Ethics at Duke University. His research interests are in economic sociology, the sociology of culture, the sociology of organizations, and social theory. He is the author of Last Best Gifts: Altruism and the Market for Human Blood and Organs. His current focus is on the moral order of market society, the effect of quantification on the emergence and stabilization of social categories, and the link between these two topics.
Learning outcome
At the end of the course, participants will
- Understand the basic principles behind effective data visualization
- Have a practical sense for why some graphs and figures work well while others may fail to inform or actively mislead
- Know how to create a wide range of plots in R using ggplot2
- Know how to refine plots for effective presentation
Admission
PhD students at the Department of Sociology and Human Geography register for the course in Studentweb.
Participants from outside the Department of Sociology and Human Geography must fill out this application form.
The application deadline is 16th July 2017!
Prerequisites
Formal prerequisite knowledge
Students should have some basic familiarity with elementary statistical concepts. Some knowledge of R and RStudio will be helpful, but is not required.
Teaching
Room: PC-lab 035, Harriet Holters Building (in the basement)
Program
Wednesday August 16th
9.00-11.30: Session 1. Course Overview and Supervised Lab Time.
Getting oriented to R, RStudio, and RMarkdown.
Make your first graph.
11.30-12.30: Lunch
12.30-14.30: Session 2. Lecture and Discussion.
Reading: Tufte; Cleveland; Ware; Few.
Looking at Data: Good Graphs and Bad
Perception and Data Visualization
Visual Tasks and Decoding Graphs
Problems of Honesty and Good Judgement
14.30-15.00: Coffee break
15.00-17.00: Session 3. Lecture and Supervised Lab Time
Reading: Healy; Grolemund & Wickham
Core ggplot concepts
Tidy data
Data Mappings and Aesthetics
Geoms and plot types
Thursday August 17th
9.00-11.30: Session 1. Lecture and Supervised Lab Time
Reading: Healy; Grolemund & Wickham
Grouping, Faceting, and Transforming Data
Small Multiples
Data Transformations via Geoms and Pipelines
11.30-12.30: Lunch
12.30-14.30: Session 2. Lecture and Supervised Lab Time
Working with geoms
Writing and drawing on plots
Scales, guides, and themes
14.30-15.00: Coffee break
15.00-17.00: Session 3. Lecture and Supervised Lab Time
Working with Models
Getting model-based graphics right
Model objects
Generating predictions
Using Broom
Marginal effects
Other tools
Friday August 18th
9.00-11.30: Session 1. Lecture and Supervised Lab Time
Choropleth Maps
Statebins
Small Multiple Maps
Is Your Data Really Spatial?
11.30-12.30: Lunch
12.30-14.30: Session 2. Lecture and Supervised Lab Time
Refining Plots
Color and Color Layering
Working with Themes
14.30-15.00: Coffee break
15.00-17.00: Session 3. Supervised Lab Time
Case Studies: Redrawing bad graphs
Reading list
- Readings will be supplied in PDF form by the instructor
- Kieran Healy. 2017. Data Visualization for Social Science. (Draft.)
- William S. Cleveland. 1993. Visualizing Data. Hobart Press.
- Stephen Few. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.
- Garrett Grolemund and Hadley Wickham. 2016. R for Data Science. O'Reilly.
- Jeffrey Heer and Michael Bostock. 2010. "Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, New York.
- Edward Tufte. 1983. The Visual Display of Quantitative Information. Graphics Press.
- Colin Ware. 2008. Visual Thinking for Design. Morgan Kaufmann.
- Leland Wilkinson. 2005. The Grammar of Graphics. Springer.
Examination
Students will be assessed on (1) attendance and active participation, (2) completion of exercises during course time, and (3) a final paper that, using the tools of reproducible research covered in the course, produces effective visualizations from a data set of interest to the student and agreed upon with the instructor in advance of submission.
The paper is to be submitted by 1st October 2017 to katalin.godberg@sosgeo.uio.no.
Grading scale
Grades are awarded on a pass/fail scale. Read more about the grading system.