
Introduction

Learning outcomes

  • Understand the goal of today
  • Read the learning outcomes of today
For teachers

Teaching goals are:

  • Learners have heard the goal of the course
  • Learners have filled in the initial confidence form

Prior question:

  • .

Goal of today

Be able to create your plot from your data.

A typical project visualization

You want to visualize the past and predicted population size by country, using the data from the Wikipedia article 'World population'.

1. Preparing the data

You (wisely) decide to start with only a subset of the data from the Wikipedia article 'World population':

Country 2000 2015 2030
China 1270 1376 1416
India 1053 1311 1528

From the context, you understand that:

  • the column names with numbers (e.g. 2000) are years
  • the values in the cells are the estimated population size, in millions

This data is best saved as a comma-separated values (.csv) file. If your data is in a spreadsheet (e.g. Calc or Excel), you can typically export it as a .csv file.
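If the data set is small, you can also create the .csv file directly from R. Below is a minimal sketch, assuming the file name introduction_2.csv and that writing to the current working directory is fine. The variable name csv_lines is only chosen for this example.

```r
# Write the example table as comma-separated text.
# The first line holds the column names.
csv_lines <- c(
  "Country,2000,2015,2030",
  "China,1270,1376,1416",
  "India,1053,1311,1528"
)
writeLines(csv_lines, "introduction_2.csv")

# Show the file content to check it
readLines("introduction_2.csv")
```

Opening the file in a plain text editor is another way to check that the values are separated by commas.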

2. Reading the data

When the data is saved as a .csv file called introduction_2.csv, you can read this data in R like this:

library(readr)
t <- read_csv("introduction_2.csv")

Now your data is in a table called t.

Reading the data is described in Chapter 7: Data Import.
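For comparison, here is a sketch of reading the same file with base R, so without the readr package. The file is recreated first, so that the example runs on its own. The table name t_base is only chosen for this example. The argument check.names = FALSE keeps the column names as 2000, 2015 and 2030; without it, read.csv changes these into X2000, X2015 and X2030.

```r
# Recreate the example file, so that this sketch runs on its own
writeLines(
  c(
    "Country,2000,2015,2030",
    "China,1270,1376,1416",
    "India,1053,1311,1528"
  ),
  "introduction_2.csv"
)

# Read the file with base R; keep the column names as they are in the file
t_base <- read.csv("introduction_2.csv", check.names = FALSE)

names(t_base) # The column names
nrow(t_base)  # The number of rows, one per country
```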

3. Tidying the data

The data must be transformed to be tidy. Tidy data has these features:

  • Each variable is a column; each column is a variable.
  • Each observation is a row; each row is an observation.
  • Each value is a cell; each cell is a single value.

Our data is not tidy yet:

Country 2000 2015 2030
China 1270 1376 1416
India 1053 1311 1528

This data is not tidy, as each row holds three observations:

  • The population in the year 2000
  • The population in the year 2015
  • The population in the year 2030

In R, we can make this tidy in many ways, for example:

library(tidyr)
t <- t |> pivot_longer(
  cols = c("2000", "2015", "2030")
)

Now the data is tidy like this:

Country name value
China 2000 1270
China 2015 1376
China 2030 1416
India 2000 1053
India 2015 1311
India 2030 1528
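The default column names name and value are not very descriptive. As a possible variation, pivot_longer can name the new columns directly and convert the years to numbers in the same step. The sketch below assumes the tidyr package is installed. The table names t_wide and t_long and the column names year and population are only chosen for this example.

```r
library(tidyr)

# The example table in wide format, created by hand
# so that this sketch runs on its own
t_wide <- data.frame(
  Country = c("China", "India"),
  `2000` = c(1270, 1053),
  `2015` = c(1376, 1311),
  `2030` = c(1416, 1528),
  check.names = FALSE
)

# Pivot all columns except Country,
# name the new columns and store the years as whole numbers
t_long <- t_wide |> pivot_longer(
  cols = !Country,
  names_to = "year",
  values_to = "population",
  names_transform = list(year = as.integer)
)
t_long
```

Selecting the columns with !Country means the code keeps working when more year columns are added to the data.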

Transforming the data is described in:

4. Cleaning the data

We need to clean the data, as plotting it as it is will not work. The years in the column name are stored as text, so geom_line cannot connect the points:

library(ggplot2)
ggplot(t, aes(x = name, y = value, color = Country)) + geom_line() # Will fail
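To find out why a plot does not work, it can help to look at the data type of each column. Below is a minimal sketch in base R, in which the tidy table from step 3 is created by hand under the hypothetical name t_tidy.

```r
# The tidy table from step 3, created by hand
# so that this sketch runs on its own
t_tidy <- data.frame(
  Country = rep(c("China", "India"), each = 3),
  name = rep(c("2000", "2015", "2030"), times = 2),
  value = c(1270, 1376, 1416, 1053, 1311, 1528)
)

# Show the data type of each column
sapply(t_tidy, class)

# Check if the years are text instead of numbers
is.character(t_tidy$name)
```

Because the years are text, ggplot2 treats them as categories and puts every point in its own group, which leaves no lines to draw.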

We do some data transformations:

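# Rename the columns: country, time (in years) and population size (in millions)
names(t) <- c("country", "t", "n")
# Store the countries as categories and the years as numbers
t$country <- as.factor(t$country)
t$t <- as.numeric(t$t)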

Now the data looks like:

country t n
China 2000 1270
China 2015 1376
China 2030 1416
India 2000 1053
India 2015 1311
India 2030 1528

Plotting this now works:

ggplot(t, aes(x = t, y = n, color = country)) + geom_line()

5. Saving the plot

After creating the plot, you can save it like this:

ggsave("my_plot.svg")
ggsave("my_plot.png")
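By default, ggsave saves the last plot that was shown. As a sketch of a more explicit approach, the plot can be stored in a variable and saved with a fixed size. The names t_clean and p, as well as the size of 16 by 10 cm, are only chosen for this example. Note that saving as .svg typically requires the svglite package to be installed, so this sketch saves a .png file only.

```r
library(ggplot2)

# The cleaned table from step 4, created by hand
# so that this sketch runs on its own
t_clean <- data.frame(
  country = factor(rep(c("China", "India"), each = 3)),
  t = rep(c(2000, 2015, 2030), times = 2),
  n = c(1270, 1376, 1416, 1053, 1311, 1528)
)

# Store the plot in a variable, so it can be passed to ggsave
p <- ggplot(t_clean, aes(x = t, y = n, color = country)) + geom_line()

# Save the plot with an explicit size and resolution
ggsave("my_plot.png", plot = p, width = 16, height = 10, units = "cm", dpi = 300)
```

Setting the size explicitly makes the saved figure the same every time, regardless of the size of the plot window.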

6. Refine

The plot looks like this now:

[Figure: My plot]

You can refine it in many ways:

  • Refine the SVG in another tool
  • Refine the generation of the plot

For example, in R:

ggplot(t, aes(x = t, y = n, color = country)) +
  geom_line() +
  geom_point() +
  scale_x_continuous("Time (year)") +
  scale_y_continuous("Population size (million)", limits = c(0, NA)) +
  labs(title = "Population size in time", color = "Country")

Now the plot has proper labels:

[Figure: My refined plot]

Refining the looks of your plot is described in:

As there is so much to tweak, you definitely need to search the web and/or use an AI.

Exercises

Exercise 1

Create the plot you need for this course. Follow the steps in the 'A typical project visualization' section. Read the chapters if needed and/or search the web and/or use an AI to get what you need.