# Introduction
Learning outcomes
- Understand the goal of today
- Read the learning outcomes of today
For teachers
Teaching goals are:
- Learners have heard the goal of the course
- Learners have filled in the initial confidence form
Prior question:
- .
## Goal of today
Be able to create your plot from your data.
## A typical project visualization
You want to visualize the past and predicted population size of each country, using data from the Wikipedia article 'World population'.
### 1. Preparing the data
You (wisely) decide to start with only a subset of the data from the Wikipedia article 'World population':
| Country | 2000 | 2015 | 2030 |
|---|---|---|---|
| China | 1270 | 1376 | 1416 |
| India | 1053 | 1311 | 1528 |
From the context, you understand that:
- the columns with numbers (e.g. 2000) are the years
- the values in the cells are the estimated population sizes, in millions
This data is best saved as a comma-separated values (.csv) file. If your data is in a spreadsheet (e.g. Calc or Excel), you can typically export it as a .csv file.
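If you would rather create the .csv file from R itself, a minimal sketch using base R's `write.csv()` could look like this. The values are those of the table above; the file name `introduction_2.csv` is the one used in the next step.

```r
# Write the subset of the 'World population' table to a .csv file.
# check.names = FALSE keeps the year headers as "2000" etc. instead of "X2000".
population <- data.frame(
  Country = c("China", "India"),
  `2000` = c(1270, 1053),
  `2015` = c(1376, 1311),
  `2030` = c(1416, 1528),
  check.names = FALSE
)

# row.names = FALSE: do not write R's row numbers as an extra column
write.csv(population, "introduction_2.csv", row.names = FALSE)
```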
### 2. Reading the data
When the data is saved as a .csv file called `introduction_2.csv`, you can read this data in R like this:
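A minimal sketch, assuming the readr package (part of the tidyverse) is installed. The `if` block only creates an example file so that the code runs on its own; with your own data you would skip it.

```r
library(readr)

# Only needed to make this example run on its own:
# create a small introduction_2.csv if it does not exist yet.
if (!file.exists("introduction_2.csv")) {
  writeLines(c(
    "Country,2000,2015,2030",
    "China,1270,1376,1416",
    "India,1053,1311,1528"
  ), "introduction_2.csv")
}

# Read the .csv file into a table (a tibble) called t
t <- read_csv("introduction_2.csv", show_col_types = FALSE)
t
```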
Now your data is in a table called `t`.
Reading the data is described in Chapter 7: Data Import.
### 3. Tidying the data
The data must be transformed to be tidy, which means it has these three features:
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.
Our data is not tidy yet:
| Country | 2000 | 2015 | 2030 |
|---|---|---|---|
| China | 1270 | 1376 | 1416 |
| India | 1053 | 1311 | 1528 |
This is because each row holds three observations:
- The population of that country in the year 2000
- The population of that country in the year 2015
- The population of that country in the year 2030
In R, we can make this tidy in many ways, for example:
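One such way, sketched here assuming the tidyr and tibble packages are installed, is `pivot_longer()`, which turns the year columns into rows. Its default output columns are called `name` and `value`, which match the table below. The block rebuilds `t` by hand so that it runs on its own.

```r
library(tibble)
library(tidyr)

# The untidy table: one row per country, one column per year
t <- tibble(
  Country = c("China", "India"),
  `2000` = c(1270, 1053),
  `2015` = c(1376, 1311),
  `2030` = c(1416, 1528)
)

# Turn every column except Country into rows:
# the old column names go into 'name', the cell values into 'value'
t <- pivot_longer(t, cols = -Country)
t
```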
Now the data is tidy like this:
| Country | name | value |
|---|---|---|
| China | 2000 | 1270 |
| China | 2015 | 1376 |
| China | 2030 | 1416 |
| India | 2000 | 1053 |
| India | 2015 | 1311 |
| India | 2030 | 1528 |
Transforming the data is described in:
### 4. Cleaning the data
We need to clean the data, as plotting the data as such will fail:
We do some data transformations:
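A minimal sketch of such transformations, assuming the dplyr and tibble packages are installed; it is one possible way, not the only one. It renames the columns to `country`, `t` (time, in years) and `n` (population size), and converts the years from text to numbers: they came from column headers, so they are stored as text, while a continuous x axis (such as the one set with `scale_x_continuous()` in the refined plot further down) needs numbers.

```r
library(tibble)
library(dplyr)

# The tidy table from the previous step, rebuilt so this example runs on its own
t <- tibble(
  Country = rep(c("China", "India"), each = 3),
  name = rep(c("2000", "2015", "2030"), times = 2),
  value = c(1270, 1376, 1416, 1053, 1311, 1528)
)

t <- t |>
  # Short, lowercase column names: country, t (time) and n (population size)
  rename(country = Country, t = name, n = value) |>
  # The years are text (they came from column headers); convert them to numbers
  mutate(t = as.numeric(t))
t
```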
Now the data looks like:
| country | t | n |
|---|---|---|
| China | 2000 | 1270 |
| China | 2015 | 1376 |
| China | 2030 | 1416 |
| India | 2000 | 1053 |
| India | 2015 | 1311 |
| India | 2030 | 1528 |
Plotting this now works:
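A minimal plotting sketch, assuming the ggplot2 package is installed. It maps the year to the x axis, the population size to the y axis and the country to the color, the same mapping used in the refined plot further down. The data is rebuilt by hand so that the block runs on its own.

```r
library(ggplot2)

# The cleaned data, rebuilt so this example runs on its own
t <- data.frame(
  country = rep(c("China", "India"), each = 3),
  t = rep(c(2000, 2015, 2030), times = 2),
  n = c(1270, 1376, 1416, 1053, 1311, 1528)
)

# Year on the x axis, population size on the y axis, one colored line per country
p <- ggplot(t, aes(x = t, y = n, color = country)) +
  geom_line() +
  geom_point()
p
```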
### 5. Saving the plot
After creating the plot, you can save it like this:
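A minimal sketch using ggplot2's `ggsave()`, assuming the svglite package is also installed (`ggsave()` needs it to write .svg files). The file name `introduction_2.svg` and the size of 7 by 4 inches are illustrative choices. SVG is a vector format, which makes it convenient to refine the plot in another tool later.

```r
library(ggplot2)

# The plot from the previous step, rebuilt so this example runs on its own
t <- data.frame(
  country = rep(c("China", "India"), each = 3),
  t = rep(c(2000, 2015, 2030), times = 2),
  n = c(1270, 1376, 1416, 1053, 1311, 1528)
)
p <- ggplot(t, aes(x = t, y = n, color = country)) +
  geom_line() +
  geom_point()

# Save as SVG; the file name and size (in inches) are hypothetical choices
ggsave("introduction_2.svg", plot = p, width = 7, height = 4)
```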
### 6. Refine
The plot looks like this now:

You can refine it in many ways:
- Refine the SVG in another tool
- Refine the generation of the plot
For example, in R:
```r
library(ggplot2)

# t holds the cleaned data: columns country, t (year) and n (population, millions)
ggplot(t, aes(x = t, y = n, color = country)) +
  geom_line() +
  geom_point() +
  scale_x_continuous("Time (year)") +
  scale_y_continuous("Population size (million)", limits = c(0, NA)) +
  labs(title = "Population size in time", color = "Country")
```
Now the plot has proper labels:

Refining the looks of your plot is described in:
As there is so much to tweak, you definitely need to search the web and/or use an AI.
## Exercises
### Exercise 1
Create the plot you need for this course. Follow the steps in the 'A typical project visualization' section. Read the chapters if needed, and/or search the web and/or use an AI to get what you need.