Introduction¶

Learning outcomes

Understand the goal of today
Read the learning outcomes of today

For teachers

Teaching goals are:

Learners have heard the goal of the course
Learners have filled in the initial confidence form

Prior question:

.

Goal of today¶

Be able to create your plot from your data.

A typical project visualization¶

You want to visualize past and predicted population size by country, using the data from the Wikipedia article 'World population'.

1. Preparing the data¶

You (wisely) decide to start with only a subset of the data from the Wikipedia article 'World population':

Country	2000	2015	2030
China	1270	1376	1416
India	1053	1311	1528

From the context, you understand that:

the columns with numbers (e.g. 2000) are the years
the values in the cells are the estimated population size, in millions

This data is best saved as a comma-separated (.csv) file. If your data is in a spreadsheet (e.g. Calc or Excel), you can typically export it as a .csv file.
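If you do not have a spreadsheet at hand, the same file can also be created from R. This is a minimal sketch using only base R; the variable name csv_lines is an illustrative choice, and the file name introduction_2.csv is the one this page reads later.

```r
# The example subset, one line of text per row, values separated by commas
csv_lines <- c(
  "Country,2000,2015,2030",
  "China,1270,1376,1416",
  "India,1053,1311,1528"
)

# Write the lines to a .csv file in the current working directory
writeLines(csv_lines, "introduction_2.csv")
```

Opening the resulting file in a text editor is a quick way to check that it looks as expected.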

2. Reading the data¶

When the data is saved as a .csv file called introduction_2.csv, you can read this data in R like this:

library(readr)
t <- read_csv("introduction_2.csv")

Now your data is in a table called t.

Reading the data is described in Chapter 7: Data Import.
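If readr is not installed, base R can read the same file. The sketch below is an alternative, not the method used on this page; t_base is an illustrative name. It assumes you want to keep the year columns named 2000, 2015 and 2030: without check.names = FALSE, read.csv would rename them to X2000, X2015 and X2030.

```r
# Recreate the example file so this sketch is self-contained
writeLines(
  c("Country,2000,2015,2030", "China,1270,1376,1416", "India,1053,1311,1528"),
  "introduction_2.csv"
)

# Read the file with base R, keeping the column names exactly as in the file
t_base <- read.csv("introduction_2.csv", check.names = FALSE)

# Inspect the column names and the number of rows
print(names(t_base))
print(nrow(t_base))
```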

3. Tidying the data¶

The data must be transformed to be tidy, which holds these features:

Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.

Our data is not tidy yet:

Country	2000	2015	2030
China	1270	1376	1416
India	1053	1311	1528

It is not tidy because each row holds three observations:

The population of each country in the year 2000
The population of each country in the year 2015
The population of each country in the year 2030

In R, we can make this tidy in many ways, for example:

library(tidyr)
t <- t |> pivot_longer(
  cols = c("2000", "2015", "2030")
)

Now the data is tidy like this:

Country	name	value
China	2000	1270
China	2015	1376
China	2030	1416
India	2000	1053
India	2015	1311
India	2030	1528
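As a variation, pivot_longer can give the new columns descriptive names right away, instead of the defaults name and value. This is a sketch under assumptions: the names wide, long, year and population are illustrative and are not used elsewhere on this page.

```r
library(tidyr)

# The untidy example table, typed in by hand so the sketch is self-contained
wide <- data.frame(
  Country = c("China", "India"),
  `2000` = c(1270, 1053),
  `2015` = c(1376, 1311),
  `2030` = c(1416, 1528),
  check.names = FALSE
)

# Pivot every column except Country, naming the two new columns explicitly
long <- wide |> pivot_longer(
  cols = !Country,
  names_to = "year",
  values_to = "population"
)

print(long)
```

Selecting the columns with !Country avoids listing every year by hand, which helps when the full data has many more years.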

Transforming the data is described in:

4. Cleaning the data¶

We need to clean the data, as plotting it as is will not give the line plot we want: the years in the name column are still text, not numbers.

library(ggplot2)
ggplot(t, aes(x = name, y = value, color = Country)) + geom_line() # Will fail

We rename the columns and convert them to suitable data types:

names(t) <- c("country", "t", "n")
t$country <- as.factor(t$country)
t$t <- as.numeric(t$t)

Now the data looks like:

country	t	n
China	2000	1270
China	2015	1376
China	2030	1416
India	2000	1053
India	2015	1311
India	2030	1528

Plotting this now works:

ggplot(t, aes(x = t, y = n, color = country)) + geom_line()
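To confirm that the cleaning did what you expect, you can inspect the class of each column before plotting. A minimal sketch in base R, with the tidy table typed in by hand; column_classes is an illustrative name.

```r
# The tidy table as it looks after pivoting: the years are still text
t <- data.frame(
  Country = rep(c("China", "India"), each = 3),
  name = rep(c("2000", "2015", "2030"), times = 2),
  value = c(1270, 1376, 1416, 1053, 1311, 1528)
)

# The same cleaning steps as above
names(t) <- c("country", "t", "n")
t$country <- as.factor(t$country)
t$t <- as.numeric(t$t)

# Look up the class of every column
column_classes <- sapply(t, class)
print(column_classes)
```

If the year column still shows up as character, the conversion with as.numeric has not been applied yet.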

5. Saving the plot¶

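After creating the plot, you can save it like this. By default, ggsave saves the last plot that was displayed; saving as .svg requires the svglite package.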

ggsave("my_plot.svg")
ggsave("my_plot.png")
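If you want more control over what is saved, you can store the plot in a variable and pass it to ggsave together with a size. A sketch under assumptions: the names population and p, and the chosen width, height and resolution, are illustrative values only.

```r
library(ggplot2)

# The cleaned example table, typed in by hand so the sketch is self-contained
population <- data.frame(
  country = factor(rep(c("China", "India"), each = 3)),
  t = rep(c(2000, 2015, 2030), times = 2),
  n = c(1270, 1376, 1416, 1053, 1311, 1528)
)

# Keep the plot in a variable instead of only displaying it
p <- ggplot(population, aes(x = t, y = n, color = country)) + geom_line()

# Save that specific plot; width and height are in inches by default
ggsave("my_plot.png", plot = p, width = 7, height = 5, dpi = 300)
```

Passing the plot explicitly means the saved file does not depend on which plot happened to be displayed last.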

6. Refine¶

The plot looks like this now:

(Figure: My plot)

You can refine it in many ways:

Refine the SVG in another tool
Refine the generation of the plot

For example, in R:

ggplot(t, aes(x = t, y = n, color = country)) +
  geom_line() +
  geom_point() +
  scale_x_continuous("Time (year)") +
  scale_y_continuous("Population size (million)", limits = c(0, NA)) +
  labs(title = "Population size in time", color = "Country")

Now the plot has proper labels:

(Figure: My refined plot)

Refining the looks of your plot is described in:

Chapter 11: Communication

As there is so much to tweak, you will likely need to search the web and/or use an AI.

Exercises¶

Exercise 1¶

Create the plot you need for this course. Follow the steps in the 'A typical project visualization' section. Read the chapters if needed and/or search the web and/or use an AI to get what you need.