Associative arrays¶
Learning objectives
- work with associative arrays
For teachers
Teaching goals are:
- Learners work with associative arrays
- Learners can get the unique elements of a column
- Learners can make a tally of a column
Lesson plan:
- 5 mins: prior knowledge
- 5 mins: presentation
- 15 mins: challenge
- 5 mins: feedback
Overview¶
In this session, we work with associative arrays. Associative arrays are used in, among others, to find the (unique) values in a column and to create a tally.
Exercises¶
See the exercise procedure here.
Exercise 1: confirming things are true¶
Learning objectives
- experience variables
Download the data¶
In a terminal, do:
wget https://raw.githubusercontent.com/richelbilderbeek/awk_course/master/data/diamonds_no_header.tsv
to download a file called diamonds_no_header.tsv
.
Write to and read from an associative array¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'At the start, in the counts
array, at element ideal
,
store the value 12345
. In the end, if the counts
array, show the
value at element ideal
.
Confirm that this is true.
Get a count¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, if the counts
array, show the
value at element ideal
.
Confirm that this is true.
Get a count again¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, if the counts
array, show the
value at element ideal
.
Confirm that this is true.
Get a count again again¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, if the counts
array, show the
value at element ideal
.
Confirm that this is true.
Get a count again again again¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, if the counts
array, show the
value at element ideal
.
Confirm that this is true.
Show all elements¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, show all elements of the array counts
.
Confirm that this is true.
Show all values¶
In a terminal, in the same folder as where the data is downloaded, do:
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, of all elements of the array counts
, show their values.
Confirm that this is true.
Show all elements and values¶
In a terminal, in the same folder as where the data is downloaded, do:
awk '{ counts[$2]++ } END { for (count in counts) print count ": " counts[count] }' diamonds_no_header.tsv
In English, this is: 'For every line, in the counts
array,
at the element in the second column, increase the value by one.
In the end, show all elements of the array counts
and their respective values.
Confirm that this is true.
Exercise 2: explore data¶
Download the data¶
In a terminal, do:
to download a file called mpg.csv.csv
.
This file is a tab-separated file about diamonds and
is part of the ggplot2
R package.
The dataset has the following columns:
Index | Column name | Description |
---|---|---|
1 | manufacturer |
manufacturer name |
2 | model |
model name |
3 | displ |
engine displacement, in litres |
4 | year |
year of manufacture |
5 | cyl |
number of cylinders |
6 | trans |
type of transmission |
7 | drv |
the type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd |
8 | cty |
city miles per gallon |
9 | hwy |
highway miles per gallon |
10 | fl |
fuel type |
11 | class |
type of car |
Explore the data¶
Using awk
only:
- show if the data has a header yes/no
- show the number of cars in the dataset
- show the number of columns in the dataset
- show all the car manufacturers' names
- per car manufacturer, show the number of cars it produces
- show the lowest city miles per gallon
- show the highest city miles per gallon, in city miles per liter. Assume a gallon, is a US gallon. A US gallon is 3.785411784 liter
- find the type (i.e. in the last column) of car that spends the least fuel on the highway
- imagine you have each of these cars once. You want each of them to drive one city mile. How much gallons of fuel do you need?
- imagine you have each of these cars once. You want each of them to drive one city mile. How much gallons of each type of fuel do you need?
Think:
- There are multiple way to increase an element in an array by one. Why so many? Is there a difference?