Welcome to the second iteration of the Humans Learning lessons. As a reminder, each lesson is designed as a 5- to 10-minute virtual session for EnCompass staff to expand their skills with data, using the R programming language. Each lesson has learning objectives, some example code and explanation to demonstrate a technique or skill, and an open code chunk at the end for you to have some fun. Each lesson is then captured in an HTML file for online access. This is all in the service of humans learning. Enjoy!
Learning objectives
For this session, the learning objectives are to:
Group data to calculate summary statistics
Make your first line plot
Install and load packages
In your R script, use the install.packages() and library() functions to install and load two packages: tidyverse and gapminder.
The tidyverse provides a suite of compatible data wrangling and visualization tools, and its workhorse for data visualization is the ggplot2 package. With ggplot2 the sky is the limit! From basic bar plots to animated graphics to interactive charts and tables connected by a common data source, ggplot2 and its extension packages can do it all. And once again, the gapminder package provides a dataset extracted from the global trend data maintained by Gapminder (https://www.gapminder.org/).
# Install the packages on your system (you only need to do this once per computer)
install.packages(c("tidyverse", "gapminder"), repos = "https://cloud.r-project.org/")
Installing packages into 'C:/Users/brian/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
package 'tidyverse' successfully unpacked and MD5 sums checked
package 'gapminder' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\brian\AppData\Local\Temp\RtmpuAyS0W\downloaded_packages
# Load the libraries so you can use them
library(tidyverse)
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
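install.packages() downloads and reinstalls a package every time it runs, even if the package is already on your computer. A common pattern is to install only what is missing and then load everything. Below is a minimal sketch of that pattern using base R's requireNamespace(), which returns FALSE when a package is not installed. The pkgs vector is just a name chosen for this example.

```r
# Packages this lesson needs (pkgs is a name chosen for this example)
pkgs <- c("tidyverse", "gapminder")

# Install each package only if it is not already installed;
# requireNamespace() returns FALSE when the package is missing
for (pkg in pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org/")
  }
}

# Load (attach) the packages for this session
library(tidyverse)
library(gapminder)
```

You still need to call library() in every new R session. Installing happens once, but loading happens each time.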
View the data
Just as in the first session, the first step is to view the data. To see the variable names and the structure of the data, pass the data to head(), which returns the first six rows, as in the code below. The tail() function works the same way but returns the last six rows.
# Assign gapminder to df
# This is not required, but it makes life easier
# (don't we all want life to be easier?)
df <- gapminder

# Look at the first six rows of the gapminder dataset
head(df)
# A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
# Look at the last six rows
tail(df)
# A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Zimbabwe Africa 1982 60.4 7636524 789.
2 Zimbabwe Africa 1987 62.4 9216418 706.
3 Zimbabwe Africa 1992 60.4 10704340 693.
4 Zimbabwe Africa 1997 46.8 11404948 792.
5 Zimbabwe Africa 2002 40.0 11926563 672.
6 Zimbabwe Africa 2007 43.5 12311143 470.
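head() and tail() show only a slice of the data. Two other quick checks help you see its overall shape: dim() reports the number of rows and columns, and dplyr's glimpse() lists every column with its type and first few values. The sketch below repeats the setup so it runs on its own. head() and tail() also take an n argument to show more or fewer rows.

```r
library(dplyr)
library(gapminder)

df <- gapminder

# Number of rows and columns: gapminder has 1704 rows and 6 columns
# (one row per country per year)
dim(df)

# A compact, sideways view: every column, its type, and its first values
glimpse(df)

# head() and tail() accept n to show more or fewer than six rows
head(df, n = 10)
```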
Make a line plot
As fun as looking at data is, we probably want to do more. Since our objective is to make a line plot, let's do that next. A ggplot needs three things: a data object (df), a geom_* function for the type of plot we want (here geom_line() for a line plot), and the aesthetics. The aesthetics, set inside aes(), are the variables we map to the x-axis, the y-axis, color, and so on.
# Make a plot
ggplot(data = df) +
  geom_line(aes(x = year, y = gdpPercap, color = continent))
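Yeeeeesh!! That did not work at all. Each continent contains many countries, so there are many gdpPercap values for every continent in every year, and geom_line() connects all of them into a jagged mess. We forgot to use group_by() and summarize() to create a summary statistic that gives us a single data point per continent to follow across the years.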
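To see the problem directly, you can count how many rows each continent has in each year. Below is a minimal sketch using dplyr's count() and the native pipe |> (R 4.1 or later). points_per_year is just a name chosen for this example.

```r
library(dplyr)
library(gapminder)

df <- gapminder

# Count how many rows (countries) each continent has in each year
points_per_year <- df |>
  count(continent, year)

# One row per continent-year; the n column is the number of countries
points_per_year
```

Every continent-year has more than one row. Even Oceania, the smallest group, has two countries. That means geom_line() has several y-values to connect at each x-value, which produces the zigzag.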
Let's use df to create a new object, df1. df1 groups the data by continent and year and creates a variable, avg_gdpPercap, that holds the mean GDP per capita. This gives a single data point for each continent in each year, so we can draw useful lines for the mean GDP per capita over time.
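Here is a minimal sketch of this step. It assumes dplyr's group_by() and summarize() and the column name avg_gdpPercap that the plot below uses. Grouping by two variables and then summarizing is what triggers the message shown next.

```r
library(dplyr)
library(gapminder)

df <- gapminder

# Group by continent and year, then collapse each group to one row
# holding the mean GDP per capita for that continent in that year
df1 <- df |>
  group_by(continent, year) |>
  summarize(avg_gdpPercap = mean(gdpPercap))

# 5 continents x 12 years = 60 rows
df1
```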
`summarise()` has grouped output by 'continent'. You can override using the
`.groups` argument.
Now we just have to remember how ggplot works and make our line plot. It takes three components:
The data object: df1
The geom: geom_line()
The aesthetics: year (x-axis), avg_gdpPercap (y-axis), and continent (color)
# Plot the mean GDP per capita for each continent over time
ggplot(df1) +
  geom_line(aes(x = year, y = avg_gdpPercap, color = continent))
Have fun!
Now it's your turn to practice! Below is a fully functioning code editor with starting code in place. Feel free to experiment with different grouping variables in the group_by() call or to change the summary statistic in summarize(). Then, have fun with the plot!
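As one possible starting point, this sketch swaps mean GDP per capita for median life expectancy. df2, median_lifeExp, p, and the axis labels are names chosen for this example. Setting .groups = "drop" in summarize() returns an ungrouped tibble and silences the grouping message you saw earlier.

```r
library(tidyverse)
library(gapminder)

df <- gapminder

# Same grouping, different summary statistic: median life expectancy
# .groups = "drop" returns an ungrouped tibble (and no grouping message)
df2 <- df |>
  group_by(continent, year) |>
  summarize(median_lifeExp = median(lifeExp), .groups = "drop")

# Same three components: data, geom, aesthetics, plus readable labels
p <- ggplot(df2) +
  geom_line(aes(x = year, y = median_lifeExp, color = continent)) +
  labs(x = "Year", y = "Median life expectancy (years)", color = "Continent")

p
```

To try a different grouping, replace continent with another variable, for example country in a filtered subset, and update the color aesthetic to match.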