Welcome back! As a reminder, each lesson is designed as a 5 - 10 minute virtual session conducted for EnCompass staff to expand their skills with data, and the means of learning is the R programming language.
Usefulness of pivoting data
Learning objectives
For this session, the learning objective is to:
Reshape data from wide to long format
As always, make sure the correct packages are active in the session.
# installing packages
# install.packages("tidyverse")
# install.packages("readxl")
# install.packages("writexl")
# install.packages("here")

# another option for installing packages
# install.packages(c("tidyverse", "readxl", "writexl", "here"))

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(writexl)
library(here)
here() starts at C:/Users/brian/OneDrive/Documents/website
What is the point of this?
Good question. Data collected in the real and virtual worlds is rarely structured for analysis. A common need is to pivot a dataset from a wide to a long format. A dataset often arrives in a wide format, with a question in one column followed by several columns of response options, where each option is its own column. Here's a simple example to illustrate the point.
We've got five columns of data for a respondent's favorite color, and for some reason person F selected 3 colors. Here are a few more observations.
The column with the question just holds the question variable and is blank.
The majority of rows in these columns are also blank.
A single variable - favorite color - is spread out over a bunch of columns.
To simplify this, let's put these all in a single column.
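For readers who want to follow along, here is a minimal sketch of a data frame that matches the description above and the output shown later in this lesson. The name of the blank question column (fav_color_q) and the order of the color columns are assumptions made for illustration; the original table may differ.

```r
library(tibble)

# Hypothetical reconstruction of the wide example data.
# fav_color_q (the blank question column) and the column order are assumptions.
df <- tribble(
  ~Resp, ~fav_color_q, ~blue, ~red, ~green, ~purple,
  "A",   NA,           1,     NA,   NA,     NA,
  "B",   NA,           NA,    1,    NA,     NA,
  "C",   NA,           NA,    NA,   1,      NA,
  "D",   NA,           NA,    NA,   NA,     1,
  "E",   NA,           NA,    NA,   1,      NA,
  "F",   NA,           1,     1,    NA,     1,
  "G",   NA,           1,     NA,   NA,     NA
)

df
```

Each respondent gets one row, a 1 marks a selected color, and every other cell is missing, which is why most of the table is blank.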
Reshaping
We'll create a new object called df1. This preserves our original df object in case we still need it. Then, let's eliminate the 2nd column, which is all blanks. Next, pivot the columns holding the 1s into a single column, so that each selection is recorded by the name of its color. The 1s themselves end up in a default column called value. Finally, we should put the pivoted data in a column with a name that is clear and useful for coding: fav_color. Here's the code to do this.
df1 <- df |>
  select(-2) |> # eliminates the 2nd column
  pivot_longer(
    cols = 2:5, # use these columns for the transformation
    names_to = "fav_color", # name the column where the color names are stored
    values_drop_na = TRUE # remove the rows with missing data
  )

knitr::kable(df1)
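| Resp | fav_color | value |
|:-----|:----------|------:|
| A    | blue      |     1 |
| B    | red       |     1 |
| C    | green     |     1 |
| D    | purple    |     1 |
| E    | green     |     1 |
| F    | blue      |     1 |
| F    | red       |     1 |
| F    | purple    |     1 |
| G    | blue      |     1 |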
Success! Within the pivot_longer() call, we told it to use columns 2 through 5 since we want to keep the Resp(ondents) column unchanged. We maintained the first column and added additional rows to account for the multiple colors selected by Resp F.
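Selecting columns by position works, but it can break if the column order changes. As a hedged alternative, the sketch below picks the columns by name with cols = !Resp (everything except Resp) and drops the leftover value column, since it only holds 1s. The data frame is a hypothetical reconstruction of the example; the question column name fav_color_q is an assumption.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical reconstruction of the wide example data (fav_color_q is assumed).
df <- tribble(
  ~Resp, ~fav_color_q, ~blue, ~red, ~green, ~purple,
  "A",   NA,           1,     NA,   NA,     NA,
  "B",   NA,           NA,    1,    NA,     NA,
  "C",   NA,           NA,    NA,   1,      NA,
  "D",   NA,           NA,    NA,   NA,     1,
  "E",   NA,           NA,    NA,   1,      NA,
  "F",   NA,           1,     1,    NA,     1,
  "G",   NA,           1,     NA,   NA,     NA
)

df2 <- df |>
  select(-fav_color_q) |>      # drop the blank question column by name
  pivot_longer(
    cols = !Resp,              # pivot every column except Resp
    names_to = "fav_color",    # color names go here
    values_drop_na = TRUE      # drop colors a respondent did not pick
  ) |>
  select(-value)               # the 1s carry no extra information

df2
```

The result has the same rows as df1 above, just without the value column.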
Have fun!
Now it's your turn to practice! Below is a fully functioning code editor with starting code in place. Try pivoting data, and then feel free to make a bar chart using ggplot() + geom_bar() (or geom_col()).
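If you get stuck on the chart, here is one possible sketch, not the only way to do it. It assumes the long data from this lesson is already in hand (typed in directly here so the example stands alone) and lets geom_bar() count the rows for each color.

```r
library(ggplot2)
library(tibble)

# Long-format data matching the pivoted output shown in this lesson.
df1 <- tribble(
  ~Resp, ~fav_color,
  "A",   "blue",
  "B",   "red",
  "C",   "green",
  "D",   "purple",
  "E",   "green",
  "F",   "blue",
  "F",   "red",
  "F",   "purple",
  "G",   "blue"
)

# geom_bar() counts how many rows fall under each fav_color.
p <- ggplot(df1, aes(x = fav_color)) +
  geom_bar() +
  labs(x = "Favorite color", y = "Number of selections")

p
```

geom_col() would be the choice instead if the counts were already calculated, for example with count(df1, fav_color).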