Why am I here?

Welcome back! As a reminder, each lesson is designed as a 5 - 10 minute virtual session conducted for EnCompass staff to expand their skills with data, and the means of learning is the R programming language.

Usefulness of pivoting data

Learning objectives

For this session, the learning objective is to:

Reshape data from wide to long data

As always, make sure the correct packages are active in the session.

# installing packages
#install.packages("tidyverse")
#install.packages("readxl")
#install.packages("writexl")
#install.packages("here")
#install.packages("writexl")

# another option for installing packages
#install.packages(c("tidyverse", "readxl", "writexl", "here", "writexl"))


library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
library(writexl)
library(here)

here() starts at C:/Users/brian/OneDrive/Documents/website

What is the point of this?

Good question. When we collect data in the real and virtual worlds it is rarely structured for analysis. A common need is that we have to pivot a dataset from a wide to a long format. A dataset often comes in a wide format with a question in a column followed by a few columns with options for responses where each option is its own column. Here’s an simple example to illustrate the point.

df <- read_xlsx(here::here("posts/pivot_longer/colors.xlsx")) 

knitr::kable(df)

Resp	What_is_your_favorite color?	blue	red	green	purple
A	NA	1	NA	NA	NA
B	NA	NA	1	NA	NA
C	NA	NA	NA	1	NA
D	NA	NA	NA	NA	1
E	NA	NA	NA	1	NA
F	NA	1	1	NA	1
G	NA	1	NA	NA	NA

Indeed, this data is Wide

We’ve got five columns of data for a respondent’s favorite color, and for some reason person F selected 3 colors. Here are a few more observations.

The column with the question just holds the question variable and is blank.
The majority of rows in these columns are also blank.
A single variable - favorite color - is spread out over a bunch of columns.

To simplify this, let’s put these all in a single column.

Reshaping

We’ll create a new object called df1. This maintains our original df object in case we still have a need for the original object. Then, let’s eliminate the 2nd column that is all blanks. Next, pivot the columns with the 1s in a single column, and change the 1s to the name of each color. Finally, we should put pivotted data in a column with a name that is clear and useful for coding – fav_color. Here’s the code to do this.

df1 <- df |> 
  select(-2) |>  #eliminates the 2nd column
  pivot_longer(cols = 2:5, #use these columns for the transformation
               names_to = "fav_color", #Name the column where the the color names are stored
               values_drop_na = TRUE) #remove the rows with missing data

knitr::kable(df1)

Resp	fav_color	value
A	blue	1
B	red	1
C	green	1
D	purple	1
E	green	1
F	blue	1
F	red	1
F	purple	1
G	blue	1

Success! Within the pivot_longer() call, we told it to use columns 2 through 5 since we want to keep the Resp(ondents) column unchanged. We maintained the first column and added additional rows to account for the multiple colors selected by Resp F.

Have fun!

Now it’s your turn practice! Below is a fully functioning code editor with starting code in place. Try pivoting data, and then feel free to make a bar chart using ggplot + geom_bar() (or geom_col()).