6: Start with your goal: more data wrangling

BSTA 526: R Programming for Health Data Science

Authors: Meike Niederhausen, PhD & Jessica Minnier, PhD

Affiliation: OHSU-PSU School of Public Health

Published: February 12, 2026

Modified: February 12, 2026

1 Welcome to R Programming: Part 6!

Today we will practice wrangling messy data to prepare them for analysis.

Before you get started:

Remember to save this notebook under a new name, such as part_06_b526_YOURNAME.qmd.

Load the packages in the setup code chunk

1.1 Learning Objectives

Practice thinking through and planning data wrangling steps to get a dataset ready for analysis
Practice working with real data
Practice cleaning variables
Practice merging datasets with bind_rows and joining
Practice making data long by pivoting
Practice making complex visualizations with ggplot

2 “Real data”

Today’s goal: wrangle messy data into something we can work with.

2.1 Mouse data

Today we are going to be working with a subset of real data that Dr. Minnier has analyzed.
- The actual dataset had hundreds of columns (dozens of outcome variables, a few biomarker values, expression data for hundreds of miRNAs, and lipidomics data on a number of lipids).
- The data we are using are a subset of the actual data with some random noise added to the values to protect the data privacy of the lab.
Mouse data variables:
- treatment (yes, no)
- sex (male, female)
- strain (genetic strain of mouse: Balb/C, C3H)
- time point (1 month, 6 month, 12 month = age of mice)
- two miRNA expression values (micro ribonucleic acid, or microRNA)
- biomarker values from three different brain tissues
- outcomes of interest
  - learning outcome (higher values are better)
  - preference for object 1 and object 2 (from behavioral and cognitive tests on mice), measured as the % of time spent with that object.
The format of the Excel sheet is pretty similar to what was originally given, other than removing some columns.
- One big change: in the original data, each time point was a different cohort of mice, but we are pretending that it is the same cohort measured repeatedly, so that we have longitudinal data. (This is actually impossible because of the way the biomarkers are measured, but let’s ignore that.) This will allow you to practice more complex joins.

2.2 Challenge 1 - group work!

The data are in mouse_biomarker.xlsx (4 different sheets (tabs) with data), which is in the data folder. Open the data and look at the different sheets.

Talk about what challenges to importing the data you anticipate, just by looking at them in Excel. You are not yet loading the data in this step.
Look at the plot below. Based on the data in the Excel file, discuss what kind of data wrangling steps you would need to take to make this plot. These can be somewhat vague ideas about what kind of columns you’d need in your wrangled dataset, not necessarily specific steps. It might help to write out pseudo ggplot code, with elements (i.e. x axis, y axis, fill, facet) mapped to column names. You still haven’t loaded the data in this step!
Try to read in the four sheets into R, saving them as data frames called mouse_demo, mouse_tp1, mouse_tp2, and mouse_tp3, respectively. Use glimpse() to look at how the data were read in and determine whether you were successful in reading in the data how you intended to.
How would you combine all these data into one dataset? Write out the steps, including any data cleaning/wrangling steps you need to do.

3 Read in the data

3.1 Read in data files

Read in each Excel sheet separately
Use the .name_repair argument to clean the column names and standardize their format
Treat “/” as a missing value (na argument)
Remove empty rows and columns (janitor::remove_empty)
Check column types and names

# .name_repair = make_clean_names is the same as applying clean_names() after reading in the data
# Note this is the same code for each, just the sheet number is changing

# sheet 1
mouse_demo <- read_excel(here::here("part6", "data","mouse_biomarker.xlsx"), 
                         sheet = 1,
                         .name_repair = janitor::make_clean_names) %>%
  remove_empty(which = c("rows","cols"))

# sheet 2
mouse_tp1 <-  read_excel(here::here("part6", "data","mouse_biomarker.xlsx"), 
                         sheet = 2, 
                         na = c("","/"),
                         n_max = 34,
                         .name_repair = janitor::make_clean_names) %>%
  remove_empty(which = c("rows","cols"))

# sheet 3
mouse_tp2 <-  read_excel(here::here("part6", "data","mouse_biomarker.xlsx"), 
                         sheet = 3, 
                         na = c("","/"),
                         .name_repair = janitor::make_clean_names) %>%
  remove_empty(which = c("rows","cols"))

# sheet 4
mouse_tp3 <-  read_excel(here::here("part6", "data","mouse_biomarker.xlsx"), 
                         sheet = 4, 
                         na = c("","/"),
                         .name_repair = janitor::make_clean_names) %>%
  remove_empty(which = c("rows","cols"))
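To see what the na argument does in isolation, here is a minimal sketch. It assumes the readr package is installed and reads a small made-up inline CSV rather than the Excel file (read_excel() takes an na argument in the same spirit); the column names and values are invented for illustration.

```r
library(readr)

# Hypothetical inline data, not the mouse file: "/" marks a missing value
toy_text <- I("sid,biomarker\n1,2.5\n2,/\n3,4.1\n")

# Without "/" in na: the slash forces the whole column to character
toy_default <- read_csv(toy_text, show_col_types = FALSE)

# With na = c("", "/"): the slash becomes NA and the column stays numeric
toy_na <- read_csv(toy_text, na = c("", "/"), show_col_types = FALSE)

class(toy_default$biomarker)
class(toy_na$biomarker)
```

Comparing the two classes is a quick way to confirm that a stray symbol, not the data themselves, is what turned a numeric column into text.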

glimpse(mouse_demo)

Rows: 32
Columns: 4
$ sid    <dbl> 137, 138, 139, 140, 33, 34, 35, 36, 180, 181, 182, 183, 192, 19…
$ strain <chr> "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "Balb/C…
$ trt    <chr> "-", "-", "-", "-", "+", "+", "+", "+", "-", "-", "-", "-", "+"…
$ sex    <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"…

glimpse(mouse_tp1)

Rows: 33
Columns: 13
$ sid                                <dbl> NA, 137, 138, 139, 140, 33, 34, 35,…
$ normalized_bdnf_amygdala_pg_mg     <dbl> NA, 492.4831, 453.6635, 971.8741, 5…
$ normalized_bdnf_cortex_pg_mg       <dbl> NA, 720.0173, 884.5668, 1148.2862, …
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, NA, 1215.8147, 638.2747, 979.14…
$ normalized_cd68_amygdala_pg_mg     <dbl> NA, 988.9628, 775.5970, 2045.3141, …
$ normalized_cd68_cortex_pg_mg       <dbl> NA, 8.393707, 7.901366, 12.779926, …
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, NA, 4373.811, 2951.599, 3267.76…
$ normalized_map2_cortex_pg_mg       <dbl> NA, 352.9653, 1007.4147, 1739.4782,…
$ mirna_1                            <dbl> NA, 5.26302, 6.78336, 7.88867, 7.28…
$ mi_rna_2                           <dbl> NA, 1.6536200, -0.2794240, -0.63788…
$ learning_outcome                   <dbl> NA, 3.52, 1.56, 0.00, 7.33, 26.37, …
$ preference                         <chr> "Obj 1", "41.722049918771226", "74.…
$ x_4                                <chr> "Obj 2", "58.277950081228767", "25.…

glimpse(mouse_tp2)

Rows: 33
Columns: 10
$ sid                                <dbl> NA, 137, 138, 139, 140, 33, 34, 35,…
$ normalized_bdnf_amygdala_pg_mg     <dbl> NA, 275.1623, 491.7336, 365.0235, 4…
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, 1169.285, 1078.640, 1001.431, 1…
$ normalized_cd68_amygdala_ng_mg     <dbl> NA, 574.0655, 407.5826, 504.6470, 4…
$ normalized_cd68_hypothalamus_ng_mg <dbl> NA, 6800.870, 4461.628, 4837.662, 3…
$ mirna_1                            <dbl> NA, -0.0491371, -1.5248200, -1.0322…
$ mi_rna_2                           <dbl> NA, -0.0773419, 0.8424480, -0.55900…
$ learning_outcome                   <dbl> NA, 19.810, 14.480, 0.000, 0.000, 1…
$ preference                         <chr> "Obj 1", "37.513873473917869", "65.…
$ x_4                                <chr> "Obj 2", "62.486126526082131", "34.…

glimpse(mouse_tp3)

Rows: 33
Columns: 9
$ sid                          <dbl> NA, 137, 138, 139, 140, 33, 34, 35, 36, 1…
$ normalized_bdnf_cortex_pg_mg <dbl> NA, 871.8286, 793.4333, 712.3803, 1485.81…
$ normalized_cd68_cortex_pg_mg <dbl> NA, NA, 8.873104, 10.826861, 6.832956, 20…
$ normalized_map2_cortex_pg_mg <dbl> NA, 2693.9386, 644.9745, 1677.7607, 7735.…
$ mirna_1                      <dbl> NA, -0.7367310, -0.2596560, 1.2156400, 0.…
$ mi_rna_2                     <dbl> NA, 0.147994, -0.225338, 1.391810, 0.9031…
$ learning_outcome             <dbl> NA, 2.44, 1.11, 73.22, 0.00, 7.70, 3.67, …
$ preference                   <chr> "Obj 1", "55.967682702901214", "69.006659…
$ x_4                          <chr> "Obj 2", "44.032317297098793", "30.993340…

Things I notice:

The column names “preference” and “x_4” are not informative or accurate.
Those two columns were read in as character because of the second header row containing “Obj 1” and “Obj 2”.
We need to remove that row, rename the columns, and convert them to numeric.
We could do this for each dataset individually, or once after combining them.
Let’s combine first.

3.2 Challenge 2 - group work!

First, discuss the steps needed to combine the time point data.
- Which datasets will you combine first, and how (which R function(s))?
Combine the data
- Combine the 3 time point datasets into one data frame (ignore demographic data for now) called mouse_tp.
- Inspect the resulting data.
How would you create a table of the mouse id (sid) and time (i.e count how many data points we have for each mouse)?

4 Combining time point data

4.1 Stacking time point data with `bind_rows`

We can stack the three time point datasets on top of each other.
- They contain the same mouse ids, and we want a long format in which each row is one mouse at one time point.
Stacking them therefore gives us the full set of time point data that we want to study.
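As a rough illustration of what stacking does when the datasets do not share all of their columns, here is a small sketch on made-up data (the tibbles toy_tp1 and toy_tp2 and their values are hypothetical, not the mouse data). bind_rows() matches columns by name and fills in NA where a dataset lacks a column.

```r
library(dplyr)

# Hypothetical toy data: same ids, partly different measurement columns
toy_tp1 <- tibble(sid = c(1, 2), bdnf = c(10, 20), cd68 = c(5, 6))
toy_tp2 <- tibble(sid = c(1, 2), bdnf = c(30, 40))

# Columns are matched by name; cd68 is NA for the rows that came from toy_tp2
toy_stacked <- bind_rows(toy_tp1, toy_tp2)
toy_stacked
```

This also means that two columns holding the same measurement under slightly different names will not line up, so it is worth comparing column names before stacking.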

4.2 try 1:

We learned to bind two datasets at a time, but we can bind as many as we want:

mouse_tp <- bind_rows(mouse_tp1, mouse_tp2, mouse_tp3)

glimpse(mouse_tp)

Rows: 99
Columns: 15
$ sid                                <dbl> NA, 137, 138, 139, 140, 33, 34, 35,…
$ normalized_bdnf_amygdala_pg_mg     <dbl> NA, 492.4831, 453.6635, 971.8741, 5…
$ normalized_bdnf_cortex_pg_mg       <dbl> NA, 720.0173, 884.5668, 1148.2862, …
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, NA, 1215.8147, 638.2747, 979.14…
$ normalized_cd68_amygdala_pg_mg     <dbl> NA, 988.9628, 775.5970, 2045.3141, …
$ normalized_cd68_cortex_pg_mg       <dbl> NA, 8.393707, 7.901366, 12.779926, …
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, NA, 4373.811, 2951.599, 3267.76…
$ normalized_map2_cortex_pg_mg       <dbl> NA, 352.9653, 1007.4147, 1739.4782,…
$ mirna_1                            <dbl> NA, 5.26302, 6.78336, 7.88867, 7.28…
$ mi_rna_2                           <dbl> NA, 1.6536200, -0.2794240, -0.63788…
$ learning_outcome                   <dbl> NA, 3.52, 1.56, 0.00, 7.33, 26.37, …
$ preference                         <chr> "Obj 1", "41.722049918771226", "74.…
$ x_4                                <chr> "Obj 2", "58.277950081228767", "25.…
$ normalized_cd68_amygdala_ng_mg     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ normalized_cd68_hypothalamus_ng_mg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…

4.3 try 2: Binding difficulties

Try 1 problem: we can’t tell which rows came from which time point.
- We need a time variable!
bind_rows() lets us give each dataset a “name”
- and then store that name/label in a new column, whose name we supply with the .id argument.
In this case, we want the time points to be in a column called time, so we use .id = "time".

The syntax is like this:

df_all <- bind_rows("name1" = df1,
                    "name2" = df2,
                    "name3" = df3,
                    .id = "columnname")

So for our data, we use the time point to distinguish where each dataset came from:

# use the name of the dataset to create an id variable time
mouse_tp <- bind_rows("tp1" = mouse_tp1, 
                      "tp2" = mouse_tp2,
                      "tp3" = mouse_tp3,
                      .id = "time")
mouse_tp %>%
  glimpse()

Rows: 99
Columns: 16
$ time                               <chr> "tp1", "tp1", "tp1", "tp1", "tp1", …
$ sid                                <dbl> NA, 137, 138, 139, 140, 33, 34, 35,…
$ normalized_bdnf_amygdala_pg_mg     <dbl> NA, 492.4831, 453.6635, 971.8741, 5…
$ normalized_bdnf_cortex_pg_mg       <dbl> NA, 720.0173, 884.5668, 1148.2862, …
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, NA, 1215.8147, 638.2747, 979.14…
$ normalized_cd68_amygdala_pg_mg     <dbl> NA, 988.9628, 775.5970, 2045.3141, …
$ normalized_cd68_cortex_pg_mg       <dbl> NA, 8.393707, 7.901366, 12.779926, …
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, NA, 4373.811, 2951.599, 3267.76…
$ normalized_map2_cortex_pg_mg       <dbl> NA, 352.9653, 1007.4147, 1739.4782,…
$ mirna_1                            <dbl> NA, 5.26302, 6.78336, 7.88867, 7.28…
$ mi_rna_2                           <dbl> NA, 1.6536200, -0.2794240, -0.63788…
$ learning_outcome                   <dbl> NA, 3.52, 1.56, 0.00, 7.33, 26.37, …
$ preference                         <chr> "Obj 1", "41.722049918771226", "74.…
$ x_4                                <chr> "Obj 2", "58.277950081228767", "25.…
$ normalized_cd68_amygdala_ng_mg     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ normalized_cd68_hypothalamus_ng_mg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…

# what is this telling us?
mouse_tp %>% 
  tabyl(sid, time) %>% 
  adorn_title()

      time        
  sid  tp1 tp2 tp3
   33    1   1   1
   34    1   1   1
   35    1   1   1
   36    1   1   1
  137    1   1   1
  138    1   1   1
  139    1   1   1
  140    1   1   1
  156    1   1   1
  157    1   1   1
  158    1   1   1
  159    1   1   1
  168    1   1   1
  169    1   1   1
  170    1   1   1
  171    1   1   1
  180    1   1   1
  181    1   1   1
  182    1   1   1
  183    1   1   1
  192    1   1   1
  193    1   1   1
  194    1   1   1
  195    1   1   1
  204    1   1   1
  205    1   1   1
  206    1   1   1
  207    1   1   1
  216    1   1   1
  217    1   1   1
  218    1   1   1
  219    1   1   1
 <NA>    1   1   1

4.4 try 3: ng vs pg

The next issue is that binding these three together doesn’t line up the columns as expected: tp2 has two columns labeled ng/mg instead of pg/mg, so they end up as separate columns.
- However, the ng label looks like a typo, because the values are on the same scale as the pg/mg values, not off by a factor of 1000 as the names would suggest.

mouse_tp %>% 
  select(contains("cd68_amygdala"), contains("cd68_hypo")) %>%
  get_summary_stats() %>% 
  gt()

variable	n	min	max	median	q1	q3	iqr	mad	mean	sd	se	ci
normalized_cd68_amygdala_pg_mg	31	420.775	2045.314	692.514	561.204	795.540	234.335	189.923	726.866	288.833	51.876	105.945
normalized_cd68_amygdala_ng_mg	31	346.568	1087.435	539.723	463.168	623.713	160.545	115.474	566.973	152.604	27.408	55.976
normalized_cd68_hypothalamus_pg_mg	28	2317.989	6898.278	3872.875	3245.352	4867.972	1622.620	1155.090	4108.298	1201.756	227.110	465.992
normalized_cd68_hypothalamus_ng_mg	32	3124.769	11867.444	5147.672	4353.650	6581.292	2227.642	1505.664	5599.085	1805.305	319.136	650.882

Therefore, we should rename the columns of the tp2 data before we bind.
We could use rename(), as we have learned, to rename those two columns individually,
- but there is also a function rename_with() that applies a function to the column names, similar to the way we use functions in across():

colnames(mouse_tp2)

 [1] "sid"                                "normalized_bdnf_amygdala_pg_mg"    
 [3] "normalized_bdnf_hypothalamus_pg_mg" "normalized_cd68_amygdala_ng_mg"    
 [5] "normalized_cd68_hypothalamus_ng_mg" "mirna_1"                           
 [7] "mi_rna_2"                           "learning_outcome"                  
 [9] "preference"                         "x_4"

# replace _ng with _pg in the column names
# note: I first tried replacing just "ng" with "pg", but then learning became learnipg!

mouse_tp2 <- mouse_tp2 %>%
  rename_with(.fn = ~str_replace(.x, "_ng", "_pg"))

colnames(mouse_tp2)

 [1] "sid"                                "normalized_bdnf_amygdala_pg_mg"    
 [3] "normalized_bdnf_hypothalamus_pg_mg" "normalized_cd68_amygdala_pg_mg"    
 [5] "normalized_cd68_hypothalamus_pg_mg" "mirna_1"                           
 [7] "mi_rna_2"                           "learning_outcome"                  
 [9] "preference"                         "x_4"
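The pitfall mentioned in the code comment is easy to reproduce on a toy example. The sketch below uses made-up data with column names that mimic the tp2 pattern, and shows one possible way to narrow the replacement further: restricting rename_with() to selected columns via .cols and anchoring the pattern at the end of the name. This is an alternative under those assumptions, not the approach used in these notes.

```r
library(dplyr)
library(stringr)

# Hypothetical one-row data whose column names mimic the tp2 sheet
toy <- tibble(sid = 1, cd68_amygdala_ng_mg = 500, learning_outcome = 3.5)

# Too broad: "ng" also matches inside "learning"
names_broad <- str_replace(names(toy), "ng", "pg")

# Narrower: only touch columns ending in "_ng_mg", and anchor the pattern
toy_fixed <- toy %>%
  rename_with(.fn = ~ str_replace(.x, "_ng_mg$", "_pg_mg"),
              .cols = ends_with("_ng_mg"))

names_broad
names(toy_fixed)
```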

Now we should be able to bind the three datasets together:

mouse_tp <- bind_rows("tp1" = mouse_tp1, 
                      "tp2" = mouse_tp2,
                      "tp3" = mouse_tp3,
                      .id = "time") %>%
  # also rename the mirna columns for consistency
  rename(mirna1 = mirna_1,
         mirna2 = mi_rna_2)

mouse_tp %>%
  glimpse()

Rows: 99
Columns: 14
$ time                               <chr> "tp1", "tp1", "tp1", "tp1", "tp1", …
$ sid                                <dbl> NA, 137, 138, 139, 140, 33, 34, 35,…
$ normalized_bdnf_amygdala_pg_mg     <dbl> NA, 492.4831, 453.6635, 971.8741, 5…
$ normalized_bdnf_cortex_pg_mg       <dbl> NA, 720.0173, 884.5668, 1148.2862, …
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, NA, 1215.8147, 638.2747, 979.14…
$ normalized_cd68_amygdala_pg_mg     <dbl> NA, 988.9628, 775.5970, 2045.3141, …
$ normalized_cd68_cortex_pg_mg       <dbl> NA, 8.393707, 7.901366, 12.779926, …
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, NA, 4373.811, 2951.599, 3267.76…
$ normalized_map2_cortex_pg_mg       <dbl> NA, 352.9653, 1007.4147, 1739.4782,…
$ mirna1                             <dbl> NA, 5.26302, 6.78336, 7.88867, 7.28…
$ mirna2                             <dbl> NA, 1.6536200, -0.2794240, -0.63788…
$ learning_outcome                   <dbl> NA, 3.52, 1.56, 0.00, 7.33, 26.37, …
$ preference                         <chr> "Obj 1", "41.722049918771226", "74.…
$ x_4                                <chr> "Obj 2", "58.277950081228767", "25.…

4.5 Remove second preference header row

Now let’s deal with the preference column issues

Data wrangling steps:
- Rename the two columns preference and x_4
- Set “Obj 1” and “Obj 2” to NA (use na_if())
- Convert those columns to numeric

mouse_tp <- mouse_tp %>%
  # rename these two columns to make sense
  rename(preference_obj1 = preference,
         preference_obj2 = x_4)

colnames(mouse_tp)

 [1] "time"                               "sid"                               
 [3] "normalized_bdnf_amygdala_pg_mg"     "normalized_bdnf_cortex_pg_mg"      
 [5] "normalized_bdnf_hypothalamus_pg_mg" "normalized_cd68_amygdala_pg_mg"    
 [7] "normalized_cd68_cortex_pg_mg"       "normalized_cd68_hypothalamus_pg_mg"
 [9] "normalized_map2_cortex_pg_mg"       "mirna1"                            
[11] "mirna2"                             "learning_outcome"                  
[13] "preference_obj1"                    "preference_obj2"

mouse_tp <- mouse_tp %>%
  mutate(
    # set Obj 1 and 2 to NA, 
    preference_obj1 = na_if(preference_obj1, "Obj 1"),
    preference_obj2 = na_if(preference_obj2, "Obj 2"),
    # then convert to numeric
    preference_obj1 = as.numeric(preference_obj1),
    preference_obj2 = as.numeric(preference_obj2)
  )

glimpse(mouse_tp)

Rows: 99
Columns: 14
$ time                               <chr> "tp1", "tp1", "tp1", "tp1", "tp1", …
$ sid                                <dbl> NA, 137, 138, 139, 140, 33, 34, 35,…
$ normalized_bdnf_amygdala_pg_mg     <dbl> NA, 492.4831, 453.6635, 971.8741, 5…
$ normalized_bdnf_cortex_pg_mg       <dbl> NA, 720.0173, 884.5668, 1148.2862, …
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, NA, 1215.8147, 638.2747, 979.14…
$ normalized_cd68_amygdala_pg_mg     <dbl> NA, 988.9628, 775.5970, 2045.3141, …
$ normalized_cd68_cortex_pg_mg       <dbl> NA, 8.393707, 7.901366, 12.779926, …
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, NA, 4373.811, 2951.599, 3267.76…
$ normalized_map2_cortex_pg_mg       <dbl> NA, 352.9653, 1007.4147, 1739.4782,…
$ mirna1                             <dbl> NA, 5.26302, 6.78336, 7.88867, 7.28…
$ mirna2                             <dbl> NA, 1.6536200, -0.2794240, -0.63788…
$ learning_outcome                   <dbl> NA, 3.52, 1.56, 0.00, 7.33, 26.37, …
$ preference_obj1                    <dbl> NA, 41.72205, 74.11972, 52.90954, 6…
$ preference_obj2                    <dbl> NA, 58.2779501, 25.8802817, 47.0904…
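The same cleanup could also be written once for both columns with across(). The sketch below runs on made-up data (the tibble toy is hypothetical) and treats any value of the form "Obj <number>" as a stray header label; that pattern is an assumption based on the two labels seen above.

```r
library(dplyr)

# Hypothetical data with a stray second header row in the first position
toy <- tibble(
  preference_obj1 = c("Obj 1", "41.7", "74.1"),
  preference_obj2 = c("Obj 2", "58.3", "25.9")
)

toy_clean <- toy %>%
  mutate(across(
    .cols = starts_with("preference_"),
    # blank out any "Obj <number>" label, then convert to numeric
    .fns = ~ as.numeric(if_else(grepl("^Obj [0-9]+$", .x), NA_character_, .x))
  ))

toy_clean
```

Setting the label to NA before calling as.numeric() avoids the coercion warning that would otherwise hide the fact that a header row was present.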

# Note the NA row at the bottom!
mouse_tp %>% tabyl(sid, time)

 sid tp1 tp2 tp3
  33   1   1   1
  34   1   1   1
  35   1   1   1
  36   1   1   1
 137   1   1   1
 138   1   1   1
 139   1   1   1
 140   1   1   1
 156   1   1   1
 157   1   1   1
 158   1   1   1
 159   1   1   1
 168   1   1   1
 169   1   1   1
 170   1   1   1
 171   1   1   1
 180   1   1   1
 181   1   1   1
 182   1   1   1
 183   1   1   1
 192   1   1   1
 193   1   1   1
 194   1   1   1
 195   1   1   1
 204   1   1   1
 205   1   1   1
 206   1   1   1
 207   1   1   1
 216   1   1   1
 217   1   1   1
 218   1   1   1
 219   1   1   1
  NA   1   1   1
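With more mice, scanning a table like this by eye gets tedious. One possible programmatic check, sketched here on made-up data (toy_long is hypothetical), is to count rows per id and time point with dplyr::count() and list anything that does not occur exactly once, along with any rows that have a missing id.

```r
library(dplyr)

# Hypothetical long data: mouse 2 is duplicated at tp1, and one id is missing
toy_long <- tibble(
  sid  = c(1, 1, 2, 2, 2, NA),
  time = c("tp1", "tp2", "tp1", "tp1", "tp2", "tp1")
)

# Combinations of id and time that do not occur exactly once
toy_problems <- toy_long %>%
  count(sid, time) %>%
  filter(n != 1)

# Rows with a missing id
toy_missing_id <- toy_long %>%
  filter(is.na(sid))

toy_problems
toy_missing_id
```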

Those rows that used to contain “Obj 1” and “Obj 2” are now almost empty:

# when sid is NA, the rest of the data (other than time) is NA
# look at those rows:
mouse_tp %>% filter(is.na(sid))

# A tibble: 3 × 14
  time    sid normalized_bdnf_amygdala_pg_mg normalized_bdnf_cortex_pg_mg
  <chr> <dbl>                          <dbl>                        <dbl>
1 tp1      NA                             NA                           NA
2 tp2      NA                             NA                           NA
3 tp3      NA                             NA                           NA
# ℹ 10 more variables: normalized_bdnf_hypothalamus_pg_mg <dbl>,
#   normalized_cd68_amygdala_pg_mg <dbl>, normalized_cd68_cortex_pg_mg <dbl>,
#   normalized_cd68_hypothalamus_pg_mg <dbl>,
#   normalized_map2_cortex_pg_mg <dbl>, mirna1 <dbl>, mirna2 <dbl>,
#   learning_outcome <dbl>, preference_obj1 <dbl>, preference_obj2 <dbl>

4.6 `drop_na()`

We want to remove the three rows that have missing data in almost all columns,
- which is the same as removing rows with any missing data in sid (rows where sid = NA).

See the Part 5 notes and the drop_na() reference for examples.

# number of rows in the full dataset (before dropping anything)
mouse_tp %>% nrow

[1] 99

# use tidyr::drop_na() to remove those
# Note: remove_empty() from the `janitor` package won't remove them 
#  because `time` is not empty

# this removes rows where there is *any* missing data in *any* column
# (complete cases only); we do not want this
mouse_tp %>% drop_na() %>% nrow

[1] 25

# this removes rows where there is missing data in sid
mouse_tp %>% drop_na(sid) %>% nrow

[1] 96

# save our work, remove just missing sid values
mouse_tp <- mouse_tp %>% 
  drop_na(sid)
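The difference between the two drop_na() calls is easier to see on a tiny example. This sketch uses made-up data (toy is hypothetical) in which one row is missing its id and another row is missing only a biomarker value.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: row 1 has no id, row 3 is missing only the biomarker
toy <- tibble(
  time = c("tp1", "tp1", "tp1"),
  sid = c(NA, 1, 2),
  biomarker = c(NA, 10, NA)
)

# drop_na() with no columns named keeps complete cases only
n_complete <- toy %>% drop_na() %>% nrow()

# drop_na(sid) only removes rows where sid itself is missing
n_with_id <- toy %>% drop_na(sid) %>% nrow()

n_complete
n_with_id
```

Naming the column keeps mouse 2 in the data even though its biomarker value is missing, which is the behavior we want for the mouse data.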

Now each sid has exactly one row per time point, and the NA row is gone:

mouse_tp %>%
  tabyl(sid, time)

 sid tp1 tp2 tp3
  33   1   1   1
  34   1   1   1
  35   1   1   1
  36   1   1   1
 137   1   1   1
 138   1   1   1
 139   1   1   1
 140   1   1   1
 156   1   1   1
 157   1   1   1
 158   1   1   1
 159   1   1   1
 168   1   1   1
 169   1   1   1
 170   1   1   1
 171   1   1   1
 180   1   1   1
 181   1   1   1
 182   1   1   1
 183   1   1   1
 192   1   1   1
 193   1   1   1
 194   1   1   1
 195   1   1   1
 204   1   1   1
 205   1   1   1
 206   1   1   1
 207   1   1   1
 216   1   1   1
 217   1   1   1
 218   1   1   1
 219   1   1   1

4.7 Challenge 3 - group work!

1. Discuss what type of join you would use to combine the demographic data with the time point data, and why.
- Discuss how joining these two datasets is different from combining the three time point datasets as we did above.
2. After discussing question 1, combine the demographic data with the time point data.
3. Create a column time_month that is a factor variable with levels “1 month”, “6 months”, and “12 months”, in this order.
4. Revisit the plot below. What data wrangling steps do we still need to do to create this plot?

# 2. combine the data

# 3.  create time_month factor variable

5 `join`: Combine demographic data with longitudinal time point data

5.1 What type of `join`?

The demographic data mouse_demo provides the independent variables and experimental factors that are important for a future analysis.
Inspect the data to determine what type of join to use.

Do we have “demographic” variables for all the ids in the time point biomarker data?

mouse_demo$sid

 [1] 137 138 139 140  33  34  35  36 180 181 182 183 192 193 194 195 156 157 158
[20] 159 168 169 170 171 204 205 206 207 216 217 218 219

mouse_tp$sid

 [1] 137 138 139 140  33  34  35  36 180 181 182 183 192 193 194 195 156 157 158
[20] 159 168 169 170 171 204 205 206 207 216 217 218 219 137 138 139 140  33  34
[39]  35  36 180 181 182 183 192 193 194 195 156 157 158 159 168 169 170 171 204
[58] 205 206 207 216 217 218 219 137 138 139 140  33  34  35  36 180 181 182 183
[77] 192 193 194 195 156 157 158 159 168 169 170 171 204 205 206 207 216 217 218
[96] 219

How many unique ids are there in each dataset?

n_distinct(mouse_demo$sid)

[1] 32

n_distinct(mouse_tp$sid)

[1] 32

Both datasets have 32 unique ids, but are they the same 32 ids?
- We can check this with code!

Do we have all the mouse demographic ids in our mouse biomarker (time point) data?

# is each mouse_demo id in mouse_tp id?
mouse_demo$sid %in% mouse_tp$sid

 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE

sum(mouse_demo$sid %in% mouse_tp$sid)

[1] 32

Do we have all the time point ids in our mouse demographic data?

# is each mouse_tp id in mouse_demo id?
mouse_tp$sid %in% mouse_demo$sid

 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE

tabyl(mouse_tp$sid %in% mouse_demo$sid)

 mouse_tp$sid %in% mouse_demo$sid  n percent
                             TRUE 96       1

We do; they are just repeated for each time point!

In this case, left_join and full_join and inner_join will all give the same result because we have the same ids in both datasets.
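Another way to run the same checks, sketched here on made-up data (toy_demo and toy_tp are hypothetical), is anti_join(), which returns the rows of the first table that have no match in the second. When both anti-joins come back empty, the three join types return the same rows.

```r
library(dplyr)

# Hypothetical data: two mice, each measured at two time points
toy_demo <- tibble(sid = c(1, 2), sex = c("M", "F"))
toy_tp <- tibble(
  sid = c(1, 1, 2, 2),
  time = c("tp1", "tp2", "tp1", "tp2"),
  value = c(10, 11, 20, 21)
)

# ids present in one table but not in the other (both empty here)
demo_only <- anti_join(toy_demo, toy_tp, by = "sid")
tp_only <- anti_join(toy_tp, toy_demo, by = "sid")

# with identical id sets, all three joins return the same rows
toy_left <- left_join(toy_demo, toy_tp, by = "sid")
toy_inner <- inner_join(toy_demo, toy_tp, by = "sid")
toy_full <- full_join(toy_demo, toy_tp, by = "sid")

c(nrow(toy_left), nrow(toy_inner), nrow(toy_full))
```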

5.2 Do the `join`

dim(mouse_demo)

[1] 32  4

dim(mouse_tp)

[1] 96 14

# join by sid (stated explicitly rather than inferred from the shared column name)
mouse_data <- full_join(mouse_demo,
                        mouse_tp,
                        by = "sid")

glimpse(mouse_data)

Rows: 96
Columns: 17
$ sid                                <dbl> 137, 137, 137, 138, 138, 138, 139, …
$ strain                             <chr> "C3H", "C3H", "C3H", "C3H", "C3H", …
$ trt                                <chr> "-", "-", "-", "-", "-", "-", "-", …
$ sex                                <chr> "M", "M", "M", "M", "M", "M", "M", …
$ time                               <chr> "tp1", "tp2", "tp3", "tp1", "tp2", …
$ normalized_bdnf_amygdala_pg_mg     <dbl> 492.4831, 275.1623, NA, 453.6635, 4…
$ normalized_bdnf_cortex_pg_mg       <dbl> 720.0173, NA, 871.8286, 884.5668, N…
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, 1169.2845, NA, 1215.8147, 1078.…
$ normalized_cd68_amygdala_pg_mg     <dbl> 988.9628, 574.0655, NA, 775.5970, 4…
$ normalized_cd68_cortex_pg_mg       <dbl> 8.393707, NA, NA, 7.901366, NA, 8.8…
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, 6800.870, NA, 4373.811, 4461.62…
$ normalized_map2_cortex_pg_mg       <dbl> 352.9653, NA, 2693.9386, 1007.4147,…
$ mirna1                             <dbl> 5.2630200, -0.0491371, -0.7367310, …
$ mirna2                             <dbl> 1.6536200, -0.0773419, 0.1479940, -…
$ learning_outcome                   <dbl> 3.52, 19.81, 2.44, 1.56, 14.48, 1.1…
$ preference_obj1                    <dbl> 41.72205, 37.51387, 55.96768, 74.11…
$ preference_obj2                    <dbl> 58.27795, 62.48613, 44.03232, 25.88…
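If you want the join itself to confirm that each mouse appears only once in the demographic data, recent versions of dplyr offer a relationship argument for the join functions. The sketch below uses made-up data (toy_demo, bad_demo, and toy_tp are hypothetical) and assumes dplyr 1.1.1 or later, which, to my knowledge, is where this argument was introduced.

```r
library(dplyr)

# Hypothetical data: one demographic row per mouse, several time point rows
toy_demo <- tibble(sid = c(1, 2), strain = c("C3H", "Balb/C"))
toy_tp <- tibble(sid = c(1, 1, 2, 2), time = c("tp1", "tp2", "tp1", "tp2"))

# "one-to-many": each time point row may match at most one demographic row
toy_joined <- full_join(toy_demo, toy_tp,
                        by = "sid",
                        relationship = "one-to-many")

# A duplicated demographic row violates that expectation and raises an error
bad_demo <- tibble(sid = c(1, 1, 2), strain = c("C3H", "C3H", "Balb/C"))
bad_result <- tryCatch(
  full_join(bad_demo, toy_tp, by = "sid", relationship = "one-to-many"),
  error = function(e) e
)

nrow(toy_joined)
inherits(bad_result, "error")
```

Without the argument, the duplicated row would silently double the time point rows for that mouse.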

# View(mouse_data)

6 Some more data wrangling

6.1 Create time month factor

Create a time_month variable that has the factor levels we want to recreate the figure:

# create time_month factor
mouse_data <- mouse_data %>%
  mutate(time_month = case_when(
    time=="tp1" ~ "1 month",
    time=="tp2" ~ "6 months",
    time=="tp3" ~ "12 months"
  ),
  time_month = factor(
    time_month,
    levels = c("1 month", "6 months", "12 months")))

# check!!
mouse_data %>% tabyl(time_month, time) %>% 
  adorn_title()

            time        
 time_month  tp1 tp2 tp3
    1 month   32   0   0
   6 months    0  32   0
  12 months    0   0  32
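An alternative to case_when() followed by factor() is to relabel in a single step with the levels and labels arguments of base R's factor(). This is a sketch on a made-up vector (time_toy is hypothetical), not a replacement for the code above.

```r
# Hypothetical vector of time point codes, deliberately out of order
time_toy <- c("tp1", "tp3", "tp2", "tp1")

# levels gives the existing codes in the desired order;
# labels gives the display text for each code, in the same order
time_month_toy <- factor(
  time_toy,
  levels = c("tp1", "tp2", "tp3"),
  labels = c("1 month", "6 months", "12 months")
)

levels(time_month_toy)
table(time_toy, time_month_toy)
```

Either way, a cross-tabulation of the old and new variables is a cheap check that every code landed on the intended label.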

6.2 Object preference variables cleaning

The two columns for object preference should add to 100%.
- However, there are some rows where the sum is 0%.
- That is a red flag: in those rows both values should actually be missing.
How would you correct this and set these values to missing?

mouse_data$preference_obj1 + mouse_data$preference_obj2

 [1] 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
[20] 100 100 100 100 100   0 100 100   0 100 100 100 100 100 100 100 100 100 100
[39] 100 100 100 100 100  NA 100 100  NA 100 100 100 100 100 100 100 100 100 100
[58] 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
[77] 100 100 100   0   0 100 100 100 100   0 100 100 100 100 100   0 100 100 100
[96] 100

There are of course many ways to do this, but one way is:
- Create a new column that is the sum of the two preference columns
- When that sum is 0, set the two preference columns to NA

# First create the column
mouse_data <- mouse_data %>%
  mutate(
    total_pref = preference_obj1 + preference_obj2
    ) 

# look at the rows where the sum is 0
mouse_data %>% 
  filter(total_pref==0) %>% 
  select(sid, time, contains("pref"))

# A tibble: 6 × 5
    sid time  preference_obj1 preference_obj2 total_pref
  <dbl> <chr>           <dbl>           <dbl>      <dbl>
1   180 tp1                 0               0          0
2   181 tp1                 0               0          0
3   206 tp2                 0               0          0
4   206 tp3                 0               0          0
5   216 tp2                 0               0          0
6   218 tp2                 0               0          0

Below we use case_when() to mutate both columns: set the value to NA if total_pref == 0, and otherwise (.default) keep the value of that column.
We show two ways to do this, either:
- one column at a time
- or, across columns that start with “pref”

Mutate columns individually

# we can mutate one column at a time:

mouse_data %>% mutate(
  preference_obj1 = case_when(
    total_pref == 0 ~ NA,
    .default = preference_obj1),
  
  preference_obj2 = case_when(
    total_pref == 0 ~ NA,
    .default = preference_obj2),
) %>% 
  
  # show a subset to see that it's working
  filter(sid%in%c(180, 181)) %>%
  select(sid, time, contains("pref"))

# A tibble: 6 × 5
    sid time  preference_obj1 preference_obj2 total_pref
  <dbl> <chr>           <dbl>           <dbl>      <dbl>
1   180 tp1              NA              NA            0
2   180 tp2              59.0            41.0        100
3   180 tp3              63.2            36.8        100
4   181 tp1              NA              NA            0
5   181 tp2              52.5            47.5        100
6   181 tp3              75.6            24.4        100

Mutate both columns simultaneously (& save our changes)

# or both columns using across

mouse_data <- mouse_data %>%
  mutate(across(
    .cols = starts_with("pref"),
    .fns =  ~ case_when(
      total_pref == 0 ~ NA,
      .default = .)
    )) 


mouse_data %>%
  # show a subset to see that it's working
  filter(sid %in% c(180, 181)) %>%
  select(sid, time, contains("pref"))

# A tibble: 6 × 5
    sid time  preference_obj1 preference_obj2 total_pref
  <dbl> <chr>           <dbl>           <dbl>      <dbl>
1   180 tp1              NA              NA            0
2   180 tp2              59.0            41.0        100
3   180 tp3              63.2            36.8        100
4   181 tp1              NA              NA            0
5   181 tp2              52.5            47.5        100
6   181 tp3              75.6            24.4        100

# remove total_pref

mouse_data <- mouse_data %>% select(-total_pref)
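After a cleanup like this, it can be reassuring to confirm that every remaining pair of preferences adds up to 100. The sketch below repeats the idea on made-up data (toy is hypothetical), uses if_else() in place of case_when(), and compares with dplyr::near() to allow for floating point rounding.

```r
library(dplyr)

# Hypothetical data: row 2 has the suspicious 0/0 pair, row 3 is already missing
toy <- tibble(
  preference_obj1 = c(41.7, 0, NA, 60),
  preference_obj2 = c(58.3, 0, NA, 40)
)

toy_clean <- toy %>%
  mutate(
    total_pref = preference_obj1 + preference_obj2,
    # set both preference columns to NA wherever the total is 0
    across(starts_with("preference_"),
           ~ if_else(total_pref == 0, NA_real_, .x))
  ) %>%
  select(-total_pref)

# every non-missing pair should now sum to (approximately) 100
sums <- toy_clean$preference_obj1 + toy_clean$preference_obj2
all_ok <- all(near(sums[!is.na(sums)], 100))
all_ok
```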

6.3 Save the cleaned data as a file

Now that we have the data cleaned and joined,
- let’s save this as a cleaned data file in case we want to use it later.

csv file

Advantage: can look at the data in Excel
Disadvantage: we lose special “coding” of variables, such as factor variables (for example, time_month would be read back in as a character column)

write_excel_csv(
  mouse_data, 
  file = here::here("part6", "data", "mouse_data_longitudinal_clean.csv"))

Rdata file

Advantage: we don’t lose special “coding” of variables, such as factor variables
Disadvantage: can’t look at the data in Excel

save(
  mouse_data, 
  file = here::here("part6", "data", "mouse_data_longitudinal_clean.Rdata"))
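To see the trade-off between the two file types in action, here is a minimal sketch that writes a small made-up tibble to both formats and reads it back. The `toy` data and the temporary file paths are assumptions for illustration only; the real files live in the part6/data folder.

```r
library(readr)
library(tibble)

# Hypothetical toy data with a factor whose level order is not alphabetical
toy <- tibble(
  id  = 1:3,
  grp = factor(c("a", "b", "a"), levels = c("b", "a"))
)

# Temporary files so that nothing in the project folder is touched
csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".Rdata")

write_excel_csv(toy, file = csv_path)
save(toy, file = rdata_path)

# Reading the csv back in: the factor comes back as a character column
toy_csv <- read_csv(csv_path, show_col_types = FALSE)
class(toy_csv$grp)

# Loading the .Rdata file restores the object under its original name,
# with the factor and its level order intact
rm(toy)
load(rdata_path)
class(toy$grp)
levels(toy$grp)
```

Keep in mind that load() puts the object back under the name it was saved with, so it will overwrite any object of the same name in your environment.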

6.4 Challenge 4

Make that plot! Or get as close as you can.

Hint: you need to reshape your data and make some new columns out of old columns…

# make the data you need




# make your plot!

7 Create the figure

7.1 What do the data need to “look like” for the figure?

Our data
Variables needed to create the figure

Let’s look at what we have again:

glimpse(mouse_data)

Rows: 96
Columns: 18
$ sid                                <dbl> 137, 137, 137, 138, 138, 138, 139, …
$ strain                             <chr> "C3H", "C3H", "C3H", "C3H", "C3H", …
$ trt                                <chr> "-", "-", "-", "-", "-", "-", "-", …
$ sex                                <chr> "M", "M", "M", "M", "M", "M", "M", …
$ time                               <chr> "tp1", "tp2", "tp3", "tp1", "tp2", …
$ normalized_bdnf_amygdala_pg_mg     <dbl> 492.4831, 275.1623, NA, 453.6635, 4…
$ normalized_bdnf_cortex_pg_mg       <dbl> 720.0173, NA, 871.8286, 884.5668, N…
$ normalized_bdnf_hypothalamus_pg_mg <dbl> NA, 1169.2845, NA, 1215.8147, 1078.…
$ normalized_cd68_amygdala_pg_mg     <dbl> 988.9628, 574.0655, NA, 775.5970, 4…
$ normalized_cd68_cortex_pg_mg       <dbl> 8.393707, NA, NA, 7.901366, NA, 8.8…
$ normalized_cd68_hypothalamus_pg_mg <dbl> NA, 6800.870, NA, 4373.811, 4461.62…
$ normalized_map2_cortex_pg_mg       <dbl> 352.9653, NA, 2693.9386, 1007.4147,…
$ mirna1                             <dbl> 5.2630200, -0.0491371, -0.7367310, …
$ mirna2                             <dbl> 1.6536200, -0.0773419, 0.1479940, -…
$ learning_outcome                   <dbl> 3.52, 19.81, 2.44, 1.56, 14.48, 1.1…
$ preference_obj1                    <dbl> 41.72205, 37.51387, 55.96768, 74.11…
$ preference_obj2                    <dbl> 58.27795, 62.48613, 44.03232, 25.88…
$ time_month                         <fct> 1 month, 6 months, 12 months, 1 mon…

Our biomarker variables are in separate columns:
- normalized_bdnf_amygdala_pg_mg, normalized_bdnf_cortex_pg_mg, … , normalized_map2_cortex_pg_mg.
The plot has facets that correspond to
- biomarker (bdnf, cd68, map2) and
- tissue type (amygdala, cortex, hypothalamus).
Therefore, we need separate columns that contain this information, such as:
- biomarker_type = bdnf, cd68, map2, etc.
- tissue_type = amygdala, cortex, hypothalamus, etc. (named biomarker_location in the code below)
- biomarker_value = the biomarker numeric value for each of these
Each mouse (sid) will have multiple rows
- because it has multiple tissue samples and multiple biomarkers measured.
- This means we need “long” data, where each observation is in a separate row.

7.2 Make data long

First we will make the data long, and deal with the biomarker_type/tissue_type separation second:

# make a long dataset where we have multiple biomarker data in the same column
mouse_biomarker_long <- mouse_data %>%
  pivot_longer(cols = starts_with("normalized"),
               names_to = "biomarker_type_temp",
               values_to = "biomarker_value")

glimpse(mouse_biomarker_long) # helps to View it too

Rows: 672
Columns: 13
$ sid                 <dbl> 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, …
$ strain              <chr> "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "…
$ trt                 <chr> "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", …
$ sex                 <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", …
$ time                <chr> "tp1", "tp1", "tp1", "tp1", "tp1", "tp1", "tp1", "…
$ mirna1              <dbl> 5.2630200, 5.2630200, 5.2630200, 5.2630200, 5.2630…
$ mirna2              <dbl> 1.6536200, 1.6536200, 1.6536200, 1.6536200, 1.6536…
$ learning_outcome    <dbl> 3.52, 3.52, 3.52, 3.52, 3.52, 3.52, 3.52, 19.81, 1…
$ preference_obj1     <dbl> 41.72205, 41.72205, 41.72205, 41.72205, 41.72205, …
$ preference_obj2     <dbl> 58.27795, 58.27795, 58.27795, 58.27795, 58.27795, …
$ time_month          <fct> 1 month, 1 month, 1 month, 1 month, 1 month, 1 mon…
$ biomarker_type_temp <chr> "normalized_bdnf_amygdala_pg_mg", "normalized_bdnf…
$ biomarker_value     <dbl> 492.483106, 720.017330, NA, 988.962849, 8.393707, …

We are getting there.

We now have multiple observations per mouse (note that sid 137 is repeated),
- and we have the biomarker_value column.
However, we still need to separate biomarker_type_temp into
- the tissue type and
- the name of the biomarker,
both for plotting purposes and to make the data tidy!

7.3 Tidy up `biomarker_type_temp` column

Goal: separate biomarker_type_temp into 2 columns:
- the tissue type and
- the name of the biomarker

These are the values in the biomarker_type_temp column:

mouse_biomarker_long %>% tabyl(biomarker_type_temp)

                biomarker_type_temp  n   percent
     normalized_bdnf_amygdala_pg_mg 96 0.1428571
       normalized_bdnf_cortex_pg_mg 96 0.1428571
 normalized_bdnf_hypothalamus_pg_mg 96 0.1428571
     normalized_cd68_amygdala_pg_mg 96 0.1428571
       normalized_cd68_cortex_pg_mg 96 0.1428571
 normalized_cd68_hypothalamus_pg_mg 96 0.1428571
       normalized_map2_cortex_pg_mg 96 0.1428571

First, we can remove the “normalized” and “pg_mg” parts:

mouse_biomarker_long <- mouse_biomarker_long %>%
  mutate(
    biomarker_type_temp = str_remove_all(
      biomarker_type_temp, "normalized_|_pg_mg")
    )

# check:
mouse_biomarker_long %>% tabyl(biomarker_type_temp)

 biomarker_type_temp  n   percent
       bdnf_amygdala 96 0.1428571
         bdnf_cortex 96 0.1428571
   bdnf_hypothalamus 96 0.1428571
       cd68_amygdala 96 0.1428571
         cd68_cortex 96 0.1428571
   cd68_hypothalamus 96 0.1428571
         map2_cortex 96 0.1428571
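The pattern "normalized_|_pg_mg" uses | to mean "or", so str_remove_all() removes every match of either piece. As a small sketch on two made-up strings (not the full dataset), we can compare it with an anchored variant that only removes the prefix at the start (^) and the suffix at the end ($) of each string. For these column names the two patterns should give the same result; the anchored version is just a more cautious option if the pieces could also appear in the middle of a name.

```r
library(stringr)

# Made-up names in the same style as the mouse data columns
x <- c("normalized_bdnf_cortex_pg_mg", "normalized_map2_cortex_pg_mg")

# Unanchored alternation, as used above
unanchored <- str_remove_all(x, "normalized_|_pg_mg")

# Anchored variant: prefix only at the start, suffix only at the end
cleaned <- str_remove_all(x, "^normalized_|_pg_mg$")

unanchored
cleaned
```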

Now we can separate using separate_wider_delim() with _ as the separator:

mouse_biomarker_long <- mouse_biomarker_long %>%
  separate_wider_delim(
    cols = biomarker_type_temp,
    names = c("biomarker_type", "biomarker_location"),
    delim = "_",
    cols_remove = FALSE)

glimpse(mouse_biomarker_long)

Rows: 672
Columns: 15
$ sid                 <dbl> 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, …
$ strain              <chr> "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "C3H", "…
$ trt                 <chr> "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", …
$ sex                 <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", …
$ time                <chr> "tp1", "tp1", "tp1", "tp1", "tp1", "tp1", "tp1", "…
$ mirna1              <dbl> 5.2630200, 5.2630200, 5.2630200, 5.2630200, 5.2630…
$ mirna2              <dbl> 1.6536200, 1.6536200, 1.6536200, 1.6536200, 1.6536…
$ learning_outcome    <dbl> 3.52, 3.52, 3.52, 3.52, 3.52, 3.52, 3.52, 19.81, 1…
$ preference_obj1     <dbl> 41.72205, 41.72205, 41.72205, 41.72205, 41.72205, …
$ preference_obj2     <dbl> 58.27795, 58.27795, 58.27795, 58.27795, 58.27795, …
$ time_month          <fct> 1 month, 1 month, 1 month, 1 month, 1 month, 1 mon…
$ biomarker_type      <chr> "bdnf", "bdnf", "bdnf", "cd68", "cd68", "cd68", "m…
$ biomarker_location  <chr> "amygdala", "cortex", "hypothalamus", "amygdala", …
$ biomarker_type_temp <chr> "bdnf_amygdala", "bdnf_cortex", "bdnf_hypothalamus…
$ biomarker_value     <dbl> 492.483106, 720.017330, NA, 988.962849, 8.393707, …

# Check!
mouse_biomarker_long %>% 
  tabyl(biomarker_type_temp, biomarker_type) %>% 
  adorn_title()

                     biomarker_type          
 biomarker_type_temp           bdnf cd68 map2
       bdnf_amygdala             96    0    0
         bdnf_cortex             96    0    0
   bdnf_hypothalamus             96    0    0
       cd68_amygdala              0   96    0
         cd68_cortex              0   96    0
   cd68_hypothalamus              0   96    0
         map2_cortex              0    0   96

mouse_biomarker_long %>% 
  tabyl(biomarker_type_temp, biomarker_location) %>% 
  adorn_title()

                     biomarker_location                    
 biomarker_type_temp           amygdala cortex hypothalamus
       bdnf_amygdala                 96      0            0
         bdnf_cortex                  0     96            0
   bdnf_hypothalamus                  0      0           96
       cd68_amygdala                 96      0            0
         cd68_cortex                  0     96            0
   cd68_hypothalamus                  0      0           96
         map2_cortex                  0     96            0
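As an aside, the three steps above (pivot, clean the names, separate) can also be combined into a single pivot_longer() call using its names_pattern argument. The sketch below is an alternative, not the approach used in these notes, and `toy_wide` is a made-up table with only a few of the biomarker columns. Each set of parentheses in the regular expression captures one piece of the column name, and the pieces are assigned to the names listed in names_to in order.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data in the same naming style as the mouse data
toy_wide <- tibble(
  sid = c(1, 2),
  normalized_bdnf_amygdala_pg_mg = c(490, 450),
  normalized_bdnf_cortex_pg_mg   = c(720, 880),
  normalized_map2_cortex_pg_mg   = c(350, NA)
)

# [^_]+ means "one or more characters that are not an underscore"
toy_long <- toy_wide %>%
  pivot_longer(
    cols = starts_with("normalized"),
    names_to = c("biomarker_type", "biomarker_location"),
    names_pattern = "normalized_([^_]+)_([^_]+)_pg_mg",
    values_to = "biomarker_value"
  )

toy_long
```

The step-by-step version in the notes is easier to check along the way (for example with tabyl()), which is a good reason to prefer it while learning.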

7.4 Make the plot!

biomarker_boxplot <- ggplot(
  mouse_biomarker_long,
  aes(x = biomarker_location,
      y = biomarker_value,
      fill = trt)) +
  geom_boxplot() +
  facet_grid(
    cols = vars(time_month),
    rows = vars(biomarker_type),
    scales = "free_y") +
  theme_bw() +
  labs(x = "Biomarker Tissue",
       y = "Biomarker Value (pg/mg)",
       fill = "Treatment") +
  ggthemes::scale_fill_tableau()
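Two practical notes on this kind of plot: assigning a ggplot to a name does not draw it, so we need to print the object to see it, and rows with missing biomarker values produce a warning when boxplots are drawn. The sketch below illustrates both on simulated data. Everything in it is an assumption for illustration (the `toy_long` values are random numbers, and it uses only one facetting variable), so it will not reproduce the figure from the mouse data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(526)

# Simulated long data: 2 treatments x 2 biomarkers x 2 tissues x 5 replicates
toy_long <- expand_grid(
  trt = c("-", "+"),
  biomarker_type = c("bdnf", "cd68"),
  biomarker_location = c("amygdala", "cortex"),
  replicate = 1:5
) %>%
  mutate(
    # the two biomarkers are on very different scales, as in the real data
    biomarker_value = rnorm(
      n(),
      mean = if_else(biomarker_type == "bdnf", 500, 5000),
      sd = 50)
  )

# Introduce one missing value, then drop it before plotting
toy_long$biomarker_value[1] <- NA

toy_plot <- toy_long %>%
  drop_na(biomarker_value) %>%
  ggplot(aes(x = biomarker_location,
             y = biomarker_value,
             fill = trt)) +
  geom_boxplot() +
  # free_y lets each biomarker row use its own y-axis range
  facet_grid(rows = vars(biomarker_type), scales = "free_y") +
  theme_bw()

# Nothing is drawn until the object is printed
print(toy_plot)
```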

8 Save workspace as Rdata file

We’re going to pick up in Part 7 where we left off here.
To do that seamlessly, with all the objects in our workspace (environment) available, we can save all of them in one .Rdata file.

save.image(file = here::here("part6", "workspace_part6.Rdata"))
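When we load a saved workspace later, every object in the file is restored under its original name and will overwrite objects with the same name. One cautious option is to load the file into a separate environment first and look at what it contains. The sketch below uses two made-up objects and a temporary file as stand-ins, and it uses save() on a small environment rather than save.image() so that the example does not depend on what is in your global environment.

```r
# Made-up objects standing in for a workspace
env_to_save <- new.env()
assign("mouse_toy", data.frame(sid = 1:2), envir = env_to_save)
assign("n_mice", 2, envir = env_to_save)

# Save all objects from that environment into one temporary .Rdata file
ws_path <- tempfile(fileext = ".Rdata")
save(list = ls(env_to_save), envir = env_to_save, file = ws_path)

# Load into a fresh environment so nothing in the current session is overwritten
loaded <- new.env()
load(ws_path, envir = loaded)

# See which objects were stored in the file
ls(loaded)
```

Calling load() without the envir argument puts the objects straight into the environment you call it from, which is what we want when picking up where we left off.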

9 Package versions

I recommend adding the code below to the end of your files so that in the future you have a record of what versions of packages were used in your work.

devtools::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.3 (2023-03-15)
 os       macOS 14.8.3
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2026-02-12
 pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package        * version  date (UTC) lib source
 abind            1.4-5    2016-07-21 [1] CRAN (R 4.2.0)
 backports        1.4.1    2021-12-13 [1] CRAN (R 4.2.0)
 base64enc        0.1-3    2015-07-28 [1] CRAN (R 4.2.0)
 bit              4.0.5    2022-11-15 [1] CRAN (R 4.2.0)
 bit64            4.0.5    2020-08-30 [1] CRAN (R 4.2.0)
 broom            1.0.7    2024-09-26 [1] CRAN (R 4.2.3)
 cachem           1.1.0    2024-05-16 [1] CRAN (R 4.2.3)
 car              3.1-2    2023-03-30 [1] CRAN (R 4.2.3)
 carData          3.0-5    2022-01-06 [1] CRAN (R 4.2.0)
 cellranger       1.1.0    2016-07-27 [1] CRAN (R 4.2.0)
 cli              3.6.3    2024-06-21 [1] CRAN (R 4.2.3)
 crayon           1.5.3    2024-06-20 [1] CRAN (R 4.2.3)
 devtools         2.4.5    2022-10-11 [1] CRAN (R 4.2.0)
 digest           0.6.37   2024-08-19 [1] CRAN (R 4.2.3)
 dplyr          * 1.1.4    2023-11-17 [1] CRAN (R 4.2.3)
 ellipsis         0.3.2    2021-04-29 [1] CRAN (R 4.2.0)
 evaluate         1.0.1    2024-10-10 [1] CRAN (R 4.2.3)
 farver           2.1.2    2024-05-13 [1] CRAN (R 4.2.3)
 fastmap          1.2.0    2024-05-15 [1] CRAN (R 4.2.3)
 forcats        * 1.0.0    2023-01-29 [1] CRAN (R 4.2.0)
 fs               1.6.5    2024-10-30 [1] CRAN (R 4.2.3)
 generics         0.1.3    2022-07-05 [1] CRAN (R 4.2.0)
 ggplot2        * 4.0.2    2026-02-03 [1] CRAN (R 4.2.3)
 ggthemes       * 4.2.4    2021-01-20 [1] CRAN (R 4.2.0)
 ghibli         * 0.3.3    2022-08-26 [1] CRAN (R 4.2.0)
 glue             1.8.0    2024-09-30 [1] CRAN (R 4.2.3)
 gt             * 1.0.0    2025-04-05 [1] CRAN (R 4.2.3)
 gtable           0.3.6    2024-10-25 [1] CRAN (R 4.2.3)
 gtsummary      * 2.0.4    2024-11-30 [1] CRAN (R 4.2.3)
 here           * 1.0.1    2020-12-13 [1] CRAN (R 4.2.0)
 hms              1.1.3    2023-03-21 [1] CRAN (R 4.2.0)
 htmltools        0.5.8.1  2024-04-04 [1] CRAN (R 4.2.3)
 htmlwidgets      1.6.4    2023-12-06 [1] CRAN (R 4.2.3)
 httpuv           1.6.15   2024-03-26 [1] CRAN (R 4.2.3)
 janitor        * 2.2.0    2023-02-02 [1] CRAN (R 4.2.0)
 jsonlite         1.8.9    2024-09-20 [1] CRAN (R 4.2.3)
 knitr            1.49     2024-11-08 [1] CRAN (R 4.2.3)
 labeling         0.4.3    2023-08-29 [1] CRAN (R 4.2.0)
 later            1.4.1    2024-11-27 [1] CRAN (R 4.2.3)
 lifecycle        1.0.4    2023-11-07 [1] CRAN (R 4.2.3)
 lubridate      * 1.9.3    2023-09-27 [1] CRAN (R 4.2.0)
 magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.2.0)
 memoise          2.0.1    2021-11-26 [1] CRAN (R 4.2.0)
 mime             0.12     2021-09-28 [1] CRAN (R 4.2.0)
 miniUI           0.1.1.1  2018-05-18 [1] CRAN (R 4.2.0)
 naniar         * 1.1.0    2024-03-05 [1] CRAN (R 4.2.3)
 pacman           0.5.1    2019-03-11 [1] CRAN (R 4.2.0)
 paletteer      * 1.6.0    2024-01-21 [1] CRAN (R 4.2.3)
 palmerpenguins * 0.1.1    2022-08-15 [1] CRAN (R 4.2.0)
 pillar           1.10.0   2024-12-17 [1] CRAN (R 4.2.3)
 pkgbuild         1.4.5    2024-10-28 [1] CRAN (R 4.2.3)
 pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
 pkgload          1.4.0    2024-06-28 [1] CRAN (R 4.2.3)
 prismatic        1.1.1    2022-08-15 [1] CRAN (R 4.2.0)
 profvis          0.4.0    2024-09-20 [1] CRAN (R 4.2.3)
 promises         1.3.2    2024-11-28 [1] CRAN (R 4.2.3)
 purrr          * 1.0.2    2023-08-10 [1] CRAN (R 4.2.0)
 R6               2.5.1    2021-08-19 [1] CRAN (R 4.2.0)
 RColorBrewer     1.1-3    2022-04-03 [1] CRAN (R 4.2.0)
 Rcpp             1.0.13-1 2024-11-02 [1] CRAN (R 4.2.3)
 readr          * 2.1.5    2024-01-10 [1] CRAN (R 4.2.3)
 readxl         * 1.4.2    2023-02-09 [1] CRAN (R 4.2.0)
 rematch2         2.1.2    2020-05-01 [1] CRAN (R 4.2.0)
 remotes          2.5.0    2024-03-17 [1] CRAN (R 4.2.3)
 repr             1.1.6    2023-01-26 [1] CRAN (R 4.2.0)
 rlang            1.1.4    2024-06-04 [1] CRAN (R 4.2.3)
 rmarkdown        2.29     2024-11-04 [1] CRAN (R 4.2.3)
 rprojroot        2.0.4    2023-11-05 [1] CRAN (R 4.2.0)
 rstatix        * 0.7.2    2023-02-01 [1] CRAN (R 4.2.0)
 rstudioapi       0.17.1   2024-10-22 [1] CRAN (R 4.2.3)
 S7               0.2.1    2025-11-14 [1] CRAN (R 4.2.3)
 sass             0.4.9    2024-03-15 [1] CRAN (R 4.2.3)
 scales           1.4.0    2025-04-24 [1] CRAN (R 4.2.3)
 sessioninfo      1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
 shiny            1.10.0   2024-12-14 [1] CRAN (R 4.2.3)
 skimr          * 2.1.5    2022-12-23 [1] CRAN (R 4.2.0)
 snakecase        0.11.0   2019-05-25 [1] CRAN (R 4.2.0)
 stringi          1.8.4    2024-05-06 [1] CRAN (R 4.2.3)
 stringr        * 1.6.0    2025-11-04 [1] CRAN (R 4.2.3)
 tibble         * 3.2.1    2023-03-20 [1] CRAN (R 4.2.0)
 tidyr          * 1.3.1    2024-01-24 [1] CRAN (R 4.2.3)
 tidyselect       1.2.1    2024-03-11 [1] CRAN (R 4.2.3)
 tidyverse      * 2.0.0    2023-02-22 [1] CRAN (R 4.2.0)
 timechange       0.3.0    2024-01-18 [1] CRAN (R 4.2.3)
 tzdb             0.4.0    2023-05-12 [1] CRAN (R 4.2.0)
 urlchecker       1.0.1    2021-11-30 [1] CRAN (R 4.2.0)
 usethis          3.1.0    2024-11-26 [1] CRAN (R 4.2.3)
 utf8             1.2.4    2023-10-22 [1] CRAN (R 4.2.0)
 vctrs            0.6.5    2023-12-01 [1] CRAN (R 4.2.3)
 vembedr        * 0.1.5    2021-12-11 [1] CRAN (R 4.2.0)
 visdat           0.6.0    2023-02-02 [1] CRAN (R 4.2.0)
 vroom            1.6.5    2023-12-05 [1] CRAN (R 4.2.3)
 withr            3.0.2    2024-10-28 [1] CRAN (R 4.2.3)
 xfun             0.49     2024-10-31 [1] CRAN (R 4.2.3)
 xml2             1.3.6    2023-12-04 [1] CRAN (R 4.2.3)
 xtable           1.8-4    2019-04-21 [1] CRAN (R 4.2.0)
 yaml             2.3.10   2024-07-26 [1] CRAN (R 4.2.3)

 [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────

10 Post Class Survey

Please fill out the post-class survey.

Your responses are anonymous in that I separate your names from the survey answers before compiling/reading.

You may want to review previous years’ feedback here.

11 Acknowledgements

Part 6 is based on the BSTA 505 Winter 2023 course, taught by Jessica Minnier.
- I made modifications to update the material from RMarkdown to Quarto, and streamlined/edited content for slides.
- I also made changes such as using the new separate_wider_delim(), using n_distinct(), and saving .Rdata files.
Minnier’s Acknowledgements:
- Written by Jessica Minnier and inspired by work of Ted Laderas.