Week 5
Merging multiple datasets; cleaning & reshaping data
Topics
Part 5:
- Learn how to load comma-separated (CSV) and tab-separated (TSV) datasets using the `readr` package
- Learn different techniques for cleaning data using functions from the `tidyr`, `forcats`, and `stringr` packages
- Practice cleaning data with a real dataset
- Learn and apply `bind_rows()` to combine rows from two or more datasets
- Learn about ways to merge columns from different datasets
- Apply `inner_join()` and `left_join()` to merge columns from different datasets
- Learn about wide vs. long data and how to reshape data
- Apply `pivot_longer()` to make a wide dataset long
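To make the topics above concrete, here is a minimal sketch using small made-up datasets. All object and column names (`visit1`, `visit2`, `demo`, `wide`, `sbp`, and so on) are hypothetical and not taken from the class materials. It assumes the `readr`, `dplyr`, and `tidyr` packages are installed, and it writes a tiny tab-separated file to a temporary location so that `read_tsv()` has something to read; `read_csv()` works the same way for comma-separated files.

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical example: two visits stored in separate datasets with the same columns
visit1 <- tibble(id = c(1, 2), visit = 1, sbp = c(120, 135))
visit2 <- tibble(id = c(1, 3), visit = 2, sbp = c(118, 140))

# bind_rows() stacks the rows; columns are matched by name
visits <- bind_rows(visit1, visit2)

# Write a small tab-separated file, then load it with read_tsv()
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tage", "1\t34", "2\t51", "4\t29"), tmp)
demo <- read_tsv(tmp, show_col_types = FALSE)

# inner_join() keeps only rows whose id appears in both datasets (ids 1 and 2)
inner <- inner_join(visits, demo, by = "id")

# left_join() keeps every row of `visits`; age is NA for id 3 (no match in demo)
left <- left_join(visits, demo, by = "id")

# A wide dataset: one blood-pressure column per visit
wide <- tibble(id = c(1, 2), sbp_v1 = c(120, 135), sbp_v2 = c(118, 131))

# pivot_longer() turns the visit columns into rows;
# names_prefix strips "sbp_v" so the new visit column holds "1" and "2"
long <- pivot_longer(wide,
                     cols = starts_with("sbp_"),
                     names_to = "visit",
                     names_prefix = "sbp_v",
                     values_to = "sbp")
```

Comparing `nrow(inner)` (3 rows) with `nrow(left)` (4 rows) shows the key difference between the two joins: `left_join()` keeps participant 3 even though `demo` has no matching `age`.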
Announcements
- Functions of the Week
- Signup sheet for your functions of the week presentations. The file is in OneDrive in the functions_of_the_week folder.
- Presentations will be during weeks 5-10. Please sign up so that there are no more than 4 presentations in any week.
- If there is a function you are interested in presenting that is not on the signup sheet, please check with me. If it hasn’t been covered before and isn’t covered in class, I will most likely approve it.
- The Midterm is posted on OneDrive. It is due Sunday 2/22/26.
- Please start early on this since finding a suitable dataset might take some time.
- I encourage you to meet with me to discuss your research question and data, to make sure you are on the right track.
- Cascadia R Conf is June 26-27 this year. It will be held at OHSU in RLSB. This is a great conference to meet other R enthusiasts in the area and learn more about what they are working on.

Class materials
- Class materials are in the OneDrive folder `BSTA_526_W26_class_materials_public`.
- For today’s class, make sure to download the folder called `part5` to your computer.
- Open RStudio by double-clicking on the project file called `BSTA_526_W26_class_materials_public.Rproj` in the main OneDrive folder.
| Part | OneDrive folder | Slides | Webpage |
|---|---|---|---|
| 5 | | | |
Readings
R4DS = R for Data Science (2e)
Required
- R4DS book:
- Modifying factor levels: Section 16.5
- In particular, `fct_collapse()`. In part 6 we will cover `fct_recode()`.
- Separating into columns (`separate_wider_delim()`): Section 14.4.2
- Making numbers (`parse_number()`): Section 13.2
- Joins: Chapter 19
- Lengthening and widening data (`pivot_longer()` and `pivot_wider()`): Sections 5.3 and 5.4
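As a warm-up for the required readings, here is a minimal sketch on a made-up three-row dataset that applies `separate_wider_delim()`, `parse_number()`, and `fct_collapse()`. The columns `bp`, `income`, and `smoke` and their values are hypothetical. It assumes tidyr 1.3.0 or later (where `separate_wider_delim()` was introduced) and R 4.1 or later for the native pipe `|>`.

```r
library(dplyr)
library(tidyr)
library(readr)
library(forcats)

# Hypothetical messy data: blood pressure stored as "sbp/dbp" text,
# income stored as text with "$" and commas
raw <- tibble(
  bp     = c("120/80", "135/88", "118/76"),
  income = c("$45,000", "$52,500", "$38,200"),
  smoke  = c("current", "former", "never")
)

clean <- raw |>
  # Split "120/80" into two columns at the "/" delimiter (Section 14.4.2)
  separate_wider_delim(bp, delim = "/", names = c("sbp", "dbp")) |>
  mutate(
    # The split pieces are character; parse_number() converts them to numbers
    sbp = parse_number(sbp),
    dbp = parse_number(dbp),
    # parse_number() also drops the leading "$" and the "," grouping mark
    income = parse_number(income),
    # fct_collapse() groups several levels into new ones (Section 16.5)
    smoke_ever = fct_collapse(smoke,
                              ever  = c("current", "former"),
                              never = "never")
  )
```

`fct_collapse()` accepts a character vector as well as a factor, so `smoke` does not need to be converted with `factor()` first.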
Optional
- Feel like the cat that got the cream with {forcats}: a great read on the `forcats` package
- Some of this is a review. New functions in part 5: `fct_collapse` and `fct_other` and …
- There are many other great `forcats` functions covered here, some of which will be presented in part 6
- Pivoting vignette
- This is a great supplement to the R4DS chapter 5 sections linked in the required readings above. It has some advanced examples that we will not be covering in class but that come up frequently in practice.
- Regular expressions: R4DS book Chapter 15
- This goes into much more detail than we will be covering in BSTA 526. However, it’s a great resource for learning more about regular expressions and using them in R if you are interested.
- In part 5, we will be covering just `str_remove_all()`.
- For now, I recommend at least a quick skim of this chapter so that you are aware of what we mean by “regular expressions” and how they can be used in data cleaning. Figuring out the details of the more advanced examples can be postponed until you need them.
- Are you ready to learn more about Quarto?
- The following resources are optional readings and will not be covered in the class materials.
- They are quite helpful if you are ready to explore Quarto’s many possibilities
- R4DS Chapter 28: Quarto
- Quarto Cheatsheet
- Quarto Guide
- Quarto Reference
Post-class survey
- Please fill out the post-class survey to provide feedback. Thank you!
- Previous muddiest points and clearest points with responses are collected here.
Homework
- See OneDrive folder for homework assignment.
- HW 5 due on 02/05.
Recording
- In-class recording links are on Sakai. Navigate to Course Materials -> Schedule with links to in-class recordings.