Week 3

Errors, more data loading, data manipulation, ggplot themes, factors
Published

January 24, 2024

Modified

February 13, 2024

Topics

  • Where to get help on errors
  • Revisiting data loading with the here package
  • Data manipulation with dplyr package
  • Themes in the ggplot2 package
  • Factors
  • Boxplots and facets

Announcements

  • Class materials for BSTA 526 will be provided in the shared OneDrive folder BSTA_526_W24_class_materials_public.
  • For today’s class, make sure to download to your computer the folder called part_03, and then open RStudio by double-clicking on the file called part_03.Rproj.
  • If you have not already done so, please join the BSTA 526 Slack channel and introduce yourself by posting in the #random channel.

Class materials

Post-class survey

Homework

  • See OneDrive folder for homework assignment.
  • HW 3 due on 1/31.

Recording

  • In-class recording links are on Sakai. Navigate to Course Materials -> Schedule with links to in-class recordings. Note that the password to the recordings is at the top of the page.

Feedback from post-class surveys

Week 3 feedback

Muddiest points

here package

The here package takes a bit of explaining, but compared to the old way of doing things, it is a real lifesaver. The issue in the past had to do with relative file paths, especially with .qmd files that are saved in sub-folders. A .qmd file treats the folder where it is saved as the root file path, which is okay for a one-off .qmd file. But when working in projects (recommended) and striving for reproducible R code (highly recommended), the here package saves a lot of headaches.

For further reading:

  • Why should I use the here package when I’m already using projects? by Malcolm Barrett
  • how to use the here package by Jenny Richmond
  • here package vignette
  • Using here with rmarkdown

Project-oriented workflows are recommended. The here package solves some old headaches, and it gets easier with practice.
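As a minimal sketch of the idea, the code below builds a file path from the project root. The folder name data and the file name smoke_data.csv are hypothetical and are not taken from the class materials; swap in your own names.

```r
library(here)

# here() with no arguments returns the project root that the package detected
# (for example, the folder that contains the .Rproj file)
project_root <- here()

# Build a path relative to the project root; "data" and "smoke_data.csv"
# are hypothetical names used only for illustration
data_path <- here("data", "smoke_data.csv")
data_path

# The same line works whether the .qmd is saved in the project root
# or in a sub-folder, because the path always starts at the project root
# smoke <- readr::read_csv(data_path)
```

The read_csv() line is left commented out because the file does not exist here; in a real project it would sit near the top of each .qmd that needs the data.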

Question about using here

… how [here] can be used in certain instances where one may not remember if they switched to a new qmd file? In that case, would you suggest to use the “here” command each time you work on a project where there’s a chance that you’ll switch between qmd files and would like to use the same data file throughout? Is there any other way to better use this function or tips on how you deal with it?

There is a difference between working interactively in RStudio and rendering a .qmd file. When working interactively, data are loaded into the Environment, so a data set that is loaded once can be used by any other code you run in that session.

Issues will come up when you go to render a .qmd that doesn’t have the data loaded within that .qmd. Rendering won’t look to the Environment for the data; it looks to the file path that you specify in the .qmd. Best practice is to write the code to load the data in each .qmd or .R script so that R knows where to look for the data that you want it to operate on or analyze.

The ! function. It seems like sometimes we use ! and sometimes we use -. Are they interchangeable, or each with different types of functions?

  • ! – the exclamation point can be read as “not”; it is primarily used in logical statements
  • - – the minus sign is used in more situations:
    • to do actual arithmetic (i.e. subtraction)
    • to indicate a negative number
    • with dplyr::select() to exclude (not select) a column
# Subtraction
5 - 3
[1] 2
# Negation
x <- 10
-x
[1] -10
# Selection/exclusion
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
select(starwars, -height) |> dplyr::glimpse()
Rows: 87
Columns: 13
$ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
$ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
$ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
$ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
$ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
$ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
$ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
$ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
$ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
$ films      <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return…
$ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
$ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…
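
For comparison, here is a short sketch of ! used as “not”, again with the starwars data from dplyr. The last part shows the one place where the two symbols overlap: inside select(), ! takes the complement of a selection, so for a single column it should give the same result as the minus sign.

```r
library(dplyr)

# ! flips a logical value: TRUE becomes FALSE and vice versa
!TRUE

# Applied to a logical vector, ! flips each element
x <- c(1, NA, 3)
not_missing <- !is.na(x)
not_missing

# In filter(), ! keeps the rows where the condition is NOT true;
# here, the rows where hair_color is not missing
starwars |>
  filter(!is.na(hair_color)) |>
  nrow()

# Inside select(), ! means "everything except", which matches -height
same_columns <- identical(
  names(select(starwars, !height)),
  names(select(starwars, -height))
)
same_columns
```

Outside of select() and the other tidyselect verbs, the two are not interchangeable: ! works on logical values and - works on numbers.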

Using the fill command

We didn’t cover it in the lecture notes, but then it appeared in the example. I suggest reading and working through the fill vignette; the examples there show well what the function does. Then look back at the smoke_messy data set in Part 3 and think about why this command would be useful for cleaning up the data and filling in missing values.
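
A small sketch of what tidyr::fill() does is below. The data are made up for illustration and are not the smoke_messy data set; the assumption is a spreadsheet-style layout where a group label is only written on the first row of each block.

```r
library(tidyr)
library(tibble)

# Hypothetical messy data: the group label appears only on the first row
# of each block, as often happens with merged cells in a spreadsheet
messy <- tibble(
  group = c("smoker", NA, NA, "non-smoker", NA),
  age   = c(34, 51, 29, 45, 62)
)

# fill() carries the last non-missing value downward (the default direction)
filled <- messy |> fill(group)
filled
```

The .direction argument can be changed (for example to "up") when the label sits at the bottom of each block instead of the top.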

Loading data into R

It gets easier, and hopefully you will get to see more examples in the notes and practice with the homework. This tutorial is pretty good. So are the readxl vignette and the readr vignette.
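
For practice, here is a self-contained sketch that writes a tiny CSV file to a temporary location and reads it back with readr. The column names and values are invented for illustration, and the Excel file name in the final comment is hypothetical.

```r
library(readr)

# Write a small example file to a temporary location so the sketch is
# self-contained; in a project you would point to your own file instead
csv_path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:3, score = c(90.5, NA, 78)), csv_path)

# read_csv() guesses the column types; show_col_types = FALSE
# silences the column specification message
scores <- read_csv(csv_path, show_col_types = FALSE)
scores

# For Excel files, readxl::read_excel() plays the same role, e.g.
# readxl::read_excel(here::here("data", "my_file.xlsx"), sheet = 1)
```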

Reasonable width, height, and dpi values when using ggsave

This takes some trial and error and depends on the purpose. For draft figures, dpi = 70 might be okay, but a journal might require a dpi of 300 or more for publication. In Quarto, when rendering to HTML, the figure defaults are 7x5 inches (Link). We talked in class about how you can use the plot pane to size your figures by trial and error.
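
The sketch below saves the same plot twice, once at a draft dpi and once at a higher dpi. The file names are hypothetical, and the files go to a temporary folder so that nothing in your project is overwritten.

```r
library(ggplot2)

# A simple plot using a data set that ships with ggplot2
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output files written to a temporary folder
draft_file <- file.path(tempdir(), "draft_figure.png")
final_file <- file.path(tempdir(), "final_figure.png")

# Low dpi for a quick draft; width and height are in inches by default
ggsave(draft_file, plot = p, width = 7, height = 5, dpi = 70)

# Higher dpi for a version of the same figure intended for publication
ggsave(final_file, plot = p, width = 7, height = 5, dpi = 300)

# Compare the file sizes (in bytes) of the two versions
file.size(c(draft_file, final_file))
```

Because width and height stay fixed in inches, raising dpi increases the number of pixels without changing the relative size of text and points in the figure.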

The tidyselect section

There were pretty good resources in the notes:

  • See some more examples in this slide

  • For more info and learning about tidyselect, please run this code in your console:

# install remotes package
install.packages("remotes")
# use remotes to install this package from github
remotes::install_github("laderast/tidyowl")

# load tidyowl package
library(tidyowl)

# interactive tutorial
tidyowl::learn_tidyselect()

Here is also a link with a list of the selectors and links to each one. For example, there is a link to starts_with and a bunch of examples.
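
As a quick illustration, a few of the selectors are applied below to the starwars data from dplyr. This is only a sketch of a handful of helpers, not a complete tour.

```r
library(dplyr)

# Columns whose names start with "h"
h_cols <- starwars |> select(starts_with("h")) |> names()
h_cols

# Columns whose names end with "color"
color_cols <- starwars |> select(ends_with("color")) |> names()
color_cols

# Columns selected by type rather than by name
numeric_cols <- starwars |> select(where(is.numeric)) |> names()
numeric_cols

# Helpers can be combined with plain column names
starwars |> select(name, contains("_")) |> names()
```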