Week 2
Projects, data frames, reading in data, visualizing data with ggplot2
Pre-recorded videos
- Links to pre-recorded videos are posted on Sakai, in the table with the links to live class recordings.
- I split the class into 3 recordings. See the table for details.
Topics
- Projects
- Data frames
- Tidy data
- Reading in data
- Getting to know a dataset
- Visualizing data with ggplot2 (intro)
Announcements
- Class materials for BSTA 526 will be provided in the shared OneDrive folder BSTA_526_W24_class_materials_public.
- For today’s class, make sure to download the folder called part_02 to your computer, and then open RStudio by double-clicking on the file called part_02.Rproj.
- If you have not already done so, please join the BSTA 526 Slack channel and introduce yourself by posting in the #random channel.
Class materials
- Readings
- OneDrive part_02 Project folder
Post-class survey
- Please fill out the post-class survey to provide feedback. Thank you!
Homework
- See OneDrive folder for homework assignment.
- HW 2 due on 1/29 (updated to Monday).
Recording
- In-class recording links are on Sakai. Navigate to Course Materials -> Schedule with links to in-class recordings. Note that the password to the recordings is at the top of the page.
Muddiest points
- When discussing untidy data, the difference between long data and wide data was unclear.
- We’ll be discussing the difference between long and wide data in more detail later in the course when we convert a dataset between the two. For now, you can take a look at an example I created for our BERD R workshops. The wide data in that example are not “tidy” since each cell contains two pieces of information: both the SBP and the visit number. In contrast, the long data have a separate column indicating which visit number the data in a given row are from.
- For the summary() function, is there a way to summarize all but one variable in a dataset?
- Yes! I sometimes restrict a dataset to a couple of variables for which I want to see the summary. I usually use the select() function for this, which we will be covering later in the course. For now, you can take a look at some select() examples from the BERD R workshops (see slides 29-32).
- Differences between a tibble and a data.frame
- I’m not surprised to see this show up as a muddiest point! Depending on your level of experience with R, at this point in the class some of the differences are difficult to explain since we haven’t done much coding yet. The tibble vignette lists some of the differences though if you are interested. For our purposes, they are almost the same thing. When some differences come up later in the course, I will point them out.
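For the summary() question above, here is a minimal sketch using only base R, so it runs before we have covered select(). The data frame `dat` and its column names are made up for illustration and are not from the class materials.

```r
# Hypothetical toy data; the column names are made up for illustration
dat <- data.frame(
  id  = 1:4,
  age = c(34, 51, 47, 29),
  sbp = c(118, 135, 127, 110)
)

# Keep every column except `id`, then summarize what is left
keep_cols <- setdiff(names(dat), "id")
summary_no_id <- summary(dat[, keep_cols])
summary_no_id

# With dplyr loaded, an equivalent call would be:
# summary(select(dat, -id))
```

The same pattern works for dropping any single column: put its name in place of "id".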
Clearest points
Thanks for the feedback!
- I enjoyed going through the code and viewing the functions. I haven’t really used skimr before and that was nice to see.
- I like using skimr, but have recently been using get_summary_stats() from the rstatix package when teaching. It is only for numeric variables though. See a get_summary_stats() example from my BSTA 511 class.
- Loading data.
- How to load data into R was clearest.
- Good to know that loading data was clear. This part can be tricky sometimes!
- ggplot
- Hopefully this will still be clear when we cover more advanced options in ggplot!