This is a reminder that 5% of your grade is based on filling out post-class surveys as a way of telling us that you came to class and engaged with the material for that week.
You only need to fill out 5 surveys (of the 10 class sessions) for the full 5%. We encourage you to fill out as many surveys as possible to provide feedback on the class though.
Please fill out surveys by 8 pm on Sunday evenings to guarantee that they will be counted. We usually download them some time on Sunday evening or Monday. If you turn it in before we download the responses, it will get counted.
Please fill out the post-class survey to provide feedback. Thank you!
Homework
See OneDrive folder for homework assignment.
HW 9 due on 3/13. Assignment is in the part 7 folder.
Recording
In-class recording links are on Sakai. Navigate to Course Materials -> Schedule with links to in-class recordings. Note that the password to the recordings is at the top of the page.
Not entirely sure how to read or make sense of matrices yet (maybe I should have paid more attention in algebra), like when we saw the structure of a matrix here in the class script: str(output_model$coefficients)
In R, matrices are two-dimensional data structures that store elements of a single data type. They are like vectors, but arranged in rows and columns. They are widely used in statistical and mathematical operations, making them a fundamental data structure in R.
Basic way to create matrices
# Create a matrix with values filled column-wise
(mat1 <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE))
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
# Create a matrix with values filled row-wise
(mat2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE))
# Accessing entire row or column
row_vector <- mat1[1, ] # Entire first row
row_vector
[1] 1 3 5
col_vector <- mat1[, 2] # Entire second column
col_vector
[1] 3 4
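The same bracket notation also picks out single elements or several columns at once. The lines below are a small sketch that rebuilds mat1 so it runs on its own; dim() and str() are base R functions for checking a matrix's shape and contents.

```r
# Rebuild the 2 x 3 matrix from above so this sketch runs on its own
mat1 <- matrix(1:6, nrow = 2, ncol = 3)

dim(mat1)                # number of rows, then number of columns
single <- mat1[2, 3]     # single element: row 2, column 3
single
mat1[, 2:3]              # all rows, columns 2 and 3 (still a matrix)
str(mat1)                # compact summary: element type and dimensions
```

Reading the index as [row, column], with a blank meaning "all of them", covers most everyday matrix subsetting.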
Convert to data.frame
as.data.frame(mat1)
  V1 V2 V3
1  1  3  5
2  2  4  6
library(tibble)
tibble::as_tibble(mat1)
Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
`.name_repair` is omitted as of tibble 2.0.0.
ℹ Using compatibility `.name_repair`.
# You can also name the columns (and the rows)
colnames(mat1) <- c("a", "b", "c")
mat1
     a b c
[1,] 1 3 5
[2,] 2 4 6
tibble::as_tibble(mat1)
# A tibble: 2 × 3
      a     b     c
  <int> <int> <int>
1     1     3     5
2     2     4     6
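To connect this back to the question about str(output_model$coefficients): the model from class is not shown on this page, so the sketch below assumes a simple linear model on the built-in mtcars data as a stand-in. The names fit and coef_mat are illustrative only. The coefficient table returned by summary() for an lm fit is a numeric matrix with row and column names, which is why str() reports dimensions and names rather than a plain vector.

```r
# Assumption: a simple linear model on the built-in mtcars data stands in
# for the model used in class
fit <- lm(mpg ~ wt, data = mtcars)
coef_mat <- summary(fit)$coefficients   # numeric matrix with row and column names

str(coef_mat)                 # dimensions plus the dimnames attribute
rownames(coef_mat)            # one row per model term
colnames(coef_mat)            # estimate, standard error, t value, p-value
coef_mat["wt", "Estimate"]    # index by name instead of by position
```

Indexing by name is often easier to read than counting positions, and it keeps working if the order of the rows changes.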
for() loops
Still a little confused about the for() loops…
For loops are a staple in programming languages, not just R. They are used when we want to repeat the same operation (or a set of operations) several times.
The basic syntax in R looks like:
for (variable in sequence) {
  # Statements to be executed for each iteration
}
Here’s a breakdown of the components:
variable: This is a loop variable that takes on each value in the specified sequence during each iteration of the loop.
sequence: This is the sequence of values over which the loop iterates. It can be a vector, list, or any other iterable object.
Loop Body: The statements enclosed within the curly braces {} constitute the body of the loop. These statements are executed for each iteration of the loop.
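A tiny concrete loop may help map those three pieces onto real code. In this sketch the names values, x, and total are made up for illustration: x is the loop variable, values is the sequence, and the two lines inside the braces are the body.

```r
# Hypothetical example values
values <- c(2, 4, 6)
total <- 0

for (x in values) {
  # x is 2 on the first pass, then 4, then 6
  total <- total + x
  print(total)   # running total after each iteration
}
```

The body runs three times, once per element of values, and total ends up holding the sum of all three.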
Using a copy and paste method to calculate the median of each column would look something like this:
median(df$a)
[1] -0.5388446
median(df$b)
[1] 0.06331579
median(df$c)
[1] 0.5253658
median(df$d)
[1] 1.030272
But this breaks the rule of DRY (“Don’t repeat yourself”). A for loop does the same work without the repetition:
output <- c() # vector to store the results of the for loop
for (i in seq_along(df)) {
  output[i] <- median(df[[i]])
}
output
[1] -0.53884460 0.06331579 0.52536580 1.03027185
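The data frame df used above comes from the class script and is not shown here. The sketch below builds a stand-in (df_demo, a made-up name with random values, so its medians will not match the ones printed above) and also pre-allocates the output vector to its final length, a common habit that avoids growing the vector on every iteration.

```r
# Assumption: df_demo is a stand-in for the data frame used in class
set.seed(1)
df_demo <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))

output <- vector("double", ncol(df_demo))   # one empty slot per column
for (i in seq_along(df_demo)) {
  output[[i]] <- median(df_demo[[i]])       # median of the i-th column
}
output
```

seq_along(df_demo) produces 1, 2, 3, 4 here because a data frame is a list of columns, so the loop visits each column once.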
For loops in R are commonly used when you know the number of iterations in advance or when you need to iterate over a specific sequence of values. While for loops are useful, R also provides other ways to perform iteration, such as using vectorized operations (example below) and functions from the apply family (not covered). It’s often recommended to explore these alternatives when working with R for better code efficiency and readability.
# With a for loop
result_addition_for_loop <- c()
for (i in seq_along(vector1)) {
  result_addition_for_loop[i] <- vector1[i] + vector2[i]
}
result_addition_for_loop
[1] 7 9 11 13 15
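For comparison, here is a sketch of the vectorized version of the same addition. The definitions of vector1 and vector2 are not shown on this page, so the values below are an assumption chosen because they reproduce the output above.

```r
# Assumed inputs: these values reproduce the output shown above
vector1 <- 1:5
vector2 <- 6:10

# Vectorized addition: R adds the vectors element by element, no loop needed
result_addition_vectorized <- vector1 + vector2
result_addition_vectorized
```

Both approaches give the same result, but the vectorized form is shorter and leaves the looping to R's internals.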
na.rm vs na.omit
Is there a difference between na.rm and na.omit?
Yes, there is a difference. In R, they are used in different contexts.
na.rm (Remove)
na.rm is an argument found in various functions (e.g. mean(), sum(), etc.) that allows you to specify whether missing values (NA or NaN) should be removed before performing the calculation.
From the help for mean() (?mean): a logical evaluating to TRUE or FALSE indicating whether NA values should be stripped before the computation proceeds.
# A vector with NA values
values_with_na <- c(1, 2, 3, NA, 5)
mean(values_with_na, na.rm = FALSE) # Result will be NA
[1] NA
# Excluding NA values
mean(values_with_na, na.rm = TRUE) # Result will be (1+2+3+5)/4 = 2.75
[1] 2.75
na.omit (Omit missing)
na.omit is a function that can be used to remove rows with missing values (NA) from a data frame or matrix.
# Creating a data frame with NA values
df <- data.frame(A = c(1, 2, NA, 4), B = c(5, NA, 7, 8))
# NAs in the columns of the data frame
df
   A  B
1  1  5
2  2 NA
3 NA  7
4  4  8
# Using na.omit to remove rows with NA values
df |> na.omit()
  A B
1 1 5
4 4 8
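The practical consequence of this difference shows up when summarizing several columns. The sketch below reuses the same values in a data frame called df_na (a made-up name) and compares column means computed both ways; colMeans() is a base R function that accepts na.rm.

```r
df_na <- data.frame(A = c(1, 2, NA, 4), B = c(5, NA, 7, 8))

# na.rm drops NAs column by column, so each mean uses the 3 available values
means_na_rm <- colMeans(df_na, na.rm = TRUE)
means_na_rm

# na.omit drops whole rows first, so each mean uses only the 2 complete rows
means_na_omit <- colMeans(na.omit(df_na))
means_na_omit
```

The two results differ because na.omit() also discards the non-missing values that sit in the same row as an NA.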
purrr::map()
I am still a little foggy on the formatting of purrr::map and how to utilize it effectively.
The purrr::map function is used to apply a specified function to each element of a list or vector, returning the results in a new list.
Basic Syntax:
purrr::map(.x, .f, ...)
.x: The input list or vector.
.f: The function to apply to each element of .x.
...: Additional arguments passed to the function specified in .f.
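As a rough sketch of how the three arguments fit together, the example below uses a made-up list called scores. It shows one call that passes na.rm through ... and an equivalent call written with the formula shorthand; current purrr documentation tends to favor the second style when extra arguments are needed.

```r
library(purrr)

# Hypothetical input: a list of numeric vectors, one with a missing value
scores <- list(a = c(1, 2, 3), b = c(4, NA, 6))

# .x = scores, .f = mean, and na.rm = TRUE is passed along through ...
means_dots <- map(scores, mean, na.rm = TRUE)
means_dots

# Same computation with the formula shorthand, where .x is the current element
means_formula <- map(scores, ~ mean(.x, na.rm = TRUE))
means_formula
```

Both calls return a named list with one mean per element of scores.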
Key Features:
Consistent Output:
map returns a list, ensuring a consistent output format regardless of the input structure.
Function Application:
The primary purpose is to apply a specified function to each element of the input .x.
Formula Interface:
Supports a formula interface (~) for concise function specifications.
purrr::map(.x, ~ some_function(.x)) # some_function is a placeholder for any function
Example:
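# Sample list
my_list <- list(a = 1:3, b = c(4, 5, 6), c = rnorm(n = 3))
my_list
# Square every value in each element of the list
squared_list <- purrr::map(my_list, ~ .x ^ 2)
squared_list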
In this example, the map function applies the squaring function (~ .x ^ 2) to each element of the input list my_list. The resulting squared_list is a list where each element is the squared version of the corresponding element in my_list.
The purrr::map function is particularly useful when working with lists and helps to create cleaner and more readable code, especially in cases where you want to apply the same operation to each element of a collection.
General references
Is there a good dictionary type document with “R language” or very basic function descriptions? … find it difficult to know what functions I need because it is hard to recall their name or confuse it with a different function.
R Documentation (Built-in Help): R itself provides built-in documentation that you can access using the help() function or the ? operator. For example, to get help on the mean() function, you can type help(mean) or ?mean in the R console.
R Manuals and Guides: The official R documentation, including manuals and guides, is available on the R Project website: R Manuals.
R Packages Documentation: Many R packages come with detailed documentation. You can find documentation for a specific package by visiting the CRAN website (Comprehensive R Archive Network) and searching for the package of interest.
Online Resources: Websites like RDocumentation provide a searchable database of R functions along with their documentation. You can search for a specific function and find details on its usage and parameters.
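When the exact function name will not come to mind, base R has a few search helpers in addition to help() and ?. The lines below are a small sketch using median as the example; the partial name "med" is arbitrary.

```r
# Find functions whose names contain "med" when the exact name is hard to recall
matches <- apropos("med")
matches

# Check a function's arguments without opening the full help page
args(median)

# Search the installed help pages by keyword (same as typing ??median)
help.search("median")
```

apropos() only searches the names of objects in currently loaded packages, while help.search() looks through the help pages of every installed package.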