Submission Instructions
Please sign up for a function here (Enter your name and the week you want to present): function_of_the_week_signup_2024
For this assignment, please submit both the .qmd and the .html files. I will add your submission to the website. Remove your name from the .qmd if you do not wish it shared, or let us know if it is okay to post it anonymously.
Make sure to update the title, description, author, and date in the yaml above.
Previous years’ Functions of the Week can be found on the previous class websites:
If you select a function which was presented previously, please develop your own examples and content.
slice_sample( )
In this document, I will introduce the slice_sample( ) function and show what it’s for.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
What is it for?
slice_sample( ) from the dplyr package randomly selects rows from a data set. With no other arguments, it returns a single row.
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
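The output above is consistent with calling the function on mtcars with no extra arguments. A minimal sketch of that call follows; the set.seed( ) line and the seed value are my additions so that the draw is repeatable, and the row you get will depend on the seed.

```r
library(dplyr)

# Fix the random seed so the same row is drawn on every run.
# The seed value is arbitrary and not part of the original example.
set.seed(2024)

# With neither n nor prop supplied, slice_sample() returns one random row.
one_row <- slice_sample(mtcars)
one_row
```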
There are various useful arguments in the slice_sample( ) function. We can add n to select a set number of rows.
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
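A five-row result like the one above can be requested with n = 5. This is a sketch assuming the same mtcars data; the seed is again an arbitrary choice of mine, so the rows will differ from those shown.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed, only for repeatability

# n sets the exact number of rows to draw.
five_rows <- slice_sample(mtcars, n = 5)
five_rows
```

Because sampling is without replacement by default, the five rows are all different cars.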
We can also use prop if we want to sample a proportion of the rows available in the data set.
slice_sample(mtcars, prop = 0.25)
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
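To see where the eight rows come from, it may help to compare row counts: mtcars has 32 rows, and a quarter of 32 is 8. A short check, with an arbitrary seed that I added for repeatability:

```r
library(dplyr)

set.seed(2024)  # arbitrary seed, only for repeatability

quarter <- slice_sample(mtcars, prop = 0.25)

nrow(mtcars)   # 32
nrow(quarter)  # 8
```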
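By default, sampling is done without replacement (replace = FALSE), so each row can be selected at most once. Setting replace = TRUE allows the same row to be drawn more than once, as happens with Fiat 128 in the output below.
slice_sample(mtcars, n = 6, replace = TRUE)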
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128...1 32.4 4 78.7 66 4.08 2.20 19.47 1 1 4 1
Merc 230 22.8 4 140.8 95 3.92 3.15 22.90 1 0 4 2
Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.84 15.41 0 0 3 4
Datsun 710 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
Fiat 128...6 32.4 4 78.7 66 4.08 2.20 19.47 1 1 4 1
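One practical consequence of replace shows up when asking for more rows than the data set has. The sketch below is my own illustration (the value n = 40 and the seed are arbitrary): without replacement the result is capped at the number of rows available, while with replacement all requested draws are returned.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed, only for repeatability

# Without replacement, a request for more rows than exist is
# silently capped at nrow(mtcars).
no_replace <- slice_sample(mtcars, n = 40)
nrow(no_replace)    # 32

# With replacement, rows can repeat, so all 40 draws are returned.
with_replace <- slice_sample(mtcars, n = 40, replace = TRUE)
nrow(with_replace)  # 40
```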
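Lastly, we can use weight_by to supply sampling weights. The weights must be a non-negative vector, such as the wt column here, and rows with larger weights are more likely to be selected.
slice_sample(mtcars, n = 5, weight_by = wt)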
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
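The effect of the weights is easiest to see with an extreme case. In this sketch I use the am column (0 = automatic, 1 = manual) as the weight, which is my own choice for illustration and not part of the example above. Rows with a weight of zero can never be drawn, so only manual cars appear in the result.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed, only for repeatability

# am is 0 for automatic and 1 for manual transmissions.
# A weight of 0 gives a row no chance of being selected.
manual_only <- slice_sample(mtcars, n = 5, weight_by = am)

manual_only$am  # every value is 1
```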
Is it helpful?
This function is definitely helpful. In a larger data set, we can get a smaller random sample fairly easily.
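As a rough illustration of that use, the sketch below draws 1,000 rows from the diamonds data set that ships with ggplot2. The choice of data set, sample size, and seed are mine and are only meant to show the idea.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

set.seed(2024)  # arbitrary seed, only for repeatability

# diamonds has tens of thousands of rows; keep a random 1,000 of them.
diamonds_small <- slice_sample(diamonds, n = 1000)

nrow(diamonds)
nrow(diamonds_small)  # 1000
```

A smaller sample like this can make exploratory plots and quick checks faster while keeping the same columns as the full data.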