dplyr::slice_sample

Function of the Week

Update with brief description of function
Author

Cirell Alfonso


1 Submission Instructions

Please sign up for a function here (Enter your name and the week you want to present): function_of_the_week_signup_2024

For this assignment, please submit both the .qmd and the .html files. I will add them to the website. Remove your name from the qmd if you do not wish it shared, or let us know if it is okay to post it anonymously.

Make sure to update the title, description, author, and date in the yaml above.

Previous years’ Functions of the Week can be found on the previous class websites:

If you select a function which was presented previously, please develop your own examples and content.

2 slice_sample( )

In this document, I will introduce the slice_sample( ) function and show what it’s for.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data(mtcars)

2.1 What is it for?

slice_sample( ) from the dplyr package randomly selects rows from a data set. With no other arguments it returns a single row.

slice_sample(mtcars)
          mpg cyl  disp hp drat   wt qsec vs am gear carb
Merc 230 22.8   4 140.8 95 3.92 3.15 22.9  1  0    4    2
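
Because the selection is random, running the same call again will usually return a different row. A minimal sketch of one way to make the draw repeatable, assuming we use base R's set.seed( ) before sampling (the seed value 123 is an arbitrary choice for illustration):

```r
library(dplyr)

# Fix the random number generator state so the draw is repeatable.
# The seed value 123 is arbitrary and only used for illustration.
set.seed(123)
first_draw <- slice_sample(mtcars)

# Resetting the same seed reproduces the same row.
set.seed(123)
second_draw <- slice_sample(mtcars)

# Compare the two draws; they should match exactly.
identical(first_draw, second_draw)
```

Without the calls to set.seed( ), the two draws would generally differ.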

There are various useful arguments in the slice_sample( ) function. We can add n to select a set number of rows.

slice_sample(mtcars,n=5)
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Merc 450SLC    15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2

We can also use prop if we want to sample a proportion of the rows available in the data set. mtcars has 32 rows, so prop = 0.25 returns 8 of them.

slice_sample(mtcars, prop=0.25)
                  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Merc 450SE       16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Lotus Europa     30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L   15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino     19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Toyota Corona    21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Mazda RX4        21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Mazda RX4 Wag    21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
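
When prop multiplied by the number of rows is not a whole number, the dplyr documentation says the result is rounded towards zero. The short sketch below checks this on mtcars; the object names quarter and tenth are just for illustration.

```r
library(dplyr)

# mtcars has 32 rows, so a quarter of them is exactly 8.
quarter <- slice_sample(mtcars, prop = 0.25)
nrow(quarter)

# 0.1 * 32 = 3.2, which is rounded down to 3 rows.
tenth <- slice_sample(mtcars, prop = 0.1)
nrow(tenth)
```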

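By default, replace is FALSE, so each row can be selected at most once. Setting replace = TRUE samples with replacement, which means the same row can appear more than once (Fiat 128 is drawn twice below).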

slice_sample(mtcars, n=6,
             replace= TRUE)
              mpg cyl  disp  hp drat   wt  qsec vs am gear carb
Fiat 128...1 32.4   4  78.7  66 4.08 2.20 19.47  1  1    4    1
Merc 230     22.8   4 140.8  95 3.92 3.15 22.90  1  0    4    2
Duster 360   14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
Camaro Z28   13.3   8 350.0 245 3.73 3.84 15.41  0  0    3    4
Datsun 710   22.8   4 108.0  93 3.85 2.32 18.61  1  1    4    1
Fiat 128...6 32.4   4  78.7  66 4.08 2.20 19.47  1  1    4    1
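
One consequence of the replace argument is how requests for more rows than the data contains are handled. The sketch below, which assumes dplyr 1.1 or later, asks for 40 rows from the 32-row mtcars data set both ways; the object names capped and oversampled are just for illustration.

```r
library(dplyr)

# Without replacement, asking for more rows than exist is silently
# truncated to the size of the data (32 rows for mtcars).
capped <- slice_sample(mtcars, n = 40)
nrow(capped)

# With replacement, the sample can be larger than the data,
# so some rows must appear more than once.
oversampled <- slice_sample(mtcars, n = 40, replace = TRUE)
nrow(oversampled)
```

Sampling with replacement is what makes bootstrap-style resampling possible, where each resample has the same number of rows as the original data.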

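Lastly, we can use weight_by to supply sampling weights. The weights must be a non-negative numeric vector, such as a column of the data set, and rows with larger weights are more likely to be selected. Here, heavier cars (larger wt) have a higher chance of being drawn.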

slice_sample(mtcars, n=5, weight_by = wt)
             mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Merc 240D   24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 280C   17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Valiant     18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280    19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
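
To see the effect of the weights more clearly, here is a small sketch that uses the am column as the weight. This choice is only for illustration: am is 0 for automatic and 1 for manual transmissions, so automatic cars get a weight of zero and should never be selected.

```r
library(dplyr)

# am is 0 for automatic and 1 for manual transmissions, so using it as
# the weight gives automatic cars a zero chance of being selected.
manual_only <- slice_sample(mtcars, n = 5, weight_by = am)

# Every sampled car should have am equal to 1.
manual_only$am
```

In practice this is equivalent to filtering to manual cars first and then sampling, but it shows that weight_by changes each row's chance of being drawn rather than the values in the data.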

2.2 Is it helpful?

This function is definitely helpful. With a larger data set, it gives us a smaller random sample quickly, in a single function call.