dplyr::slice_max, slice_min

Function of the Week

Two of several different slice functions, all of which allow you to select specific rows in order to view, delete, mutate, or otherwise interact with them
Author

Anneka Sonstroem

Published

February 6, 2025

1 slice_max() and slice_min()

In this document, I will introduce the slice_max() and slice_min() functions from the dplyr package and show what they’re for.

1.1 What is it for?

slice_max() and slice_min() are two of several different slice functions, all of which allow you to select specific rows in order to view, delete, mutate, or otherwise interact with them. slice_max() selects the rows with the highest values of a particular variable, and slice_min() selects the rows with the lowest values.
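To make the basic idea concrete before getting to penguins, here is a minimal sketch on a small made-up data frame. The scores data and its values are hypothetical, and the sketch only assumes that dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: five students with distinct test scores
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee", "Eli"),
  score = c(88, 95, 72, 91, 60)
)

# The two rows with the highest scores, returned in descending order of score
top2 <- slice_max(scores, order_by = score, n = 2)
top2$name     # "Ben" "Dee"

# The two rows with the lowest scores, returned in ascending order of score
bottom2 <- slice_min(scores, order_by = score, n = 2)
bottom2$name  # "Eli" "Cy"
```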

1.1.1 Syntax

The required arguments are your data frame (.data) and order_by, which specifies the variable whose highest or lowest values you want to select.

The optional arguments include:

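  • n, which specifies the number of rows to select, or prop, which specifies the proportion of rows to select. Supply one or the other, not both; if you supply neither, n=1 is used. prop is rounded down to a whole number of rows.

  • with_ties, which specifies whether all rows tied at the cutoff are kept. The default value is TRUE, which means that the function may return more rows than requested if there are ties. With with_ties=FALSE, you never get more rows than you asked for.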
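A small sketch of how n and with_ties interact when there are ties. The df below is a hypothetical toy data frame with a three-way tie for the smallest value, and the sketch assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data with a three-way tie for the smallest value of x
df <- data.frame(id = 1:6, x = c(1, 1, 1, 5, 7, 9))

# No n or prop supplied, so n = 1; with_ties = TRUE keeps all three tied rows
default_n <- slice_min(df, order_by = x)
nrow(default_n)  # 3

# Asking for n = 2 still returns 3 rows, because the tie is kept together
kept_ties <- slice_min(df, order_by = x, n = 2)
nrow(kept_ties)  # 3

# with_ties = FALSE returns exactly n rows, all from the tied group
no_ties <- slice_min(df, order_by = x, n = 2, with_ties = FALSE)
nrow(no_ties)    # 2
```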

1.1.2 Examples

1.1.2.1 View a subset of your data

Which are the 5 penguins with the longest beaks?

library(dplyr)
library(palmerpenguins)
slice_max(penguins, order_by=bill_length_mm, n=5)
# A tibble: 5 × 8
  species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
1 Gentoo    Biscoe           59.6          17                 230        6050
2 Chinstrap Dream            58            17.8               181        3700
3 Gentoo    Biscoe           55.9          17                 228        5600
4 Chinstrap Dream            55.8          19.8               207        4000
5 Gentoo    Biscoe           55.1          16                 230        5850
# ℹ 2 more variables: sex <fct>, year <int>

1.1.2.2 Subsetting quantiles

Which penguins are in the lowest quartile of beak length?

penguins %>% slice_min(bill_length_mm, prop=0.25)
# A tibble: 86 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Dream               32.1          15.5               188        3050
 2 Adelie  Dream               33.1          16.1               178        2900
 3 Adelie  Torgersen           33.5          19                 190        3600
 4 Adelie  Dream               34            17.1               185        3400
 5 Adelie  Torgersen           34.1          18.1               193        3475
 6 Adelie  Torgersen           34.4          18.4               184        3325
 7 Adelie  Biscoe              34.5          18.1               187        2900
 8 Adelie  Torgersen           34.6          21.1               198        4400
 9 Adelie  Torgersen           34.6          17.2               189        3200
10 Adelie  Biscoe              35            17.9               190        3450
# ℹ 76 more rows
# ℹ 2 more variables: sex <fct>, year <int>
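The 86 rows above come from multiplying prop by the number of rows and rounding toward zero (344 × 0.25 = 86 exactly). Here is a minimal sketch of the rounding on a made-up 10-row data frame (df and its values are hypothetical), where 10 × 0.25 = 2.5 becomes 2 rows:

```r
library(dplyr)

# Hypothetical toy data: 10 distinct values, so there are no ties to worry about
df <- data.frame(x = c(12, 3, 8, 15, 1, 9, 6, 20, 4, 11))

# 10 * 0.25 = 2.5, which is rounded toward zero, so 2 rows are returned
lowest_quarter <- slice_min(df, order_by = x, prop = 0.25)
lowest_quarter$x  # 1 3
```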

1.1.2.3 Creating new data frames and variables

What is the average beak length for the penguins in the lowest quartile of beak length?

shortbeak <- slice_min(penguins, order_by=bill_length_mm, prop=0.25)
mean(shortbeak$bill_length_mm)
[1] 36.9186
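You can also skip the intermediate data frame and write the same kind of calculation as one pipeline ending in summarise(). This keeps the result in a data frame. Here is a minimal sketch on a hypothetical 8-row data frame, where the bill column and its values are made up to stand in for bill_length_mm:

```r
library(dplyr)

# Hypothetical toy data standing in for penguins$bill_length_mm
df <- data.frame(bill = c(40, 35, 50, 38, 45, 36, 48, 42))

# Keep the shortest quarter of rows (8 * 0.25 = 2 rows), then average them
short_mean <- df %>%
  slice_min(order_by = bill, prop = 0.25) %>%
  summarise(mean_bill = mean(bill))
short_mean$mean_bill  # 35.5
```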

1.1.3 Weird examples

1.1.3.1 What happens if you try to use slice_max() or slice_min() with a non-numeric variable?

penguins %>% slice_min(species, n=5)
# A tibble: 152 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 142 more rows
# ℹ 2 more variables: sex <fct>, year <int>

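For a factor like species, rows are ranked by the factor's level order (here Adelie, Chinstrap, Gentoo, which happens to be alphabetical). All 152 Adelie penguins tie for the lowest value, and because with_ties defaults to TRUE, all of them are returned even though I asked for 5. This seems less useful than using the functions with numeric values, but it would allow you to do things like pull the first 5 people alphabetically from a list of names.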
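To check that it really is the level order and not the alphabet that matters, here is a small sketch with a hypothetical factor whose levels are deliberately not in alphabetical order. The df and its levels are made up, and the sketch assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: a factor whose level order is NOT alphabetical
df <- data.frame(
  id   = 1:4,
  size = factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))
)

# Factors are ranked by level order, so "high" is the maximum here,
# even though "medium" would come last alphabetically
biggest <- slice_max(df, order_by = size)
as.character(biggest$size)  # "high"

# Both "low" rows tie for the minimum, so both are returned (with_ties = TRUE)
smallest <- slice_min(df, order_by = size)
nrow(smallest)  # 2
```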

1.1.3.2 What happens if you set with_ties to FALSE even though there are lots of ties?

As we saw with the species example above, I requested n=5 but got far more rows than that. If I’d set with_ties to FALSE, I would have gotten…

penguins %>% slice_min(species, n=5, with_ties=FALSE)
# A tibble: 5 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
# ℹ 2 more variables: sex <fct>, year <int>

… the five Adelie penguins that happen to be listed first in the data frame.
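Because the tied rows that survive are simply the first ones in the current row order, you should be able to choose which ones you keep by sorting on a second variable before slicing. This is a minimal sketch with hypothetical data. It assumes, as the Adelie output above suggests, that tied rows keep their existing order:

```r
library(dplyr)

# Hypothetical toy data: three rows tie on species "A"
df <- data.frame(
  species = factor(c("A", "A", "A", "B")),
  mass    = c(10, 30, 20, 40)
)

# Sort heaviest-first, then keep two of the tied "A" rows without ties;
# since ties keep the current row order, these should be the two heaviest
heaviest_a <- df %>%
  arrange(desc(mass)) %>%
  slice_min(order_by = species, n = 2, with_ties = FALSE)
heaviest_a$mass
```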

1.1.3.3 What happens if you set prop>=1?

penguins %>% slice_min(bill_length_mm, prop=1.3)
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Dream               32.1          15.5               188        3050
 2 Adelie  Dream               33.1          16.1               178        2900
 3 Adelie  Torgersen           33.5          19                 190        3600
 4 Adelie  Dream               34            17.1               185        3400
 5 Adelie  Torgersen           34.1          18.1               193        3475
 6 Adelie  Torgersen           34.4          18.4               184        3325
 7 Adelie  Biscoe              34.5          18.1               187        2900
 8 Adelie  Torgersen           34.6          21.1               198        4400
 9 Adelie  Torgersen           34.6          17.2               189        3200
10 Adelie  Biscoe              35            17.9               190        3450
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

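Setting prop>=1 will return all of the rows in the data frame, sorted by your relevant variable, with any missing values placed last. Values of prop above 1 are silently capped at the number of rows, which is why prop=1.3 still returns 344 rows. This means that you can use slice_min() and slice_max() as substitutes for arrange() (or base R's order()), if you wanted to do that for some reason!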
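Here is a quick check of that claim on a hypothetical four-row data frame with one missing value. It assumes a recent dplyr (1.1.0 or later), where missing values in order_by are sorted to the end, as in arrange():

```r
library(dplyr)

# Hypothetical toy data, including a missing value
df <- data.frame(x = c(3, NA, 1, 2))

# prop = 1 keeps every row; by default NA values are placed last
by_slice   <- slice_min(df, order_by = x, prop = 1)
by_arrange <- arrange(df, x)

by_slice$x    # 1 2 3 NA
by_arrange$x  # 1 2 3 NA

# slice_max() with prop = 1 lines up with arrange(desc(x)), again with NA last
by_slice_max <- slice_max(df, order_by = x, prop = 1)
by_slice_max$x  # 3 2 1 NA
```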

1.2 Is it helpful?

Yes, I can think of at least three situations where these functions would be helpful:

  • Subsetting your data into quantiles

  • Verifying that a mutation worked correctly by double-checking the highest and lowest values

  • Looking quickly for apparent outlier values
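Here is a minimal sketch of the second use, checking a mutation. The sketch puts the lowest and highest rows side by side after a unit conversion. The df, its grams values, and the kg column are hypothetical, and the sketch assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: body masses in grams, converted to kilograms
df <- data.frame(id = 1:5, grams = c(3750, 3800, 3250, 5000, 2900)) %>%
  mutate(kg = grams / 1000)

# Stack the minimum and maximum rows to sanity-check the new column
extremes <- bind_rows(
  slice_min(df, order_by = kg),
  slice_max(df, order_by = kg)
)
extremes$id  # 5 4
extremes$kg  # 2.9 5.0
```

The same two calls on a raw column are also a quick way to spot values that look implausibly small or large.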