lubridate::ceiling_date()

Function of the Week

Used for rounding a given date-time object up to the nearest boundary of a specified time unit
Author

Vida Echaluse

Published

February 7, 2024

1 lubridate::ceiling_date()

In this document, I will introduce the ceiling_date() function and show what it’s for.

library(lubridate)

1.1 What is it for?

The ceiling_date() function is part of the lubridate package. It is used for rounding a given date-time object up to the nearest boundary of a specified time unit.

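The term ceiling means rounding up. Users can round up to the nearest second, minute, hour, day, week, month, or year, to other calendar units such as "quarter" or "halfyear", or to a multiple of a unit such as "5 mins" or "2 hours".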

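ceiling_date(x, unit = "seconds", change_on_boundary = NULL,
    week_start = getOption("lubridate.week.start", 7))
  • x is a vector of date-time (or Date) objects to round up.

  • unit is a string, Period object, or date-time object specifying the time unit to round up to, e.g. "hour", "month", or a multiple such as "5 mins". The default is "seconds".

  • change_on_boundary controls values that already sit exactly on a boundary. With the default NULL, date-times on a boundary are left unchanged while Date objects are rounded up to the next boundary; TRUE rounds everything on a boundary up, and FALSE leaves everything on a boundary unchanged.

  • week_start sets the first day of the week when unit is "week" (7 = Sunday, the default; 1 = Monday).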
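The two optional arguments are easiest to see on a value that already sits exactly on a boundary. Here is a small sketch, assuming a recent lubridate release (where change_on_boundary defaults to NULL and weeks start on Sunday unless week_start says otherwise); on_hour is just an illustrative variable name.

```r
library(lubridate)

# A date-time that sits exactly on an hour boundary (illustrative value)
on_hour <- ymd_hms("2009-08-03 12:00:00")

# Default (change_on_boundary = NULL): a date-time on a boundary stays where it is
ceiling_date(on_hour, "hour")

# change_on_boundary = TRUE pushes it up to the next boundary (13:00)
ceiling_date(on_hour, "hour", change_on_boundary = TRUE)

# The Monday-afternoon time used in Example #1
x <- ymd_hms("2009-08-03 12:01:59.23")
ceiling_date(x, "week")                  # weeks start on Sunday by default
ceiling_date(x, "week", week_start = 1)  # weeks start on Monday instead
```

With the default Sunday start the result is the following Sunday, 2009-08-09; with week_start = 1 it moves to the following Monday, 2009-08-10.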


Example #1

# format: year/month/day hour/minute/second
x <- ymd_hms("2009-08-03 12:01:59.23") # Monday
# rounding
ceiling_date(x, "second")
[1] "2009-08-03 12:02:00 UTC"
ceiling_date(x, "minute")
[1] "2009-08-03 12:02:00 UTC"
ceiling_date(x, "5 mins")
[1] "2009-08-03 12:05:00 UTC"
ceiling_date(x, "hour")
[1] "2009-08-03 13:00:00 UTC"
ceiling_date(x, "2 hours")
[1] "2009-08-03 14:00:00 UTC"
ceiling_date(x, "day") # Tuesday
[1] "2009-08-04 UTC"
ceiling_date(x, "week") # Sunday (weeks start on Sunday by default)
[1] "2009-08-09 UTC"
ceiling_date(x, "month")
[1] "2009-09-01 UTC"
ceiling_date(x, "year")
[1] "2010-01-01 UTC"
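Beyond the units above, ceiling_date() accepts coarser calendar units such as "quarter" and "halfyear", and it is vectorised over x, so a whole column of date-times can be rounded in one call. A quick sketch reusing the same x (the times vector is made up for illustration):

```r
library(lubridate)

x <- ymd_hms("2009-08-03 12:01:59.23") # Monday

# Coarser calendar units: quarters start in Jan/Apr/Jul/Oct, half-years in Jan/Jul
ceiling_date(x, "quarter")   # next quarter boundary after August
ceiling_date(x, "halfyear")  # next half-year boundary after August

# Vectorised over x: each element is rounded up independently
times <- ymd_hms(c("2009-08-03 12:01:59",
                   "2009-12-15 08:30:00",
                   "2010-02-28 23:59:59"))
ceiling_date(times, "month")
```

The quarter call lands on 2009-10-01, the half-year call on 2010-01-01, and the three monthly results are 2009-09-01, 2010-01-01, and 2010-03-01.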

Reference:
1. https://lubridate.tidyverse.org/reference/round_date.html
2. RDocumentation


Example #2

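library(nycflights13)
library(dplyr)    # filter(), mutate(), select(), count(), and the %>% pipe
library(ggplot2)  # ggplot(), geom_histogram(), geom_line()
library(skimr)    # skim()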
data(flights)
names(flights)
 [1] "year"           "month"          "day"            "dep_time"      
 [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
 [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
[13] "origin"         "dest"           "air_time"       "distance"      
[17] "hour"           "minute"         "time_hour"     
head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
# Convert the separate date and time columns into a standard date-time (POSIXct).
# make_datetime_100() takes year, month, day, and a time stored as HHMM (e.g. 517 = 05:17)
# and builds the date-time with lubridate::make_datetime().
# HHMM is split into hours (time %/% 100) and minutes (time %% 100).

make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}
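As a quick sanity check of the HHMM arithmetic, take the first departure in the data, 517 (that is, 05:17): 517 %/% 100 gives 5 hours and 517 %% 100 gives 17 minutes. The helper is repeated in this sketch so it runs on its own with only lubridate loaded.

```r
library(lubridate)

# Same helper as above: split an HHMM number into hours and minutes
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

517 %/% 100   # hours
517 %% 100    # minutes
make_datetime_100(2013, 1, 1, 517)

# Works element-wise on vectors, which is how it is used inside mutate()
make_datetime_100(2013, 1, 1, c(517, 533, 2359))
```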
# Drop rows with a missing departure or arrival time.
# Use mutate() to convert dep_time, arr_time, sched_dep_time, and sched_arr_time
# into date-times with make_datetime_100(), then select() origin, dest,
# and the columns ending in "delay" and "time".

flights_dt <- flights %>% 
  filter(!is.na(dep_time), !is.na(arr_time)) %>% 
  mutate(dep_time = make_datetime_100(year, month, day, dep_time),
         arr_time = make_datetime_100(year, month, day, arr_time),
         sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
         sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)) %>%
  select(origin, dest, ends_with("delay"), ends_with("time"))
# Check
head(flights_dt)
# A tibble: 6 × 9
  origin dest  dep_delay arr_delay dep_time            sched_dep_time     
  <chr>  <chr>     <dbl>     <dbl> <dttm>              <dttm>             
1 EWR    IAH           2        11 2013-01-01 05:17:00 2013-01-01 05:15:00
2 LGA    IAH           4        20 2013-01-01 05:33:00 2013-01-01 05:29:00
3 JFK    MIA           2        33 2013-01-01 05:42:00 2013-01-01 05:40:00
4 JFK    BQN          -1       -18 2013-01-01 05:44:00 2013-01-01 05:45:00
5 LGA    ATL          -6       -25 2013-01-01 05:54:00 2013-01-01 06:00:00
6 EWR    ORD          -4        12 2013-01-01 05:54:00 2013-01-01 05:58:00
# ℹ 3 more variables: arr_time <dttm>, sched_arr_time <dttm>, air_time <dbl>
skim(flights_dt)
Data summary
Name                    flights_dt
Number of rows          328063
Number of columns       9
_______________________
Column type frequency:
  character             2
  numeric               3
  POSIXct               4
________________________
Group variables         None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
origin                 0              1    3    3      0         3           0
dest                   0              1    3    3      0       104           0

Variable type: numeric

skim_variable  n_missing  complete_rate    mean     sd   p0  p25  p50  p75  p100  hist
dep_delay              0              1   12.58  40.09  -43   -5   -2   11  1301  ▇▁▁▁▁
arr_delay            717              1    6.90  44.63  -86  -17   -5   14  1272  ▇▁▁▁▁
air_time             717              1  150.69  93.69   20   82  129  192   695  ▇▂▂▁▁

Variable type: POSIXct

skim_variable   n_missing  complete_rate  min                  max                  median               n_unique
dep_time                0              1  2013-01-01 05:17:00  2013-12-31 23:56:00  2013-07-04 09:12:00    211509
sched_dep_time          0              1  2013-01-01 05:15:00  2013-12-31 23:59:00  2013-07-04 09:15:00    125557
arr_time                0              1  2013-01-01 00:03:00  2014-01-01 00:00:00  2013-07-04 11:06:00    220332
sched_arr_time          0              1  2013-01-01 00:05:00  2013-12-31 23:59:00  2013-07-04 11:20:00    204384
# Plot: departure times in 1-hour bins (for date-times, binwidth is in seconds: 3600 s = 1 hour)
ggplot(flights_dt, aes(x = dep_time)) +
  geom_histogram(binwidth = 3600, color = "purple", alpha = 0.7) +
  labs(title = "Departure Time Distribution",
       x = "Departure Time",
       y = "Frequency")

# Plot: number of departures per week
flights_dt %>% 
  count(week = ceiling_date(dep_time, "week")) %>% 
  ggplot(aes(week, n)) +
    geom_line(color = "purple") +
    theme_minimal() +
    labs(title = "Flight Departures per Week",
         x = "Week",
         y = "Count")

# Plot: number of departures per month
flights_dt %>% 
  count(month = ceiling_date(dep_time, "month")) %>% 
  ggplot(aes(month, n)) +
    geom_line(color = "purple") +
    theme_minimal() +
    labs(title = "Flight Departures per Month",
         x = "Month",
         y = "Count")

Instead of plotting the raw departure times, we can use ceiling_date() to round each one up to the end of its week or month. This lets us count and plot the number of flights per week and per month.
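One detail worth keeping in mind: because ceiling_date() rounds up, each weekly or monthly group is labelled by the boundary at its end, not its start. The sketch below uses a few made-up departure times (not the real flights data) and base R's table() to count values per rounded month.

```r
library(lubridate)

# Hypothetical departure times for illustration (not taken from nycflights13)
dep <- ymd_hms(c(
  "2013-01-01 05:17:00",
  "2013-01-20 14:00:00",
  "2013-02-03 09:30:00",
  "2013-12-31 23:56:00"
))

# Round each time up to the next month boundary, then count per boundary
month_end <- ceiling_date(dep, "month")
table(as.Date(month_end))
```

Here the two January times are counted under 2013-02-01 and the late-December time under 2014-01-01. If you would rather label each group by its start, floor_date() from lubridate rounds down instead.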

Reference:
1. R for Data Science. https://r4ds.had.co.nz/dates-and-times.html
2. How to Write Functions in R

1.2 Is it helpful?

Yes. It is useful for representing time in plots and can reveal patterns or trends over different time intervals. This is especially true for large datasets, where ceiling_date() can group many individual timestamps into weekly or monthly bins, giving a more concise and interpretable picture of trends. I don’t use it every day, but I do think it is pretty neat!