lubridate::ceiling_date()
Function of the Week
library(lubridate)
1 lubridate::ceiling_date()
In this document, I will introduce the ceiling_date() function and show what it’s for.
1.1 What is it for?
The ceiling_date() function is part of the lubridate package. It is used for rounding a given date-time object up to the nearest boundary of a specified time unit.
The term ceiling means rounding up, and users can specify rounding up to the nearest second, minute, hour, day, week, month, or year.
ceiling_date(x, unit = c("second", "minute", "hour", "day",
                         "week", "month", "year"))
x is a vector of date-time objects.
unit is a string, Period object, or date-time object that specifies the time unit to round up to. A string may also name a multiple of a unit, such as "5 mins" or "2 hours".
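To see what "rounding up" means in practice, it can help to put ceiling_date() next to its siblings floor_date() and round_date(), which are documented on the same lubridate reference page. The snippet below is a minimal sketch; the variable names `down`, `nearest`, and `up` are only illustrative.

```r
library(lubridate)

x <- ymd_hms("2009-08-03 12:01:59.23")

down    <- floor_date(x, "hour")    # 2009-08-03 12:00:00 UTC (start of the current hour)
nearest <- round_date(x, "hour")    # 2009-08-03 12:00:00 UTC (closest hour boundary)
up      <- ceiling_date(x, "hour")  # 2009-08-03 13:00:00 UTC (next hour boundary)
```

Only ceiling_date() moves forward in time here, because 12:01:59 is much closer to 12:00 than to 13:00.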
Example #1
# format: year/month/day hour/minute/second
x <- ymd_hms("2009-08-03 12:01:59.23") # Monday
# rounding
ceiling_date(x, "second")
[1] "2009-08-03 12:02:00 UTC"
ceiling_date(x, "minute")
[1] "2009-08-03 12:02:00 UTC"
ceiling_date(x, "5 mins")
[1] "2009-08-03 12:05:00 UTC"
ceiling_date(x, "hour")
[1] "2009-08-03 13:00:00 UTC"
ceiling_date(x, "2 hours")
[1] "2009-08-03 14:00:00 UTC"
ceiling_date(x, "day") # Tuesday
[1] "2009-08-04 UTC"
ceiling_date(x, "week") # Sunday
[1] "2009-08-09 UTC"
ceiling_date(x, "month")
[1] "2009-09-01 UTC"
ceiling_date(x, "year")
[1] "2010-01-01 UTC"
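The "week" result depends on which day a week is taken to start on. ceiling_date() has a week_start argument (1 = Monday, 7 = Sunday), and lubridate defaults to 7 unless the `lubridate.week.start` option has been changed. The sketch below sets the argument explicitly so it does not depend on that option; `sunday_week` and `monday_week` are illustrative names.

```r
library(lubridate)

x <- ymd_hms("2009-08-03 12:01:59.23")  # a Monday

# Weeks starting on Sunday: round up to the next Sunday at midnight
sunday_week <- ceiling_date(x, "week", week_start = 7)

# Weeks starting on Monday: round up to the next Monday at midnight
monday_week <- ceiling_date(x, "week", week_start = 1)

sunday_week  # 2009-08-09 UTC
monday_week  # 2009-08-10 UTC
```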
Reference:
1. https://lubridate.tidyverse.org/reference/round_date.html
2. RDocumentation
Example #2
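library(nycflights13)
library(dplyr)    # %>%, filter(), mutate(), select(), count()
library(ggplot2)  # ggplot(), geom_histogram(), geom_line()
library(skimr)    # skim()
data(flights)
names(flights)
 [1] "year" "month" "day" "dep_time"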
[5] "sched_dep_time" "dep_delay" "arr_time" "sched_arr_time"
[9] "arr_delay" "carrier" "flight" "tailnum"
[13] "origin" "dest" "air_time" "distance"
[17] "hour" "minute" "time_hour"
head(flights)
# A tibble: 6 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 517 515 2 830 819
2 2013 1 1 533 529 4 850 830
3 2013 1 1 542 540 2 923 850
4 2013 1 1 544 545 -1 1004 1022
5 2013 1 1 554 600 -6 812 837
6 2013 1 1 554 558 -4 740 728
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
# hour <dbl>, minute <dbl>, time_hour <dttm>
# Convert the date and time columns into standard date-time values.
# make_datetime_100() is a helper that takes year, month, day, and a time in HHMM format.
# It splits the time into hours (time %/% 100) and minutes (time %% 100)
# and passes them to `lubridate::make_datetime()`.
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}
# Filter out rows with missing departure or arrival times.
# Use `mutate()` to convert the departure, arrival, scheduled departure,
# and scheduled arrival times with make_datetime_100().
# Then select origin, destination, columns ending with "delay", and columns ending with "time".
flights_dt <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_time)) %>%
  mutate(dep_time = make_datetime_100(year, month, day, dep_time),
         arr_time = make_datetime_100(year, month, day, arr_time),
         sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
         sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)) %>%
  select(origin, dest, ends_with("delay"), ends_with("time"))
# Check
head(flights_dt)
# A tibble: 6 × 9
origin dest dep_delay arr_delay dep_time sched_dep_time
<chr> <chr> <dbl> <dbl> <dttm> <dttm>
1 EWR IAH 2 11 2013-01-01 05:17:00 2013-01-01 05:15:00
2 LGA IAH 4 20 2013-01-01 05:33:00 2013-01-01 05:29:00
3 JFK MIA 2 33 2013-01-01 05:42:00 2013-01-01 05:40:00
4 JFK BQN -1 -18 2013-01-01 05:44:00 2013-01-01 05:45:00
5 LGA ATL -6 -25 2013-01-01 05:54:00 2013-01-01 06:00:00
6 EWR ORD -4 12 2013-01-01 05:54:00 2013-01-01 05:58:00
# ℹ 3 more variables: arr_time <dttm>, sched_arr_time <dttm>, air_time <dbl>
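The HHMM split inside make_datetime_100() can be checked by hand on a single value. The sketch below uses the first row's dep_time (517, meaning 05:17) and calls make_datetime() directly; the variable names are only illustrative.

```r
library(lubridate)

dep_time <- 517               # HHMM stored as an integer, i.e. 05:17
hours    <- dep_time %/% 100  # integer division keeps the hour part: 5
minutes  <- dep_time %% 100   # remainder keeps the minute part: 17

# Same call that make_datetime_100() makes for 2013-01-01
first_dep <- make_datetime(2013, 1, 1, hours, minutes)
first_dep  # 2013-01-01 05:17:00 UTC
```

This matches the dep_time shown for the first row of flights_dt.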
skim(flights_dt)
| Name | flights_dt |
| Number of rows | 328063 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 3 |
| POSIXct | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| origin | 0 | 1 | 3 | 3 | 0 | 3 | 0 |
| dest | 0 | 1 | 3 | 3 | 0 | 104 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| dep_delay | 0 | 1 | 12.58 | 40.09 | -43 | -5 | -2 | 11 | 1301 | ▇▁▁▁▁ |
| arr_delay | 717 | 1 | 6.90 | 44.63 | -86 | -17 | -5 | 14 | 1272 | ▇▁▁▁▁ |
| air_time | 717 | 1 | 150.69 | 93.69 | 20 | 82 | 129 | 192 | 695 | ▇▂▂▁▁ |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| dep_time | 0 | 1 | 2013-01-01 05:17:00 | 2013-12-31 23:56:00 | 2013-07-04 09:12:00 | 211509 |
| sched_dep_time | 0 | 1 | 2013-01-01 05:15:00 | 2013-12-31 23:59:00 | 2013-07-04 09:15:00 | 125557 |
| arr_time | 0 | 1 | 2013-01-01 00:03:00 | 2014-01-01 00:00:00 | 2013-07-04 11:06:00 | 220332 |
| sched_arr_time | 0 | 1 | 2013-01-01 00:05:00 | 2013-12-31 23:59:00 | 2013-07-04 11:20:00 | 204384 |
# Plot: Departure time
ggplot(flights_dt, aes(x = dep_time)) +
  geom_histogram(binwidth = 3600, color = "purple", alpha = 0.7) +
  labs(title = "Departure Time Distribution",
       x = "Departure Time",
       y = "Frequency")
# Plot: Departures for each week
flights_dt %>%
  count(week = ceiling_date(dep_time, "week")) %>%
  ggplot(aes(week, n)) +
  geom_line(color = "purple") +
  theme_minimal() +
  labs(title = "Flight Departures per Week",
       x = "Week",
       y = "Count")
# Plot: Departures for each month
flights_dt %>%
  count(month = ceiling_date(dep_time, "month")) %>%
  ggplot(aes(month, n)) +
  geom_line(color = "purple") +
  theme_minimal() +
  labs(title = "Flight Departures per Month",
       x = "Month",
       y = "Count")
Instead of plotting the raw departure times, we can use ceiling_date() to round each one up to the next week or month boundary, which lets us plot the number of flights per week and per month.
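One detail worth keeping in mind when reading these plots: because ceiling_date() rounds up, each bin is labelled by its upper boundary, so January flights are counted under 2013-02-01. The sketch below shows this on three made-up departure times (not taken from nycflights13) and uses base R's table() instead of dplyr to stay self-contained.

```r
library(lubridate)

# Three hypothetical departures: two in January, one in February
dep <- ymd_hms(c("2013-01-05 08:00:00",
                 "2013-01-20 17:30:00",
                 "2013-02-03 09:15:00"))

# Round each departure up to the start of the following month
month_bin <- ceiling_date(dep, "month")

# Count departures per bin; the labels are the upper boundaries
counts <- table(format(month_bin, "%Y-%m-%d"))
counts
# 2013-02-01 holds the 2 January flights, 2013-03-01 the 1 February flight
```

If labels at the start of each period are preferred, floor_date() can be used for the same grouping.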
Reference:
1. R for Data Science. https://r4ds.had.co.nz/dates-and-times.html
2. How to Write Functions in R
1.2 Is it helpful?
Yes, it is useful for representing time in plots and can offer insight into patterns and trends over different time intervals. This is especially true for large datasets, where ceiling_date() can group dates into coarser units to give a more concise and interpretable view of trends. I don’t use this every day, but I do think it is pretty neat!