stringr::str_to_camel()|str_to_snake()|str_to_kebab()

BSTA 526 Functions of the Week

Converts a chosen string to camel, snake, or kebab case using capitalization, ‘_’, or ‘-’
Author

Meara Arboren

Published

February 5, 2026

1 The Data

For this project I used a dataset from the Royal Veterinary College open access data page.

This data was collected to answer the research question: “Does antimicrobial prescription compared to no antimicrobial prescription and (separately) gastrointestinal nutraceutical prescription compared to no gastrointestinal nutraceutical prescription for acute diarrhoea in dogs cause a difference in clinical resolution and time to treatment escalation?”.

This study was part of a larger PhD thesis; more information can be found on the author's data site: https://github.com/cpegram92/causal-inference-phd.

Diarrhea.trials <- readr::read_csv(here::here("function_week", "data", "Diarrhoea_Target_Trial_Arboren.csv"))
Rows: 894 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (8): BirthDate, SexNeuterStatus, VetCompassBreed, DateofFirstPresentati...
dbl (13): PatientID, DataSilo, InsuranceStatus, Bodyweight, Vomiting, Reduce...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(Diarrhea.trials) #quick visual of variable names
 [1] "PatientID"                          "DataSilo"                          
 [3] "BirthDate"                          "InsuranceStatus"                   
 [5] "SexNeuterStatus"                    "VetCompassBreed"                   
 [7] "Bodyweight"                         "DateofFirstPresentationofDiarrhoea"
 [9] "Duration"                           "Vomiting"                          
[11] "ReducedAppetite"                    "Pyrexia"                           
[13] "Hematochezia"                       "AntibioticTreatment"               
[15] "GINutraceutical"                    "DietaryManagement"                 
[17] "Antiparasitic"                      "GastrointestinalAgent"             
[19] "Resolution"                         "ComorbidityatDiagnosis"            
[21] "TreatmentEscalation"               

2 The Functions

These three functions come from the stringr package. Each one converts strings made of multiple words into a different programming case style, which changes how values or variable names are written:

  1. str_to_snake()
  2. str_to_camel()
  3. str_to_kebab()
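
To compare the three styles side by side, here is a minimal sketch on a few of this dataset's variable names. It assumes a recent stringr release that exports these functions (1.6.0 or later, as far as I know), so the code checks for them before running; the names x and cases are illustrative and not part of the original analysis.

```r
library(stringr)

# Assumption: these case helpers only exist in recent stringr releases,
# so fail early with a clear message if they are missing
stopifnot(all(c("str_to_snake", "str_to_camel", "str_to_kebab") %in%
  getNamespaceExports("stringr")))

# A few variable names copied from the diarrhoea dataset
x <- c("PatientID", "SexNeuterStatus", "VetCompassBreed")

# One row per input name, one column per case style
cases <- data.frame(
  original = x,
  snake = str_to_snake(x),
  camel = str_to_camel(x),
  kebab = str_to_kebab(x)
)
cases
```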

2.1 str_to_snake()

This function converts a string to snake case: letters are lowercased and the words are joined with an underscore (‘_’).
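
To check which characters count as word boundaries, this sketch runs str_to_snake() on a few made-up breed spellings (hypothetical inputs, not values from the dataset) that separate words with a space, a hyphen, capital letters, and an underscore.

```r
library(stringr)

# Hypothetical breed spellings using different word separators
breeds <- c("Border Terrier", "border-collie", "GermanShepherdDog", "labrador_retriever")

# Every separator style collapses to lowercase words joined by "_"
snake_breeds <- str_to_snake(breeds)
snake_breeds
```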

In this example I apply the function to the character values of a single column, VetCompassBreed.

library(stringr) # provides str_to_snake(), str_to_camel(), str_to_kebab()
D.Snake.1 <- str_to_snake(Diarrhea.trials$VetCompassBreed)
head(D.Snake.1, 10)
 [1] "crossbreed"          "sprocker"            "border_terrier"     
 [4] "border_collie"       "labrador_retriever"  "miniature_dachshund"
 [7] "dogue_de_bordeaux"   "maltese"             "pug"                
[10] "german_shepherd_dog"

In this example I use rename_with() to apply the function to every variable name in the dataset, then print the first ten names.

library(dplyr) # provides rename_with() and select()
D.Snake.2 <- Diarrhea.trials |>
  rename_with(str_to_snake)
D.Snake.2 |>
  select(1:10) |>
  names()
 [1] "patient_id"                           
 [2] "data_silo"                            
 [3] "birth_date"                           
 [4] "insurance_status"                     
 [5] "sex_neuter_status"                    
 [6] "vet_compass_breed"                    
 [7] "bodyweight"                           
 [8] "dateof_first_presentationof_diarrhoea"
 [9] "duration"                             
[10] "vomiting"                             
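
rename_with() renames every column by default, but its .cols argument can limit the change to a subset of columns. A minimal sketch on a hypothetical three-column data frame (toy is made up and only mimics this dataset's naming style):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data frame using the same naming style as the real dataset
toy <- data.frame(
  PatientID = 1:3,
  DataSilo = c(1, 1, 2),
  Bodyweight = c(12.5, 30.1, 8.2)
)

# Default .cols = everything(): every name is converted
all_snake <- names(rename_with(toy, str_to_snake))

# Only names picked by a tidyselect helper are converted
some_snake <- names(rename_with(toy, str_to_snake, .cols = starts_with("Data")))

all_snake
some_snake
```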

2.2 str_to_camel()

This function converts a string to camel case: the words are joined with no separator and the first letter of each word is capitalized. By default the first letter of the whole string stays lowercase (first_upper = FALSE).

To capitalize the first word as well, use first_upper = TRUE. This variant is often called “upper camel case” or “Pascal case”.
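
The effect of first_upper is easiest to see on the same input with both settings. In this sketch the inputs are hypothetical and deliberately use two different separators (underscores and spaces), since the function splits words on either.

```r
library(stringr)

x <- c("sex_neuter_status", "vet compass breed")

# Default (first_upper = FALSE): the first word stays lowercase
lower_camel <- str_to_camel(x)

# first_upper = TRUE: the first word is capitalized as well
upper_camel <- str_to_camel(x, first_upper = TRUE)

lower_camel
upper_camel
```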

In this example I convert the first ten variable names of D.Snake.2 (the snake case dataset created above) to default camel case.

# default camel case (first_upper = FALSE)
D.Camel.1 <- D.Snake.2 |>
  select(1:10) |>
  rename_with(str_to_camel)

names(D.Camel.1)
 [1] "patientId"                          "dataSilo"                          
 [3] "birthDate"                          "insuranceStatus"                   
 [5] "sexNeuterStatus"                    "vetCompassBreed"                   
 [7] "bodyweight"                         "dateofFirstPresentationofDiarrhoea"
 [9] "duration"                           "vomiting"                          

In this example I set first_upper = TRUE and apply the function to D.Camel.1, so the first word of each variable name is capitalized as well.

# first letter of every word capitalized, including the first word
D.Camel.2 <- D.Camel.1 |>
  select(1:10) |>
  rename_with(str_to_camel, first_upper = TRUE)
names(D.Camel.2)
 [1] "PatientId"                          "DataSilo"                          
 [3] "BirthDate"                          "InsuranceStatus"                   
 [5] "SexNeuterStatus"                    "VetCompassBreed"                   
 [7] "Bodyweight"                         "DateofFirstPresentationofDiarrhoea"
 [9] "Duration"                           "Vomiting"                          
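
One thing worth noticing in the output above is that PatientID came back as PatientId: snake case lowercases the acronym, and camel case then only capitalizes the first letter of each word. A short sketch of that round trip on this dataset's first variable name:

```r
library(stringr)

original <- "PatientID"

# Snake case lowercases "ID", so the acronym cannot be recovered afterwards
round_trip <- original |>
  str_to_snake() |>
  str_to_camel(first_upper = TRUE)

round_trip
identical(original, round_trip) # FALSE: the conversion is not reversible here
```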

2.3 str_to_kebab()

This function converts a string to kebab case: letters are lowercased and the words are joined with a hyphen (‘-’).
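
Kebab case appears to split words the same way snake case does, only with a different separator. This sketch (hypothetical inputs) checks that, for these simple strings, str_to_kebab() matches snake case output with each ‘_’ swapped for ‘-’.

```r
library(stringr)

x <- c("SexNeuterStatus", "Female Entire", "male_neutered")

kebab <- str_to_kebab(x)

# Snake case output with underscores swapped for hyphens, for comparison
snake_swapped <- str_replace_all(str_to_snake(x), "_", "-")

kebab
identical(kebab, snake_swapped)
```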

In this example I apply the function to the character values of a single column, SexNeuterStatus.

D.Kebab.1 <-
  str_to_kebab(Diarrhea.trials$SexNeuterStatus) # manipulating a single column's values
head(D.Kebab.1, 10)
 [1] "female-entire"   "female-entire"   "female-neutered" "female-neutered"
 [5] "female-entire"   "male-neutered"   "male-entire"     "male-entire"    
 [9] "female-entire"   "female-neutered"

In this example I apply the function to the variable names of D.Camel.2, the ten-column camel case dataset created above.

D.Kebab.2 <- D.Camel.2 |> # using the camel case dataset created above
  rename_with(str_to_kebab)

names(D.Kebab.2)
 [1] "patient-id"                           
 [2] "data-silo"                            
 [3] "birth-date"                           
 [4] "insurance-status"                     
 [5] "sex-neuter-status"                    
 [6] "vet-compass-breed"                    
 [7] "bodyweight"                           
 [8] "dateof-first-presentationof-diarrhoea"
 [9] "duration"                             
[10] "vomiting"                             
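
Because hyphens are not allowed in syntactic R names, kebab case column names have to be wrapped in backticks when used in code (otherwise patient-id would be read as a subtraction). A small sketch with a hypothetical data frame:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data frame
toy <- data.frame(PatientID = 1:3, Bodyweight = c(12.5, 30.1, 8.2))
kebab_toy <- rename_with(toy, str_to_kebab)

# "patient-id" is not a syntactic name, so backticks are required
ids <- kebab_toy$`patient-id`
weights <- select(kebab_toy, `patient-id`, bodyweight)

# For comparison: base R's make.names() would replace the hyphen with a dot
make.names(names(kebab_toy))
```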

3 Is it helpful?

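These functions are useful data-cleaning tools because they quickly and consistently change how variable names and character values are written. One limitation shows up in the output above: the functions can only split words at visible boundaries such as capital letters or separators like spaces and underscores, so DateofFirstPresentationofDiarrhoea becomes dateof_first_presentationof_diarrhoea. Of the three, I liked str_to_snake() best, because it matches our naming conventions and is the easiest to read at a glance.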
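
As an example of that data-cleaning use, this sketch converts both the column names and the character values of a hypothetical data frame to snake case in one pipeline. The breed and sex values are invented, since the raw values of the real columns are not shown here.

```r
library(dplyr)
library(stringr)

# Hypothetical raw data in the same style as the real dataset
toy <- data.frame(
  VetCompassBreed = c("Border Terrier", "Pug"),
  SexNeuterStatus = c("Female Entire", "Male Neutered"),
  Bodyweight = c(9.4, 7.1)
)

clean <- toy |>
  rename_with(str_to_snake) |>                      # column names
  mutate(across(where(is.character), str_to_snake)) # character values only

clean
```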

The variable names in this dataset already follow the style that str_to_camel() produces with first_upper = TRUE (for example DataSilo and SexNeuterStatus), so it was nice to see one of these styles used in a real-world study.