`library(tidyverse)`

Regular expressions are sequences of characters that define search patterns. In R, they are commonly used for finding, extracting, and replacing text.
R provides several base functions that support regular expressions:
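- `grep()` / `grepl()`
  - `grep()` returns indices of matches.
  - `grepl()` returns a logical vector indicating matches.
- `sub()` / `gsub()`
  - `sub()` replaces the first match.
  - `gsub()` replaces all matches.
- `regexpr()` / `gregexpr()`
  - `regexpr()` returns the position of the first match; `gregexpr()` returns the positions of all matches.
- `regmatches()`
  - Extracts matched substrings from the result of `regexpr()` or `gregexpr()`.

Here are some commonly used regex symbols: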
| Symbol | Meaning | Example |
|---|---|---|
| `.` | Any character except newline | `"a.b"` matches “acb”, “a1b” |
| `^` | Start of string | `"^cat"` matches “catfish” |
| `$` | End of string | `"cat$"` matches “bobcat” |
| `*` | 0 or more repetitions | `"ca*t"` matches “ct”, “cat”, “caaaat” |
| `+` | 1 or more repetitions | `"ca+t"` matches “cat”, “caaaat” |
| `?` | 0 or 1 repetition | `"ca?t"` matches “ct”, “cat” |
| `[]` | Character class | `"[cd]og"` matches “dog”, “cog” |
| `\|` | OR | `"cat\|dog"` matches either |
| `()` | Grouping | `"(cat\|dog)s?"` matches “cat”, “cats”, “dog”, “dogs” |
| `\\` | Escape special characters | `"\\."` matches a literal dot |
[1] "apple" "banana" "date"
[1] "apple"
[1] "banana"
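One way to produce results like those shown above, and to try the other base functions, is sketched here. The input vector `fruits`, the patterns, and the sample sentence are assumptions chosen for illustration, not the original example data.

```r
# Hypothetical input vector (an assumption, for illustration only)
fruits <- c("apple", "banana", "cherry", "date")

# grep() with value = TRUE returns the matching elements instead of indices
has_a    <- grep("a", fruits, value = TRUE)    # contains "a"
starts_a <- grep("^a", fruits, value = TRUE)   # starts with "a"
ends_na  <- grep("na$", fruits, value = TRUE)  # ends with "na"
print(has_a)
print(starts_a)
print(ends_na)

# grep() returns indices; grepl() returns one logical value per element
idx   <- grep("an", fruits)
flags <- grepl("an", fruits)

# sub() replaces the first match in each string; gsub() replaces all matches
first_only <- sub("a", "A", "banana")
all_hits   <- gsub("a", "A", "banana")

# regexpr() locates the first match; regmatches() extracts it
sentence     <- "order 66 shipped in 3 days"   # hypothetical text
first_number <- regmatches(sentence, regexpr("[0-9]+", sentence))

# gregexpr() locates every match; regmatches() then returns a list
all_numbers <- regmatches(sentence, gregexpr("[0-9]+", sentence))[[1]]
```

Note that `regmatches()` never searches on its own: it only pulls out the substrings that `regexpr()` or `gregexpr()` has already located.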
stringr

The base R regular expression functions are a good fit if you already use regular expressions in another programming language. If you are just starting out, the stringr package, part of the tidyverse, provides a cohesive and consistent set of functions for string manipulation. It simplifies working with regular expressions by offering:
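- Consistent function names (all prefixed with `str_*`)

Why use stringr rather than base R regex functions?

- Integration with `dplyr`, `purrr`, and other packages.
- A more consistent interface than `grep()` and `sub()`.

stringr Functions for Regex

Here are some of the most useful functions when working with regular expressions: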
- `str_detect()`: Checks if a pattern exists in a string.
- `str_replace()` / `str_replace_all()`: Replaces the first or all occurrences of a pattern.
- `str_extract()` / `str_extract_all()`: Extracts the first or all matches of a pattern.
- `str_match()` / `str_match_all()`: Extracts matched groups using parentheses.
- `str_split()`: Splits strings based on a pattern.
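A minimal sketch of these functions in use follows. The vector `x` and the other input strings are hypothetical, and the variable names are only for illustration.

```r
library(stringr)

# Hypothetical example data (an assumption, for illustration only)
x <- c("apple 12", "banana 7", "cherry")

# str_detect(): does each string contain a digit?
detected <- str_detect(x, "\\d")

# str_replace() changes the first match; str_replace_all() changes every match
replaced_first <- str_replace("banana", "a", "A")
replaced_all   <- str_replace_all("banana", "a", "A")

# str_extract() returns the first match per string (NA when there is none)
extracted <- str_extract(x, "\\d+")

# str_extract_all() returns a list with every match per string
extracted_all <- str_extract_all("a1b22c333", "\\d+")[[1]]

# str_match() returns a character matrix: full match, then one column per group
matched <- str_match("apple 12", "(\\w+) (\\d+)")

# str_split() returns a list of pieces, split wherever the pattern matches
pieces <- str_split("a, b,c", ",\\s*")[[1]]
```

Every function takes the string first and the pattern second, which is what makes them convenient to use inside a pipe.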
All stringr functions accept regular expressions by default. You can use:
- `^`, `$`, `.`, `*`, `+`, `?`, `[]`, `()`, `|`: standard regex symbols
- `\\d`, `\\s`, `\\w` for digits, whitespace, and word characters

stringi

stringr is built on top of the stringi package. stringr is useful when you’re learning because it exposes a minimal set of functions, which have been carefully picked to handle the most common string manipulation tasks. stringi, on the other hand, is designed to be comprehensive. It contains almost every function you might ever need: stringi has 250 functions to stringr’s 49.
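To see how the two packages relate, the sketch below runs the same patterns through a stringr function and what is assumed to be its closest stringi counterpart. The vector `x` is hypothetical, and the pairing of functions is an illustration rather than a complete mapping between the packages.

```r
library(stringr)
library(stringi)

# Hypothetical example data (an assumption, for illustration only)
x <- c("room 101", "no digits", "tab\there")

# Detect a digit with \\d: stringr wrapper vs. stringi function
detect_stringr <- str_detect(x, "\\d")
detect_stringi <- stri_detect_regex(x, "\\d")

# Detect whitespace with \\s (the third string contains a tab character)
has_space <- str_detect(x, "\\s")

# Extract the first run of word characters with \\w+
word_stringr <- str_extract(x, "\\w+")
word_stringi <- stri_extract_first_regex(x, "\\w+")
```

The stringi names are longer because they spell out both the operation and the matching engine, while stringr hides that choice behind a shorter `str_` name.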
R for Data Science Chapter 15.
Today we will walk through Chapter 15, Regular expressions, in R for Data Science. As we did last week, by putting the examples and exercises in our own Quarto Markdown file, we can create our own personal path through the chapter.
After you Render the qmd file to an html file, export the file to your computer and upload it to Canvas.