Regular expressions


Hello,

I'm passionate about transforming raw data into actionable insights, driven by a lifelong fascination with numbers. As a data analyst, I enjoy uncovering meaningful patterns and collaborating with like-minded individuals.

I'm also a strong advocate for mental health and use data to contribute to this important cause. My background in the medical field enhances my analytical approach, bridging the gap between healthcare and data analysis.

A regular expression (regex) is a sequence of characters and metacharacters that describes a pattern.

It is used for pattern matching, or string matching, in text.

Uses:

  • Data extraction

  • Cleaning

  • Data analysis

  • Data validation

  • Text mining

  • Parsing

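Pattern      Meaning
[abc]        a, b, or c
[^abc]       any character except a, b, or c
[a-z]        a to z
[A-Z]        A to Z
[a-zA-Z]     a to z or A to Z
[0-9]        0 to 9
[ ]?         occurs 0 or 1 time
[ ]+         occurs 1 or more times
[ ]*         occurs 0 or more times
[ ]{n}       occurs exactly n times
[ ]{n,}      occurs n or more times
[ ]{y,z}     occurs at least y times and at most z times
[:alnum:]    any alphanumeric character
[:digit:]    any numeric digit
[:alpha:]    any letter (upper or lowercase)
[:upper:]    any uppercase letter
[:lower:]    any lowercase letter

In the quantifier rows, [ ] stands for the character class (or other item) that the quantifier applies to. The named classes such as [:alnum:] are written inside a bracket expression, for example [[:alnum:]].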
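
As a rough illustration of the character classes and quantifiers listed above, the sketch below tests a few patterns with grepl(). The sample strings and variable names are made up for this example.

```r
# Hypothetical sample strings, used only for illustration.
x <- c("abc", "a1", "2024", "x-ray", "")

# [abc]: the string contains a, b, or c somewhere.
has_abc <- grepl("[abc]", x)

# [0-9]{4}: exactly four digits between the start (^) and end ($).
four_digits <- grepl("^[0-9]{4}$", x)

# [[:alpha:]]+: one or more letters and nothing else.
letters_only <- grepl("^[[:alpha:]]+$", x)

# [a-z]+[0-9]?: lowercase letters followed by zero or one digit.
optional_digit <- grepl("^[a-z]+[0-9]?$", x)

print(has_abc)         # TRUE  TRUE FALSE  TRUE FALSE
print(four_digits)     # FALSE FALSE  TRUE FALSE FALSE
print(letters_only)    # TRUE FALSE FALSE FALSE FALSE
print(optional_digit)  # TRUE  TRUE FALSE FALSE FALSE
```

The anchors ^ and $ matter here: without them, a pattern such as [0-9]{4} would also match any longer string that merely contains four digits in a row.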

Regex Metacharacters

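Pattern   Meaning
\d        any digit, equivalent to [0-9]
\D        any non-digit, equivalent to [^0-9]
\w        any word character, equivalent to [a-zA-Z0-9_]
\W        any non-word character, equivalent to [^a-zA-Z0-9_]
\s        any whitespace character (space, tab, newline); written as "\\s" in an R string
^         Anchors the pattern to the beginning of a string.
$         Anchors the pattern to the end of a string.
*         Matches the preceding item zero or more times; for example, .* matches any run of characters.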
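
A small sketch of these metacharacters in R follows. Because the backslash is also the escape character in R strings, each one is doubled in the pattern. The sample vector and variable names are invented for illustration.

```r
# Hypothetical sample strings, used only for illustration.
x <- c("id 42", "no digits", "tab\there", "end.")

# \d: the string contains at least one digit.
has_digit <- grepl("\\d", x)

# \s: the string contains whitespace (a space or a tab here).
has_space <- grepl("\\s", x)

# ^: the string starts with "id".
starts_id <- grepl("^id", x)

# $: the string ends with a literal dot (escaped, since "." is a metacharacter).
ends_dot <- grepl("\\.$", x)

print(has_digit)  # TRUE FALSE FALSE FALSE
print(has_space)  # TRUE  TRUE  TRUE FALSE
print(starts_id)  # TRUE FALSE FALSE FALSE
print(ends_dot)   # FALSE FALSE FALSE  TRUE
```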

grepl()

Searches for a pattern within a character vector or list of character strings.

The name combines grep ("globally search for a regular expression and print") with "l" for the logical return value.

Returns logical vector indicating whether a match was found for the pattern.

Syntax:

grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

  • pattern: The regular expression pattern you want to search for.

  • x: The character vector or string in which you want to search for the pattern.

  • ignore.case: An optional logical argument that specifies whether the pattern matching should be case-insensitive (TRUE) or case-sensitive (FALSE, the default).

  • Note on sub(): the \1 in the replacement argument of sub() refers to the string captured by the first parenthesized group in the pattern, for example ([0-9]+).

Example
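
The original example is not available here, so what follows is a minimal sketch of how grepl() behaves, using a made-up fruits vector and hypothetical variable names.

```r
# Hypothetical sample data, not from the original post.
fruits <- c("Apple", "banana", "Cherry", "pineapple")

# Case-sensitive by default: only "pineapple" contains lowercase "apple".
exact <- grepl("apple", fruits)

# ignore.case = TRUE also matches "Apple".
any_case <- grepl("apple", fruits, ignore.case = TRUE)

# fixed = TRUE treats "." as a literal dot instead of "any character".
literal_dot <- grepl(".", c("a.b", "ab"), fixed = TRUE)

# The logical vector can be used directly to subset the input.
matched <- fruits[any_case]

print(exact)        # FALSE FALSE FALSE  TRUE
print(any_case)     # TRUE FALSE FALSE  TRUE
print(literal_dot)  # TRUE FALSE
print(matched)      # "Apple" "pineapple"
```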

grep()

Returns an integer vector of the indices (positions) of the elements of the input vector that match the pattern.

Syntax

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
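
A short sketch of grep() on a made-up vector is shown below. It also uses value = TRUE, an argument of grep() that returns the matching elements themselves rather than their positions. The data and variable names are assumptions for illustration.

```r
# Hypothetical sample data, not from the original post.
fruits <- c("Apple", "banana", "Cherry", "pineapple")

# Positions of the elements that contain the letter "e".
idx <- grep("e", fruits)

# value = TRUE returns the matching elements instead of their indices.
vals <- grep("e", fruits, value = TRUE)

# When nothing matches, grep() returns an empty integer vector.
none <- grep("zzz", fruits)

print(idx)           # 1 3 4
print(vals)          # "Apple" "Cherry" "pineapple"
print(length(none))  # 0
```

Compared with grepl(), which always returns one TRUE or FALSE per input element, grep() returns only the matching positions, so its result can be shorter than the input.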

sub()

The function is used for pattern substitution within character strings.

It replaces the first occurrence of a specified pattern (regular expression) in a character vector with a replacement string and returns the modified character vector.

Syntax

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE)

  • pattern: The regular expression pattern you want to search for.

  • replacement: The string that will replace the first occurrence of the pattern in each element of x.

  • x: The character vector or string in which you want to search for the pattern.

  • ignore.case: An optional logical argument that specifies whether the pattern matching should be case-insensitive (TRUE) or case-sensitive (FALSE, the default).

  • perl: An optional logical argument that indicates whether the pattern should be treated as a Perl-compatible regular expression (TRUE) or a POSIX extended regular expression (FALSE, the default).

  • fixed: An optional logical argument that specifies whether pattern should be treated as a fixed string (TRUE) or as a regular expression (FALSE, the default).

Note: If you want to replace all occurrences, you can use the gsub() function.
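
To make the difference between sub() and gsub() concrete, here is a minimal sketch on an invented input vector. It also shows a backreference: \1 (written "\\1" in an R string) inserts the text captured by the parenthesized group. All names and data are hypothetical.

```r
# Hypothetical sample data, not from the original post.
x <- c("order 12 of 30", "no numbers")

# sub() replaces only the first run of digits in each element.
first_only <- sub("[0-9]+", "N", x)

# gsub() replaces every run of digits in each element.
all_matches <- gsub("[0-9]+", "N", x)

# The parentheses capture the digits; "\\1" puts them back inside brackets.
wrapped <- sub("([0-9]+)", "<\\1>", x)

print(first_only)   # "order N of 30"    "no numbers"
print(all_matches)  # "order N of N"     "no numbers"
print(wrapped)      # "order <12> of 30" "no numbers"
```

Elements without a match are returned unchanged, so the output always has the same length as the input.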

regexpr()

regexpr() finds the starting position of the first match of a specified pattern (regular expression) in each element of a character vector. The length of each match is returned in the match.length attribute.

If no match is found, it returns -1.

Syntax

regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
  • pattern: This is the regular expression pattern you want to search for within the character vector text.

  • text: This is the character vector or list of character strings in which you want to find the pattern.

  • ignore.case: An optional logical argument that specifies whether the pattern matching should be case-insensitive (TRUE) or case-sensitive (FALSE, the default).

  • perl: An optional logical argument that indicates whether the pattern should be treated as a Perl-compatible regular expression (TRUE) or a POSIX extended regular expression (FALSE, the default).

  • fixed: An optional logical argument that specifies whether pattern should be treated as a fixed string (TRUE) or as a regular expression (FALSE, the default).

  • regexpr() is useful when you specifically need the starting position of the first occurrence of a pattern within each string in text.

    Note: To find the positions of all occurrences of the pattern within each element, use gregexpr(), which returns the positions of every match. grep() only reports which elements contain a match, not where the match is.

Example
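
The original example is not available here, so the following is a sketch of regexpr() and gregexpr() on a made-up input. regmatches() is a base R helper, not mentioned above, that extracts the matched text from these results. Variable names are hypothetical.

```r
# Hypothetical sample data, not from the original post.
x <- c("abc123def45", "no digits")

# First run of digits in each element; -1 means no match.
m <- regexpr("[0-9]+", x)
first_pos <- as.integer(m)                     # as.integer() drops the attributes
first_len <- as.integer(attr(m, "match.length"))

# regmatches() returns the matched text, skipping elements with no match.
first_text <- regmatches(x, m)

# gregexpr() finds every match and returns one result per element, as a list.
all_m <- gregexpr("[0-9]+", x)
all_pos <- as.integer(all_m[[1]])
all_text <- regmatches(x, all_m)

print(first_pos)   # 4 -1
print(first_len)   # 3 -1
print(first_text)  # "123"
print(all_pos)     # 4 10
print(all_text)    # list: c("123", "45") and character(0)
```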

Further Reading

https://bookdown.org/rdpeng/RProgDA/text-processing-and-regular-expressions.html
