The extract() function in R, provided by the tidyr package, is used to extract data from a character column into multiple columns. This article explains what extract() does and how to use it.
What is the extract() function in R?
The extract() function splits the values of the input column into capture groups defined by a regular expression and places each group in its own new column. If the regular expression does not match a value, or the value is NA, the new columns contain NA for that row.
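To make the NA behavior concrete, here is a minimal sketch. The data frame `df` and the column names `prefix` and `number` are made up for illustration; only the second and third rows fail to produce a match.

```r
library(tidyr)

# Hypothetical data: one value that matches, one that does not, and one NA
df <- data.frame(code = c("ab-12", "no match here", NA))

result <- extract(
  df,
  code,
  into = c("prefix", "number"),
  # letters, a hyphen, then digits
  regex = "([[:alpha:]]+)-([[:digit:]]+)"
)

# Rows 2 and 3 contain NA in both new columns
print(result)
```

Only the first row yields values; the other two rows are kept, with NA in both new columns rather than being dropped.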
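extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE, ...)
- data: a data frame.
- col: the name or position of the column to extract from.
- into: the names of the new columns, one per capture group in regex.
- regex: a regular expression whose capture groups (the parts in parentheses) define the new columns.
- remove: whether to remove the input column from the result. The default is TRUE.
- convert: whether to run the type.convert() function with as.is = TRUE on the new columns. The default is FALSE.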
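The remove and convert parameters are easiest to understand side by side. The following is a small sketch, assuming a made-up data frame `orders` with an `entry` column; the names `item` and `qty` are illustrative only.

```r
library(tidyr)

# Hypothetical data: an item label followed by a quantity
orders <- data.frame(entry = c("apple 3", "pear 10"))

parsed <- extract(
  orders,
  entry,
  into = c("item", "qty"),
  regex = "([[:alpha:]]+) ([[:digit:]]+)",
  remove = FALSE,  # keep the original 'entry' column in the result
  convert = TRUE   # let type.convert() turn 'qty' into an integer column
)

# Show the column types of the result
str(parsed)
```

With the defaults (remove = TRUE, convert = FALSE), the entry column would be dropped and qty would stay a character column.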
How to use the extract() function?
A simple example of using extract() is separating first and last names from a full name.
In the following example, we have a data frame containing several users' full names. We use the extract() function to split the fullName column into two columns holding the first and last names.
```r
library(tidyr)

name <- data.frame(
  fullName = c(
    "Glenn Davis", "Zachary Baker", "John Mendoza", "Ruth Lopez",
    "Shaun Sanchez", "Walter Gaines", "Cassandra Cox", "Andre Herman",
    "Stacy Foster", "Renee Burns"
  )
)

# Extract column 'fullName' into two columns 'First Name' and 'Last Name'
extractName <- extract(
  name,
  fullName,
  into = c("First Name", "Last Name"),
  regex = "([[:alpha:]]+)[[:space:]]([[:alpha:]]+)"
)

print(extractName)
```
```
   First Name Last Name
1       Glenn     Davis
2     Zachary     Baker
3        John   Mendoza
4        Ruth     Lopez
5       Shaun   Sanchez
6      Walter    Gaines
7   Cassandra       Cox
8       Andre    Herman
9       Stacy    Foster
10      Renee     Burns
```
We use the character class [:alpha:] (uppercase and lowercase letters) in the regular expression "([[:alpha:]]+)[[:space:]]([[:alpha:]]+)". Each pair of parentheses is a capture group that matches a run of letters, and [[:space:]] matches the single whitespace character between them. The first group becomes the first name and the second group becomes the last name.
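Because every capture group needs a name, it can help to see what happens when the two counts differ. This is a rough sketch using a shortened, made-up data frame `people`; the exact error message depends on the tidyr version, so the code only records that an error occurred.

```r
library(tidyr)

# Shortened, hypothetical version of the names data
people <- data.frame(fullName = c("Glenn Davis", "Ruth Lopez"))

# Two capture groups but only one name in 'into': tidyr signals an error
outcome <- tryCatch(
  extract(
    people, fullName,
    into = "First Name",
    regex = "([[:alpha:]]+)[[:space:]]([[:alpha:]]+)"
  ),
  error = function(e) "error"
)
print(outcome)

# Removing the parentheses around the second part leaves a single group,
# so one name in 'into' is now enough
firstOnly <- extract(
  people, fullName,
  into = "First Name",
  regex = "([[:alpha:]]+)[[:space:]][[:alpha:]]+"
)
print(firstOnly)
```

The second call still requires the last name to be present for the pattern to match, but it no longer captures it into a column.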
Next, we have an example of separating the month, day, and year from a date string.
```r
library(tidyr)

dateString <- data.frame(
  date = c("12/01/2022", "12 02 2022", "12-03-2022", "12.04.2022")
)

# Extract column "date" into three columns "month", "day" and "year"
extractDate <- extract(
  dateString,
  1,
  into = c("month", "day", "year"),
  regex = "([[:digit:]]+)[[:print:]]([[:digit:]]+)[[:print:]]([[:digit:]]+)"
)

print(extractDate)
```
```
  month day year
1    12  01 2022
2    12  02 2022
3    12  03 2022
4    12  04 2022
```
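The regular expression in this example captures three runs of digits ([:digit:]) separated by single printable characters ([:print:]), which covers punctuation and spaces. Note that [:print:] also matches letters and digits; the pattern works here because each delimiter in the data is a single non-digit character.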
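If you want the delimiter to be strictly a non-digit character, one possible variation is to replace [[:print:]] with the negated class [^[:digit:]]. This is only a sketch of an alternative pattern, not a change to the example above; the names `dates` and `strict` are made up.

```r
library(tidyr)

# Same date strings as in the example above
dates <- data.frame(
  date = c("12/01/2022", "12 02 2022", "12-03-2022", "12.04.2022")
)

strict <- extract(
  dates,
  date,
  into = c("month", "day", "year"),
  # [^[:digit:]] matches exactly one character that is not a digit
  regex = "([[:digit:]]+)[^[:digit:]]([[:digit:]]+)[^[:digit:]]([[:digit:]]+)"
)

print(strict)
```

For this data the result is the same as before, but the pattern no longer allows a digit to act as a delimiter.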
This article showed how to use the extract() function in R to extract data from a character column into multiple columns. Note that the number of column names given in into must equal the number of capture groups in the regular expression. Thanks for reading.
Hello, my name’s Bruce Warren. You can call me Bruce. I’m interested in programming languages, so I am here to share my knowledge of programming languages with you, especially knowledge of C, C++, Java, JS, PHP.