How to use tidyr extract in R

The extract() function, provided by the tidyr package, splits a character column into multiple columns using the capture groups of a regular expression. This article explains what extract() does and how to use it.

What is extract() in R?

The extract() function divides the data in the input column into groups according to a regular expression. Each capture group is placed in its own new column. If the regular expression does not match a value, or the value is NA, the new columns contain NA for that row.
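The NA behavior described above can be seen in a minimal sketch. The data frame df, its column code, and the new column names prefix and number are hypothetical and chosen only for illustration.

```r
library(tidyr)

# Hypothetical sample data: one matching value, one non-matching value, one NA
df <- data.frame(code = c("ab-12", "no match", NA))

# Two capture groups: the letters before the hyphen and the digits after it
result <- extract(
    df,
    col = code,
    into = c("prefix", "number"),
    regex = "([[:alpha:]]+)-([[:digit:]]+)"
)

print(result)
```

Only the first row matches the pattern, so rows 2 and 3 should hold NA in both new columns.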


Syntax:

extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE, ...)


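Parameters:

  • data: a data frame.
  • col: the name or position of the column to split.
  • into: a character vector with the names of the new columns.
  • regex: a regular expression with one capture group, written in parentheses, for each name in into.
  • remove: if TRUE, remove the input column from the result. The default is TRUE.
  • convert: if TRUE, run the type.convert() function with as.is = TRUE on the new columns. The default is FALSE.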
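To make the remove and convert parameters concrete, here is a short sketch. The data frame df and the column names item, fruit, and count are hypothetical.

```r
library(tidyr)

# Hypothetical sample data: a label and a number joined by an underscore
df <- data.frame(item = c("apple_3", "pear_12"))

result <- extract(
    df,
    col = item,
    into = c("fruit", "count"),
    regex = "([[:alpha:]]+)_([[:digit:]]+)",
    remove = FALSE,  # keep the original 'item' column in the result
    convert = TRUE   # let type.convert() turn 'count' into an integer
)

# Inspect the column types of the result
str(result)
```

With the defaults (remove = TRUE, convert = FALSE), the item column would be dropped and count would stay a character column.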

How to use the extract() function?

A simple use of extract() is to separate first and last names from a full name.

In the following example, we have a data frame containing several users' full names. We will use the extract() function to split the column containing the full names into two columns holding the first and last names.



library(tidyr)

name <- data.frame(
    fullName = c(
        "Glenn Davis",
        "Zachary Baker",
        "John Mendoza",
        "Ruth Lopez",
        "Shaun Sanchez",
        "Walter Gaines",
        "Cassandra Cox",
        "Andre Herman",
        "Stacy Foster",
        "Renee Burns"
    )
)

# Extract column 'fullName' into two columns 'First Name' and 'Last Name'
extractName <- extract(
    name,
    col = fullName,
    into = c("First Name", "Last Name"),
    regex = "([[:alpha:]]+)[[:space:]]([[:alpha:]]+)"
)

print(extractName)

Output:

   First Name Last Name
1       Glenn     Davis
2     Zachary     Baker
3        John   Mendoza
4        Ruth     Lopez
5       Shaun   Sanchez
6      Walter    Gaines
7   Cassandra       Cox
8       Andre    Herman
9       Stacy    Foster
10      Renee     Burns

The regular expression "([[:alpha:]]+)[[:space:]]([[:alpha:]]+)" uses the character class [:alpha:], which matches uppercase and lowercase letters. The first group captures the letters before the whitespace character and the second group captures the letters after it, so extract() splits the original column into two new columns.

Next, we have an example of separating the day, month, and year from a date string.



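dateString <- data.frame(
    # Only "12 02 2022" is given; the other rows are inferred from the output, with spaces assumed
    date = c("12 01 2022", "12 02 2022", "12 03 2022", "12 04 2022")
)

# Extract column "date" into three columns "month", "day" and "year"
extractDate <- extract(
    dateString,
    col = date,
    into = c("month", "day", "year"),
    regex = "([[:digit:]]+)[[:print:]]([[:digit:]]+)[[:print:]]([[:digit:]]+)"
)
print(extractDate)

Output:
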
  month day year
1    12  01 2022
2    12  02 2022
3    12  03 2022
4    12  04 2022

The regular expression in this example captures three runs of digits ([:digit:]), each separated from the next by a single printable character ([:print:]), a class that includes spaces and punctuation.
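Because [:print:] also matches punctuation, the same pattern should handle other separators. The sketch below assumes a hypothetical data frame mixedDates whose rows use a slash, a hyphen, and a space.

```r
library(tidyr)

# Hypothetical sample data: the same date layout with three different separators
mixedDates <- data.frame(
    date = c("12/01/2022", "12-02-2022", "12 03 2022")
)

# Same regular expression as in the date example
extractMixed <- extract(
    mixedDates,
    col = date,
    into = c("month", "day", "year"),
    regex = "([[:digit:]]+)[[:print:]]([[:digit:]]+)[[:print:]]([[:digit:]]+)"
)

print(extractMixed)
```

Each row should split into the same three columns regardless of which separator it uses. The new columns stay character vectors because convert is left at FALSE.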


This article showed how to use extract() in R to split a character column into multiple columns. Note that the number of column names in into must equal the number of capture groups in the regular expression. Thanks for reading.
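The rule about matching counts can be checked with a small sketch. The data frame df is hypothetical, and tryCatch() is used here only to catch the error so that the script keeps running.

```r
library(tidyr)

# Hypothetical sample data
df <- data.frame(fullName = c("Glenn Davis", "Ruth Lopez"))

# Two names in 'into' but only one capture group in 'regex'
outcome <- tryCatch(
    extract(
        df,
        col = fullName,
        into = c("first", "last"),
        regex = "([[:alpha:]]+)"
    ),
    error = function(e) "error"  # return a marker string instead of stopping
)

print(outcome)
```

If the counts do not match, extract() is expected to stop with an error instead of returning a data frame, so outcome holds the marker string.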
