How to use dplyr distinct in R?


There are many ways to select unique or distinct rows from a data frame in R, and one of them is the distinct() function from the dplyr package. This article covers the syntax, parameters, and usage of distinct().

What is dplyr distinct() in R?

The distinct() function from the dplyr package selects distinct (unique) rows from a data frame. When several rows share the same values, only the first of them is kept.
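To make "distinct rows" concrete, here is a minimal sketch on a small hypothetical data frame (the names scores and unique_scores are made up for illustration) in which two rows are exact duplicates. Called with no column arguments, distinct() compares whole rows; base R's unique() keeps the same rows for this input.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 3 are exact duplicates
scores <- data.frame(
  id = c(1, 2, 1),
  score = c(90, 85, 90)
)

# With no column arguments, distinct() compares entire rows
unique_scores <- distinct(scores)
print(unique_scores)

# Base R's unique() keeps the same two rows here
print(unique(scores))
```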

Syntax:

distinct(.data, ..., .keep_all = FALSE)

Parameters:

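  • .data: a data frame (or a data frame extension such as a tibble).
  • …: Optional columns to use when determining uniqueness. If omitted, all columns are used.
  • .keep_all: Default is FALSE. If TRUE, keep all columns of .data; when a combination of the … columns occurs more than once, the first matching row is kept.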
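Because .data is the first argument, distinct() also works with the %>% pipe, which dplyr re-exports. The sketch below uses a hypothetical toy data frame df to show that the direct call and the piped call return the same result, and how .keep_all = TRUE keeps the first row for each distinct value of the selected column.

```r
library(dplyr)

# Hypothetical toy data frame: group "a" appears twice
df <- data.frame(
  group = c("a", "b", "a"),
  value = c(1, 2, 3)
)

# .data passed directly vs. piped in with %>%
by_arg  <- distinct(df, group)
by_pipe <- df %>% distinct(group)
print(identical(by_arg, by_pipe))

# .keep_all = TRUE keeps the other columns from the first row of each group
first_rows <- df %>% distinct(group, .keep_all = TRUE)
print(first_rows)
```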

How to use the distinct() function in R?

You can use the distinct() function to get distinct rows based on all columns or only on selected columns of a data frame. When you select only some columns, you can also set .keep_all = TRUE to keep the remaining columns in the result.

To demonstrate, we build a simple data frame with the names, heights, and weights of ten students. Some of the height and weight values are repeated.

Example:

# Create a data frame
student <- data.frame(
    name = c(
        "Avram",
        "Rebecca",
        "Hansen",
        "Alana",
        "Kelly",
        "Dudley",
        "Brenna",
        "Tyrone",
        "Oliver",
        "Laura"
    ),
    height = c(164, 172, 173, 164, 172, 163, 175, 156, 173, 164),
    weight = c(55, 66, 67, 48, 79, 57, 66, 55, 71, 48)
)
print(student)

Output:

      name height weight
1    Avram    164     55
2  Rebecca    172     66
3   Hansen    173     67
4    Alana    164     48
5    Kelly    172     79
6   Dudley    163     57
7   Brenna    175     66
8   Tyrone    156     55
9   Oliver    173     71
10   Laura    164     48
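The examples below all pass specific columns. As a sketch of the all-columns case, calling distinct(student) with no column arguments keeps all 10 rows, because every name is different and so no row is a full duplicate. To see a row actually dropped, the sketch appends a copy of Alana's row to create a hypothetical student_dup data frame.

```r
library(dplyr)

# The student data frame from the example above
student <- data.frame(
  name = c("Avram", "Rebecca", "Hansen", "Alana", "Kelly",
           "Dudley", "Brenna", "Tyrone", "Oliver", "Laura"),
  height = c(164, 172, 173, 164, 172, 163, 175, 156, 173, 164),
  weight = c(55, 66, 67, 48, 79, 57, 66, 55, 71, 48)
)

# All columns are compared; every name differs, so nothing is removed
print(nrow(distinct(student)))

# Hypothetical: append an exact copy of row 4 (Alana)
student_dup <- rbind(student, student[4, ])
print(nrow(student_dup))

# distinct() removes the copied row again
print(nrow(distinct(student_dup)))
```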

Get distinct rows of selected columns

To get distinct values of specific columns, pass the column names (unquoted) after the data frame, as the … arguments. By default, the result contains only those columns.

Example:

student <- data.frame(
    name = c(
        "Avram",
        "Rebecca",
        "Hansen",
        "Alana",
        "Kelly",
        "Dudley",
        "Brenna",
        "Tyrone",
        "Oliver",
        "Laura"
    ),
    height = c(164, 172, 173, 164, 172, 163, 175, 156, 173, 164),
    weight = c(55, 66, 67, 48, 79, 57, 66, 55, 71, 48)
)

library(dplyr)

# Distinct with the 'height' column
student1 <- distinct(student, height)
cat("Distinct with the 'height' column\n")
print(student1)

# Distinct with the 'weight' column
student2 <- distinct(student, weight)
cat("\nDistinct with the 'weight' column\n")
print(student2)

Output:

Distinct with the 'height' column
  height
1    164
2    172
3    173
4    163
5    175
6    156

Distinct with the 'weight' column
  weight
1     55
2     66
3     67
4     48
5     79
6     57
7     71
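You can also pass several columns at once; distinct() then compares the combination of their values. In the student data, Laura (164, 48) has the same height and weight as Alana, so the sketch below should return 9 of the 10 rows. The result names hw and hw_all are illustrative only.

```r
library(dplyr)

# The student data frame from the examples above
student <- data.frame(
  name = c("Avram", "Rebecca", "Hansen", "Alana", "Kelly",
           "Dudley", "Brenna", "Tyrone", "Oliver", "Laura"),
  height = c(164, 172, 173, 164, 172, 163, 175, 156, 173, 164),
  weight = c(55, 66, 67, 48, 79, 57, 66, 55, 71, 48)
)

# Rows are compared on the (height, weight) pair only
hw <- distinct(student, height, weight)
print(nrow(hw))
print(hw)

# With .keep_all = TRUE, Alana's row represents the (164, 48) pair
hw_all <- distinct(student, height, weight, .keep_all = TRUE)
print(hw_all)
```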

Keep all variables in the data frame

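The .keep_all parameter is FALSE by default, so only the columns passed to … are returned. Set .keep_all = TRUE to keep all other columns as well. For each distinct value, distinct() keeps the first row in which that value appears, which is why Avram (not Alana or Laura) represents height 164 in the output below.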

Example:

student <- data.frame(
    name = c(
        "Avram",
        "Rebecca",
        "Hansen",
        "Alana",
        "Kelly",
        "Dudley",
        "Brenna",
        "Tyrone",
        "Oliver",
        "Laura"
    ),
    height = c(164, 172, 173, 164, 172, 163, 175, 156, 173, 164),
    weight = c(55, 66, 67, 48, 79, 57, 66, 55, 71, 48)
)

library(dplyr)

# Choose to keep all other variables when doing distinct with the 'height' column
student1 <- distinct(student, height, .keep_all = TRUE)
cat("Distinct with the 'height' column\n")
print(student1)

# Choose to keep all other variables when doing distinct with the 'weight' column
student2 <- distinct(student, weight, .keep_all = TRUE)
cat("\nDistinct with the 'weight' column\n")
print(student2)

Output:

Distinct with the 'height' column
     name height weight
1   Avram    164     55
2 Rebecca    172     66
3  Hansen    173     67
4  Dudley    163     57
5  Brenna    175     66
6  Tyrone    156     55

Distinct with the 'weight' column
     name height weight
1   Avram    164     55
2 Rebecca    172     66
3  Hansen    173     67
4   Alana    164     48
5   Kelly    172     79
6  Dudley    163     57
7  Oliver    173     71
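Since .keep_all = TRUE returns the first matching row, you can control which row is kept by sorting beforehand. As a sketch, assuming you want the heaviest student for each height, arrange() the data by descending weight and then call distinct(). The result name heaviest is illustrative.

```r
library(dplyr)

# The student data frame from the examples above
student <- data.frame(
  name = c("Avram", "Rebecca", "Hansen", "Alana", "Kelly",
           "Dudley", "Brenna", "Tyrone", "Oliver", "Laura"),
  height = c(164, 172, 173, 164, 172, 163, 175, 156, 173, 164),
  weight = c(55, 66, 67, 48, 79, 57, 66, 55, 71, 48)
)

# Sort heaviest first, so the first row per height is the heaviest student
heaviest <- student %>%
  arrange(desc(weight)) %>%
  distinct(height, .keep_all = TRUE)
print(heaviest)
```

Here height 172 is represented by Kelly and height 173 by Oliver, instead of Rebecca and Hansen in the unsorted output above.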

Summary

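We learned how to use the distinct() function from dplyr to get distinct rows based on all columns or on selected columns. When you select distinct rows by only some columns but still need the other columns, set .keep_all = TRUE, keeping in mind that it returns the first matching row for each distinct value. Thank you for reading.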
