What Is The drop_na() In R And How To Use It

The tidyr package provides the drop_na() function, which returns a new data frame containing only complete rows, that is, rows with no missing values (NA) in the columns being checked. In this article, we will learn about the syntax and usage of drop_na() in R.

What is the drop_na() in R

The drop_na() function drops rows that contain missing values in the specified columns.

Syntax:

drop_na(data, ...)

Parameters:

data: a data frame.

...: the columns to check for missing values. If this argument is omitted, drop_na() checks all columns.
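To make this behaviour concrete, the sketch below reproduces the same idea in base R with complete.cases(). The helper name drop_na_base and the small scores data frame are made up for illustration. This is not how tidyr implements drop_na(), only a rough equivalent for plain data frames.

```r
# Hypothetical helper: a rough base R equivalent of drop_na(), for illustration only
drop_na_base <- function(data, cols = names(data)) {
    # complete.cases() is TRUE for rows with no NA in the given columns
    keep <- complete.cases(data[, cols, drop = FALSE])
    data[keep, , drop = FALSE]
}

# Made-up example data
scores <- data.frame(
    Name = c("Ann", "Ben", "Cara"),
    Maths = c(90, NA, 75),
    Physics = c(NA, 60, 80)
)

all_cols <- drop_na_base(scores)             # checks every column
maths_only <- drop_na_base(scores, "Maths")  # checks only the 'Maths' column

print(all_cols)
print(maths_only)
```

With no columns named, only Cara's row survives because it is the only one without any NA. Checking just 'Maths' keeps Ann and Cara, even though Ann's 'Physics' score is missing.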

How to use the drop_na() function in R

Suppose we have a data frame of several students’ test scores. Some students did not take the test in some subjects, so those scores are missing (NA).

To use the drop_na() function, you must first install the tidyr package (only once) and then load it:

# Install once if needed: install.packages('tidyr')
library('tidyr')

The following example shows us how to use the drop_na() function in R to delete rows containing missing values.

Example:

# Create a data frame
test_scores <-
    data.frame(
        Name = c(
            "Carlos",
            "Patrick",
            "Evans",
            "Tucker",
            "Paul",
            "Nicholas",
            "Adam",
            "Stuart",
            "Murphy",
            "Eleanor"
        ),
        Maths = c(65, NA, 71, 88, 66, 54, NA, 49, 92, NA),
        Biological = c(NA, 44, 65, NA, 77, 65, 88, 58, 48, 72),
        Physics = c(93, 47, 55, 42, 49, 53, 71, 51, NA, 82),
        English = c(74, 66, 64, 70, 82, 44, 80, NA, 68, NA)
    )

# Drop all rows containing missing values in any column
cat("Drop all rows containing missing values\n")
drop_na(test_scores)

# Drop all rows containing missing values in a specific column
cat("\nDrop all rows containing missing values in the 'Maths' column\n")
drop_na(test_scores, Maths)

# Drop all rows containing missing values in specific columns
cat("\nDrop all rows containing missing values in the 'Maths' and 'Biological' columns\n")
drop_na(test_scores, Maths, Biological)

# Drop all rows containing missing values in any column except 'Physics'
cat("\nExcept for the 'Physics' column\n")
drop_na(test_scores, -Physics)

Output:

Drop all rows containing missing values
      Name Maths Biological Physics English
1    Evans    71         65      55      64
2     Paul    66         77      49      82
3 Nicholas    54         65      53      44

Drop all rows containing missing values in the 'Maths' column
      Name Maths Biological Physics English
1   Carlos    65         NA      93      74
2    Evans    71         65      55      64
3   Tucker    88         NA      42      70
4     Paul    66         77      49      82
5 Nicholas    54         65      53      44
6   Stuart    49         58      51      NA
7   Murphy    92         48      NA      68

Drop all rows containing missing values in the 'Maths' and 'Biological' columns
      Name Maths Biological Physics English
1    Evans    71         65      55      64
2     Paul    66         77      49      82
3 Nicholas    54         65      53      44
4   Stuart    49         58      51      NA
5   Murphy    92         48      NA      68

Except for the 'Physics' column
      Name Maths Biological Physics English
1    Evans    71         65      55      64
2     Paul    66         77      49      82
3 Nicholas    54         65      53      44
4   Murphy    92         48      NA      68

The ... argument selects columns by name, so you can list the columns to check or exclude a column with a leading minus sign. Changing this selection changes which rows are dropped.
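As a brief sketch of other ways to select columns, the example below assumes the tidyr package is installed and uses a small made-up data frame (scores) and a made-up vector (cols_to_check) rather than the data above. The helpers starts_with() and all_of() come from tidyselect and are re-exported by tidyr.

```r
library(tidyr)

# Made-up example data
scores <- data.frame(
    Name = c("Ann", "Ben", "Cara", "Dev"),
    Test_1 = c(90, NA, 75, 60),
    Test_2 = c(85, 70, NA, 65),
    Essay = c(NA, 80, 70, 55)
)

# Check a range of adjacent columns
by_range <- drop_na(scores, Test_1:Test_2)

# Check columns whose names share a prefix
by_prefix <- drop_na(scores, starts_with("Test"))

# Check columns whose names are stored in a character vector
cols_to_check <- c("Test_1", "Essay")
by_vector <- drop_na(scores, all_of(cols_to_check))

print(by_range)
print(by_prefix)
print(by_vector)
```

The first two calls check the same pair of columns and keep Ann and Dev. The third call checks 'Test_1' and 'Essay' and keeps Cara and Dev.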

Summary

This article covered the syntax and usage of drop_na() in R. By selecting specific columns, you control which missing values cause a row to be dropped. Thank you for reading.
