A data engineer spends most of their time on data processing, and dealing with missing data is part of that work. There are two main ways to handle missing data: eliminating it or creating new values based on existing ones. This article will show you how to eliminate missing data by using the complete.cases() function in R.
What is the complete.cases() function in R?
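The complete.cases() function returns a logical vector that marks which elements of a vector, or which rows of a matrix or dataframe, contain no missing values. Using that logical vector as an index returns the object with the incomplete cases removed. For a matrix or dataframe, every row that contains at least one missing value is dropped.
new_object <- object[complete.cases(object)]     # vector
new_object <- object[complete.cases(object), ]   # matrix or dataframe (note the comma)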
- object: Vector, matrix, or dataframe that has missing values
- new_object: New vector, matrix, or dataframe that has no missing data
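To make the indexing in the syntax above more concrete, it can help to look at what complete.cases() returns on its own. The following is a minimal sketch; the vector `x` and the name `keep` are made up for illustration and do not come from the examples in this article.

```r
# Hypothetical example vector, used only for illustration
x <- c(10, NA, 30)

# complete.cases() returns one TRUE/FALSE per element (or per row)
keep <- complete.cases(x)
print(keep)
# [1]  TRUE FALSE  TRUE

# Indexing with that logical vector keeps only the complete entries
print(x[keep])
# [1] 10 30
```

The function itself does not change `x`. The missing value disappears only because `x` is indexed with the logical vector.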
How to use the complete.cases() function?
Now, we will show you a few examples of using the complete.cases() function with vectors, matrices, and dataframes, respectively.
Eliminate missing data in a vector
We start with a vector, the building block of matrices and dataframes. We will declare a numeric vector with a few NA values, then apply the function to see the result.
# Create a raw vector with a few NA values
rawVect <- c(1, 4, 3, 5, NA, 6, 9, NA, 0)

# Eliminate NA values
cleanVect <- rawVect[complete.cases(rawVect)]
cat("The vector after handling missing values is:\n")
print(cleanVect)
The vector after handling missing values is:
[1] 1 4 3 5 6 9 0
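For a plain vector, the mask produced by complete.cases() should match what `!is.na()` gives, which also makes it easy to count how many values were dropped. The sketch below reuses the vector from the example. The names `same_mask` and `n_missing` are introduced here only for illustration.

```r
# Same vector as in the example above
rawVect <- c(1, 4, 3, 5, NA, 6, 9, NA, 0)

# For a single vector, complete.cases() and !is.na() give the same mask
same_mask <- identical(complete.cases(rawVect), !is.na(rawVect))
print(same_mask)
# [1] TRUE

# Count how many entries are missing (and would be dropped)
n_missing <- sum(!complete.cases(rawVect))
print(n_missing)
# [1] 2
```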
Eliminate missing data in a matrix
When a matrix is indexed with the result of complete.cases(), every row that has a missing value is removed. Because the index in this example is written without a comma, as m[complete.cases(m)], R returns a plain vector holding the values from the complete rows rather than a matrix.
# Create a vector with a few NA values
vect <- c(1, 2, NA, NA, 5, 6, 7, 8, NA)

# Create a 3x3 matrix from the vector (filled column by column)
m <- matrix(vect, nrow = 3, ncol = 3)
cat("The original matrix is:\n")
print(m)

# Keep only the values from rows without NA (returned as a vector)
newVect <- m[complete.cases(m)]
cat("The result after eliminating missing values in the matrix is:\n")
print(newVect)
The original matrix is:
     [,1] [,2] [,3]
[1,]    1   NA    7
[2,]    2    5    8
[3,]   NA    6   NA
The result after eliminating missing values in the matrix is:
[1] 2 5 8
The first and third rows are removed because they have missing values.
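If the result should stay a matrix instead of collapsing to a vector, one option is to index by rows and pass `drop = FALSE`. This is a sketch of that variant using the same matrix; the name `cleanM` is introduced here for illustration.

```r
# Same matrix as in the example above
vect <- c(1, 2, NA, NA, 5, 6, 7, 8, NA)
m <- matrix(vect, nrow = 3, ncol = 3)

# The comma selects whole rows; drop = FALSE keeps the result a matrix
# even when only one row remains
cleanM <- m[complete.cases(m), , drop = FALSE]
print(cleanM)
#      [,1] [,2] [,3]
# [1,]    2    5    8
```

Without `drop = FALSE`, a single remaining row would be simplified to a vector, which can surprise code that later expects two dimensions.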
Eliminate missing data in a dataframe
Similar to matrices, when applying the function to a dataframe, all rows that have missing values will also be removed.
# Declare vectors having missing values
v1 <- c(5, 17, NA, 12, 32, NA)
v2 <- c(12, 23, 19, NA, NA, 0)
v3 <- c(4, 32, 11, NA, 21, NA)

# Create a dataframe from vectors
df <- data.frame(v1, v2, v3)
# -> df
#   v1 v2 v3
# 1  5 12  4
# 2 17 23 32
# 3 NA 19 11
# 4 12 NA NA
# 5 32 NA 21
# 6 NA  0 NA

# Drop rows having missing values
df <- df[complete.cases(df), ]
cat("The dataframe after eliminating missing values\n")
print(df)
The dataframe after eliminating missing values
  v1 v2 v3
1  5 12  4
2 17 23 32
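Dropping every incomplete row can discard a lot of data, so it may be worth counting the affected rows first, or checking only the columns that matter. The sketch below rebuilds the same dataframe. The choice of `v2` and `v3` as the required columns is an assumption made purely for illustration, as are the names `n_incomplete` and `partial`.

```r
# Rebuild the dataframe from the example above
v1 <- c(5, 17, NA, 12, 32, NA)
v2 <- c(12, 23, 19, NA, NA, 0)
v3 <- c(4, 32, 11, NA, 21, NA)
df <- data.frame(v1, v2, v3)

# Count rows that contain at least one NA
n_incomplete <- sum(!complete.cases(df))
print(n_incomplete)
# [1] 4

# Require only v2 and v3 to be present; an NA in v1 is tolerated
partial <- df[complete.cases(df[, c("v2", "v3")]), ]
print(partial)
#   v1 v2 v3
# 1  5 12  4
# 2 17 23 32
# 3 NA 19 11
```

Restricting the check to a subset of columns keeps row 3 here, which the full check would have removed because of the NA in `v1`.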
In summary, the complete.cases() function offers a strict way to remove missing values. In a matrix or dataframe, it drops not only the missing values themselves but every row that contains one, so the valid values in those rows are lost as well.
My name is Robert Collier. I graduated in IT at HUST university. My interest is learning programming languages; my strengths are Python, C, C++, and Machine Learning/Deep Learning/NLP. I will share all the knowledge I have through my articles. Hope you like them.