Missing values are very common in data science. In this article, we will show examples of missing values and a few methods for handling them in R.
Missing values in R
In R, missing values are represented by the symbol NA, which means "not available". You can use the is.na() function to identify missing values in a vector or data frame. For example:
Code:
x <- c(1, 2, NA, 4, 5)
cat(is.na(x))
This code prints a logical vector that is TRUE for the missing value and FALSE for the other values:
Result:
FALSE FALSE TRUE FALSE FALSE
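The example above uses a vector, but is.na() also works on a data frame, where it returns a logical matrix with one cell per value. The following is a minimal sketch; the data frame `df` and its columns `a` and `b` are hypothetical and only serve as an illustration. It also uses which() to turn the logical vector into the positions of the missing values.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# is.na() on a data frame returns a logical matrix of the same shape
na_matrix <- is.na(df)
print(na_matrix)

# which() gives the positions of the missing values in a vector
x <- c(1, 2, NA, 4, 5)
na_positions <- which(is.na(x))
print(na_positions)
```

Knowing the positions is handy when you want to inspect or replace the missing entries rather than just detect them.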
Moreover, when working with large amounts of data, you can combine the sum() function with is.na() to count the number of missing values, because each TRUE is counted as 1.
Code:
x <- c(1, 2, NA, 4, NA)
cat("Number of missing values: ")
cat(sum(is.na(x)))
Result:
Number of missing values: 2
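The same counting idea extends to data frames. The sketch below assumes a small hypothetical data frame `df` (its columns `v1` and `v2` are made up for illustration) and uses colSums() to count missing values per column and mean() to get the share of missing values in one column.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(
  v1 = c(5, 17, NA, 12),
  v2 = c(12, NA, NA, 0)
)

# Count missing values in each column
na_per_column <- colSums(is.na(df))
print(na_per_column)

# Proportion of missing values in a single column
na_share_v2 <- mean(is.na(df$v2))
print(na_share_v2)

# Check whether anything at all is missing
has_na <- any(is.na(df))
print(has_na)
```

Per-column counts help you decide whether dropping rows is reasonable or whether one column is responsible for most of the gaps.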
Handling missing values in R
Now that you know how to detect missing values, we will show you two functions for removing them.
Use the complete.cases() function
We have another article that introduces the complete.cases() function and shows how to use it to handle missing values. Please take a look at that article for more detail. Here, we only remind you that complete.cases() can be applied to vectors, matrices, single columns, and whole data frames. It returns a logical vector that marks which rows (or elements) contain no missing values. Look at the following example.
Code:
# Declare vectors having missing values
v1 <- c(5, 17, NA, 12, 32, NA)
v2 <- c(12, 23, 19, NA, NA, 0)
v3 <- c(4, 32, 11, NA, 21, NA)

# Create a dataframe from vectors
df <- data.frame(v1, v2, v3)

# Drop rows having missing values
new_df <- df[complete.cases(df), ]

# Drop missing values from a single column
new_column <- df$v1[complete.cases(df$v1)]

cat("The dataframe after eliminating missing values\n")
print(new_df)
cat("The column after eliminating missing values\n")
print(new_column)
Result:
The dataframe after eliminating missing values
  v1 v2 v3
1  5 12  4
2 17 23 32
The column after eliminating missing values
[1] 5 17 12 32
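Because complete.cases() returns a logical vector, you can also pass it only some of the columns and use the result to filter the whole data frame. The following sketch reuses the example data and keeps every row in which v2 and v3 are both present, regardless of v1. The names `keep` and `partial_df` are introduced here for illustration.

```r
# Same example data as above
v1 <- c(5, 17, NA, 12, 32, NA)
v2 <- c(12, 23, 19, NA, NA, 0)
v3 <- c(4, 32, 11, NA, 21, NA)
df <- data.frame(v1, v2, v3)

# TRUE for rows where v2 and v3 are both present; NAs in v1 are ignored
keep <- complete.cases(df[, c("v2", "v3")])

# Filter the whole data frame with that logical vector
partial_df <- df[keep, ]
print(partial_df)
```

Row 3 is kept even though v1 is missing there, because only v2 and v3 were checked.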
Use the na.omit() function
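Similar to the complete.cases() function, the na.omit() function also removes rows having missing values. However, instead of returning a logical vector that you then use to subset the rows, na.omit() takes the whole object and returns it with the incomplete rows already removed.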
Syntax:
na.omit(df)
Argument:
df: The dataframe with missing values
Code:
# Declare vectors having missing values
v1 <- c(5, 17, NA, 12, 32, NA)
v2 <- c(12, 23, 19, NA, NA, 0)
v3 <- c(4, 32, 11, NA, 21, NA)

# Create a dataframe from vectors
df <- data.frame(v1, v2, v3)

# Drop rows having missing values
df <- na.omit(df)

cat("The dataframe after eliminating missing values\n")
print(df)
Result:
The dataframe after eliminating missing values
  v1 v2 v3
1  5 12  4
2 17 23 32
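na.omit() is not limited to data frames. As a minimal sketch, the example below applies it to a plain vector and reads the "na.action" attribute, where na.omit() records the positions it dropped. The variable names `clean_x`, `dropped`, and `values_only` are chosen here for illustration.

```r
x <- c(5, 17, NA, 12, 32, NA)

# na.omit() also accepts a plain vector
clean_x <- na.omit(x)
print(clean_x)

# The positions that were dropped are stored in the "na.action" attribute
dropped <- attr(clean_x, "na.action")
print(as.integer(dropped))

# as.vector() strips the extra attributes when only the values are needed
values_only <- as.vector(clean_x)
print(values_only)
```

The recorded positions are useful when you later need to report which observations were excluded.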
Summary
In summary, missing values are represented by NA (not available), which means there is no value in that position. You can remove rows with missing values by using the complete.cases() and na.omit() functions. However, if you want to drop rows based on missing values in specific columns only, complete.cases() is the more direct choice, because you can pass it just those columns and use the result to filter the whole data frame.
My name is Robert Collier. I graduated in IT at HUST university. My interest is learning programming languages; my strengths are Python, C, C++, and Machine Learning/Deep Learning/NLP. I will share all the knowledge I have through my articles. Hope you like them.