In this tutorial, you will learn how to use the filter() function in R. The function helps you work with the rows of a data frame and returns the rows (samples) that meet the given conditions.
Let’s move on to discover all about the function.
What is the filter() function in R?
The filter() function, provided by the dplyr package, takes a data frame and returns the rows (samples) that satisfy the conditions of the expression.
Syntax:
filter(data_frame, expression)
Parameters:
- data_frame: A data frame to apply the filter() function.
- expression: One or more logical conditions; rows for which they evaluate to TRUE are kept.
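As a minimal sketch of the syntax above, the call below filters a small, hypothetical data frame named scores (it is not part of this tutorial's data). It uses the namespaced form dplyr::filter(), which assumes dplyr is installed and avoids accidentally calling base R's stats::filter(), a different function meant for time series.

```r
# Hypothetical data frame used only to illustrate the call shape
scores <- data.frame(
  Student = c("A", "B", "C"),
  Score = c(55, 72, 90)
)

# dplyr::filter(data_frame, expression): keep rows where the expression is TRUE
passed <- dplyr::filter(scores, Score >= 60)
print(passed)
```

Writing dplyr::filter() is optional once library('dplyr') has been called, but it makes clear which filter() is meant.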
How to use the filter() function?
Because the filter() function finds samples that satisfy a condition, the expressions passed to it are logical conditions. Some handy operators and functions to use with filter() are ==, !=, &, |, between(), and is.na().
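Two of the helpers listed above, between() and !=, do not get their own section in this tutorial, so here is a short sketch of both. It reuses the tutorial's sample data; the variable names twenties and not_washington are chosen here for illustration.

```r
library(dplyr)

# Sample data used throughout this tutorial
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
data <- data.frame(Name, Age, State)

# between(x, left, right) is inclusive on both ends: 20 <= Age <= 30
twenties <- filter(data, between(Age, 20, 30))
print(twenties)

# != keeps rows whose State differs from "Washington".
# The row with a missing State is dropped too, because NA != "Washington" is NA
not_washington <- filter(data, State != "Washington")
print(not_washington)
```

Note that filter() only keeps rows where the condition is TRUE, so rows where the condition evaluates to NA are silently left out.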
Below are some examples of using the filter() function.
Use the filter() function with ==
In this example, we will use the filter() function to find people who live in Washington State. To filter on the State column, we pass the column’s name and use the == operator to specify the target value.
Code:
# Create three vectors named Name, Age, and State
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")

# Create a data frame named data from the vectors
data = data.frame(Name, Age, State)

# Load the dplyr package
library('dplyr')

# Filter people in Washington
washington_people = filter(data, State == "Washington")
print(washington_people)
Result:
    Name Age      State
1 Evelyn  53 Washington
2  Daisy  27 Washington
You can also combine several conditions with the & (and) operator or the | (or) operator. For example, starting from the example above, suppose we only want to list people in Washington who are younger than 30.
Code:
# Create three vectors named Name, Age, and State
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")

# Create a data frame named data from the vectors
data = data.frame(Name, Age, State)

# Load the dplyr package
library('dplyr')

# Filter people in Washington who are younger than 30
washington_people = filter(data, State == "Washington" & Age < 30)
print(washington_people)
Result:
   Name Age      State
1 Daisy  27 Washington
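The | (or) operator is mentioned above but not demonstrated, so the following sketch shows it on the same data. It also shows that separating conditions with a comma behaves like &. The variable names either and both are chosen here for illustration.

```r
library(dplyr)

# Sample data used throughout this tutorial
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
data <- data.frame(Name, Age, State)

# | keeps rows where at least one condition is TRUE:
# people who live in Washington OR are younger than 30
either <- filter(data, State == "Washington" | Age < 30)
print(either)

# Conditions separated by commas are combined with "and",
# so this is equivalent to State == "Washington" & Age < 30
both <- filter(data, State == "Washington", Age < 30)
print(both)
```

Donald appears in the first result even though his State is missing, because NA | TRUE evaluates to TRUE in R.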
Use the filter() function with is.na()
In this example, we will try another function: is.na(). It returns TRUE for values that are missing (NA). In the data frame, one person has NA in the State column. To eliminate this sample, negate is.na() with the ! operator, as in the following example.
Code:
# Create three vectors named Name, Age, and State
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")

# Create a data frame named data from the vectors
data = data.frame(Name, Age, State)

# Load the dplyr package
library('dplyr')

# Keep only people whose State is not missing (NA)
clean_data = filter(data, !is.na(State))
print(clean_data)
Result:
    Name Age      State
1   John  22    Florida
2  David  31       Ohio
3  James  41      Texas
4 Evelyn  53 Washington
5   Lily  26 California
6  Daisy  27 Washington
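The opposite selection can be sketched the same way: dropping the ! keeps only the rows with a missing State. The sketch also shows why is.na() is needed at all, since comparing with == NA never yields TRUE. The variable names missing_state and wrong_attempt are chosen here for illustration.

```r
library(dplyr)

# Sample data used throughout this tutorial
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
data <- data.frame(Name, Age, State)

# Keep only the rows whose State is missing
missing_state <- filter(data, is.na(State))
print(missing_state)

# Comparing with NA always gives NA, never TRUE, so no row is kept
wrong_attempt <- filter(data, State == NA)
print(nrow(wrong_attempt))
```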
Use the filter() function with the interaction between columns
In this part, we will take a simple example to show how to use the filter() function when the condition involves more than one column. The data frame below contains each person’s number of working days in a month and the salary earned per day. We will use the function to find people who earn more than $1000 per month.
Code:
# Create three vectors named Name, Day_work, and Daily_salary
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Day_work <- c(25, 27, 27, 20, 22, 15, 10)
Daily_salary <- c(30, 45, 57, 73, 70, 50, 80)

# Create a data frame named data from the vectors
data = data.frame(Name, Day_work, Daily_salary)

# Load the dplyr package
library('dplyr')

# Filter people having income > $1000 per month
income = filter(data, Day_work * Daily_salary > 1000)
print(income)
Result:
    Name Day_work Daily_salary
1  David       27           45
2 Donald       27           57
3  James       20           73
4 Evelyn       22           70
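If you also want to see the computed income in the output, one possible variation is to store it in a new column with dplyr's mutate() and then filter on that column. This is only a sketch; the column name Monthly_income is an assumption introduced here and does not appear in the original example.

```r
library(dplyr)

# Same working-days data as in the example above
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Day_work <- c(25, 27, 27, 20, 22, 15, 10)
Daily_salary <- c(30, 45, 57, 73, 70, 50, 80)
data <- data.frame(Name, Day_work, Daily_salary)

# Add the (hypothetical) Monthly_income column, then keep incomes above 1000
income <- data %>%
  mutate(Monthly_income = Day_work * Daily_salary) %>%
  filter(Monthly_income > 1000)
print(income)
```

The selected rows are the same as before; the only difference is the extra column showing each person's monthly income.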
Summary
In summary, the filter() function finds rows (samples) that satisfy the conditions of the expressions. We can combine several conditions, or build conditions from several columns, to select exactly the rows we need.
My name is Robert Collier. I graduated in IT at HUST university. My interest is learning programming languages; my strengths are Python, C, C++, and Machine Learning/Deep Learning/NLP. I will share all the knowledge I have through my articles. Hope you like them.