The filter() function in R

In this tutorial, you will learn how to use the filter() function from the dplyr package in R. The function works on the rows of a data frame and returns the rows that satisfy the given conditions.

Let’s move on to discover all about the function. 

What is the filter() function in R?

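The filter() function from the dplyr package takes a data frame and returns the rows (samples) for which the conditions evaluate to TRUE. Rows for which a condition evaluates to FALSE or NA are dropped. Note that base R also has a filter() function in the stats package, which is meant for time series; loading dplyr masks it.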

Syntax: 

filter(data_frame, expression)

Parameters: 

  • data_frame: The data frame to filter.
  • expression: One or more logical conditions, written in terms of the column names. Only the rows that satisfy them are kept.
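
As a minimal sketch of this syntax, assuming the dplyr package is installed, the call below filters a small made-up data frame. The name scores and its columns are hypothetical and used only for illustration. Writing dplyr::filter() makes it explicit that the dplyr version is meant rather than stats::filter().

```r
library(dplyr)

# Hypothetical data, for illustration only
scores <- data.frame(Student = c("A", "B", "C"), Score = c(55, 80, 92))

# First argument: the data frame; second argument: the condition on a column
passed <- dplyr::filter(scores, Score >= 80)

print(passed)
```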

How to use the filter() function?

Because the filter() function keeps the rows that satisfy a condition, the expressions passed to it must evaluate to a logical value (TRUE or FALSE) for each row. Some handy operators and functions to use with filter() are ==, !=, &, |, between(), and is.na().
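
As a quick sketch of != and between(), which the detailed examples in this tutorial do not cover, the code below reuses the tutorial's Name, Age, and State data. It assumes dplyr is installed; the result names not_washington and twenties are chosen here for illustration.

```r
library(dplyr)

# Same data as in the other examples of this tutorial
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
data <- data.frame(Name, Age, State)

# != keeps rows whose State differs from "Washington".
# The row with NA in State is dropped, because NA != "Washington" is NA.
not_washington <- filter(data, State != "Washington")

# between(x, left, right) is TRUE when left <= x <= right (both ends included)
twenties <- filter(data, between(Age, 20, 30))

print(not_washington)
print(twenties)
```

Note that a comparison with a missing value gives NA rather than TRUE or FALSE, which is why Donald does not appear in not_washington.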

Below are some examples of using the filter() function.

Use the filter() function with == 

In this example, we will use the filter() function to find people who live in Washington State. To filter on the State column, we pass the column's name and use the == operator to specify the target value.

Code:

# Create three vectors named Name, Age, and State
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
 
# Create a data frame named data from the vectors
data = data.frame(Name, Age, State)

# Import library('dplyr')
library('dplyr')

# Filter people in Washington
washington_people = filter(data, State=="Washington")
 
print(washington_people)

Result:

    Name Age      State
1 Evelyn  53 Washington
2  Daisy  27 Washington

You can also combine several conditions with the operator & (and) or the operator | (or). For example, continuing the example above, suppose we only want to list people in Washington who are younger than 30.

Code:

# Create three vectors named Name, Age, and State
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
 
# Create a data frame named data from the vectors
data = data.frame(Name, Age, State)

# Import library('dplyr')
library('dplyr')

# Filter people in Washington who are younger than 30
washington_people = filter(data, State=="Washington" & Age < 30)
 
print(washington_people)

Result:

   Name Age      State
1 Daisy  27 Washington
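
The | operator works the same way. As a minimal sketch, assuming dplyr is installed and reusing the same data, the code below keeps people who live in either Washington or Texas. It also shows that separating conditions with a comma inside filter() combines them the same way as &. The result names are chosen here for illustration.

```r
library(dplyr)

Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
data <- data.frame(Name, Age, State)

# | (or): keep rows that satisfy at least one of the conditions
washington_or_texas <- filter(data, State == "Washington" | State == "Texas")

# Comma-separated conditions must all hold, just like &
young_washington <- filter(data, State == "Washington", Age < 30)

print(washington_or_texas)
print(young_washington)
```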

Use the filter() function with is.na()

In this example, we will try another function: is.na(). It returns TRUE for values that are missing (NA), so it can be used to filter rows by whether a column has a missing value. In the data frame, one person has NA in the State column. If you want to eliminate this row, look at the following example.

Code:

# Create three vectors named Name, Age, and State
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
 
# Create a data frame named data from the vectors
data = data.frame(Name, Age, State)

# Import library('dplyr')
library('dplyr')
 
# Remove people having NA in the State column
clean_data = filter(data, !is.na(State))
 
print(clean_data)

Result:

    Name Age      State
1   John  22    Florida
2  David  31       Ohio
3  James  41      Texas
4 Evelyn  53 Washington
5   Lily  26 California
6  Daisy  27 Washington
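
The opposite selection is also possible. As a minimal sketch, assuming dplyr is installed and reusing the same data, dropping the ! keeps only the rows where State is missing. The name missing_state is chosen here for illustration.

```r
library(dplyr)

Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Age <- c(22, 31, 17, 41, 53, 26, 27)
State <- c("Florida", "Ohio", NA, "Texas", "Washington", "California", "Washington")
data <- data.frame(Name, Age, State)

# is.na(State) is TRUE only for the row whose State is NA
missing_state <- filter(data, is.na(State))

print(missing_state)
```

This can be handy for inspecting incomplete rows before deciding whether to remove them.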

Use the filter() function with the interaction between columns

In this part, we will take a simple example to show you how to use the filter() function when the condition involves more than one column. The data frame below contains each person's number of working days and the salary they earn per day. We will use the function to find people who earn more than $1000 per month.

Code:

# Create three vectors named Name, Day_work, and Daily_salary
Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Day_work <- c(25, 27, 27, 20, 22, 15, 10)
Daily_salary <- c(30, 45, 57, 73, 70, 50, 80)
 
# Create a data frame named data from the vectors
data = data.frame(Name, Day_work, Daily_salary)

# Import library('dplyr')
library('dplyr')
 
# Filter people having income > $1000 per month
income = filter(data, Day_work*Daily_salary > 1000)
 
print(income)

Result:

    Name Day_work Daily_salary
1  David       27           45
2 Donald       27           57
3  James       20           73
4 Evelyn       22           70
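
The result above does not show the computed income itself. One possible variation, sketched here under the assumption that dplyr is installed, is to store the product in a new column with dplyr's mutate() and then filter on that column. The column name Monthly_income is introduced here for illustration.

```r
library(dplyr)

Name <- c("John", "David", "Donald", "James", "Evelyn", "Lily", "Daisy")
Day_work <- c(25, 27, 27, 20, 22, 15, 10)
Daily_salary <- c(30, 45, 57, 73, 70, 50, 80)
data <- data.frame(Name, Day_work, Daily_salary)

# Add a column holding the product of the two existing columns
with_income <- mutate(data, Monthly_income = Day_work * Daily_salary)

# Filter on the new column instead of repeating the multiplication
high_income <- filter(with_income, Monthly_income > 1000)

print(high_income)
```

The same four people are returned, now with their monthly income visible next to the inputs.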

Summary

In summary, the filter() function finds rows (samples) that satisfy the given conditions. We can combine several conditions, or build a condition from several columns, to get exactly the rows we need.
