How To Merge By Multiple Columns In R

Merge by multiple columns in r

In this tutorial, we will share you how to merge by multiple columns in R. When working with R, there is often a need to achieve this kind of merge.

Merge by multiple columns in R

Fortunately, there are some functions that can help you merge by multiple columns in R, including built-in functions and packages also. In fact, programmers often use these methods: full_join() and merge().

Using full_join() 

To merge by multiple columns in R, you can use the full_join() function. But this function must be imported from package tidyverse before use so ensures to install the package first:

Syntax:

inner_join(x, y, by = c(col1,col2,..,coln))

Parameters:

  • x: The first table.
  • y: The second table.
  • col1, col2,.., col n: All the columns to merge.

Assume you have two tables stored in df1, df2.

# Create a data frame 1
df1 = data.frame(
    Name=c("Jane", "Sam", "Kite", "Robbert", "Alberto", "Dupree"),
    Gender=c("male", "male", "female"),
    Salary=c(123, 234, 345, 456, 678, 789),
    Company=c("LEARNSHAREIT", "LEARNSHAREIT", "LEARNSHAREIT", "ABC", "OOO", "XYZ")
)

# Display df1
df1

# Create a data frame 2
df2 = data.frame(
    Name=c("Jane", "Sam", "Kite", "Robbert", "Alberto", "Dupree"),
    Company=c("LEARNSHAREIT", "XYZ", "LEARNSHAREIT", "ABC", "OOO", "ABC"),
    Gender=c("male", "male", "female"),
    Age=c(23, 34, 45, 54, 43, 32)
)

# Display df2
df2

 Now you have to merge them by the first two columns using inner_join:

library(tidyverse)

# Merge by preceding three columns "Name","Company","Gender"
full_join(df1, df2, by=c("Name", "Company", "Gender"))

Output

> df1
    Name Gender Salary      Company
1    Jane   male    123 LEARNSHAREIT
2     Sam   male    234 LEARNSHAREIT
3    Kite female    345 LEARNSHAREIT
4 Robbert   male    456          ABC
5 Alberto   male    678          OOO
6  Dupree female    789          XYZ
> 
> df2
     Name      Company Gender Age
1    Jane LEARNSHAREIT   male  23
2     Sam          XYZ   male  34
3    Kite LEARNSHAREIT female  45
4 Robbert          ABC   male  54
5 Alberto          OOO   male  43
6  Dupree          ABC female  32
> 
     Name Gender Salary      Company Age
1    Jane   male    123 LEARNSHAREIT  23
2     Sam   male    234 LEARNSHAREIT  NA
3    Kite female    345 LEARNSHAREIT  45
4 Robbert   male    456          ABC  54
5 Alberto   male    678          OOO  43
6  Dupree female    789          XYZ  NA
7     Sam   male     NA          XYZ  34
8  Dupree female     NA          ABC  32

As can be seen, after merging two tables by multiple columns which are “Name”, “Company” and “Gender”, we got a table like above. If you want to merge by all columns, just remove the “by” argument in the full_join() function, for example:

library (tidyverse) 
 
# Merge by all columns
full_join(df1,df2)

Output

Joining, by = c("Name", "Gender", "Company")
     Name Gender Salary      Company Age
1    Jane   male    123 LEARNSHAREIT  23
2     Sam   male    234 LEARNSHAREIT  NA
3    Kite female    345 LEARNSHAREIT  45
4 Robbert   male    456          ABC  54
5 Alberto   male    678          OOO  43
6  Dupree female    789          XYZ  NA
7     Sam   male     NA          XYZ  34
8  Dupree female     NA          ABC  32

The example above shows that we have removed the by argument in the full_join() function. It will automatically merge the data frames by all the columns that appear in each table. In this case, it is exactly the same three columns as in the previous example. The full_join() function will perform a full join operation on the data, if you want to neglect the rows that have NA values, you should consider using inner_join() instead.

Using merge()

There is another way to merge by multiple columns in R by using merge function in R. Its syntax has been introduced here. We will use the same tables of the above solution to apply this method:

# Merge two data frames by two columns: "Name" and "Company"
df = merge(df1,df2,by=c("Name","Company"))
df

Output

> df
     Name      Company Gender.x Salary Gender.y Age
1 Alberto          OOO     male    678     male  43
2    Jane LEARNSHAREIT     male    123     male  23
3    Kite LEARNSHAREIT   female    345   female  45
4 Robbert          ABC     male    456     male  54

The difference between this function and the full_join function is that the R merge function doesn’t need import from a package. In general, it also produces the same output as the full_join() one. Additionally, if you want to merge by all common columns and neglect the rows that have NA values, just remove the by argument: 

# Merge two data frames by all columns 
df = merge(df1,df2)
df

Output

> df
     Name Gender      Company Salary Age
1 Alberto   male          OOO    678  43
2    Jane   male LEARNSHAREIT    123  23
3    Kite female LEARNSHAREIT    345  45
4 Robbert   male          ABC    456  54

Summary

We have learned how to merge by multiple columns in R through two different approaches. It would help if you considered that the first approach is the most effective way for many cases. Moreover you will be able to discover more tutorials of R here.

Maybe you are interested:

Posted in R

Leave a Reply

Your email address will not be published. Required fields are marked *