Merge In R: What is merge() in R and Example


Hi guys! Today we will share a guide on how to merge in R. Merging is one of the most basic skills when you work with data in R, so please take a look at the syntax and usage below.

What is merge in R

Have you ever wanted to merge two data frames that have some columns or row names in common? That is when you should consider the R merge() function. It can also perform the different kinds of database joins on your data, so it plays the same role as SQL joins or the join functions found in packages such as dplyr.

Syntax of merge() in R

merge(x, y, by = intersect(names(x), names(y)),
  by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
  sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE,
  incomparables = NULL, ...)

Parameters:

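x, y: Data frames to be merged.
by, by.x, by.y: The columns used for merging. Use by.x and by.y when the key columns have different names in x and y.
all: Logical, TRUE or FALSE. Shorthand for setting both all.x and all.y.
all.x: Logical, TRUE or FALSE. If TRUE, rows of x with no match in y are kept, with NA in the columns that come from y.
all.y: Logical, TRUE or FALSE. Analogous to all.x, for rows of y.
sort: Logical, TRUE or FALSE. Whether the result is sorted on the by columns.
suffixes: A character vector of length 2, appended to make the names of non-key columns that appear in both data frames unique.
no.dups: Logical, TRUE or FALSE. Whether suffixes are appended in more cases to avoid duplicated column names in the result.
incomparables: Values that can’t be matched.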
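
To make a few of these parameters concrete, here is a minimal sketch. The data frames left and right and their columns are hypothetical, made up only for illustration. It shows by.x and by.y matching key columns with different names, suffixes renaming a clashing column, and all.x keeping unmatched rows of x.

```r
# Hypothetical data: the key column has a different name in each data frame,
# and both data frames also have a column called Score.
left  <- data.frame(Name = c("Ann", "Bob", "Cy"), Score = c(10, 20, 30))
right <- data.frame(Person = c("Bob", "Cy", "Dee"), Score = c(21, 31, 41))

# Match left$Name against right$Person and give the two Score columns
# custom suffixes instead of the default ".x" and ".y".
joined <- merge(left, right, by.x = "Name", by.y = "Person",
                suffixes = c(".left", ".right"))
print(joined)

# all.x = TRUE keeps every row of left; values missing from right become NA.
left_joined <- merge(left, right, by.x = "Name", by.y = "Person", all.x = TRUE)
print(left_joined)
```

In both results the key column takes its name from x, so it is called Name rather than Person.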

merge() function in R example

You can merge datasets by using the merge() function and its optional parameters. Take a look at the following examples.

Example 1

Suppose an accountant is compiling statistics on the income of the top 3 authors in a company. The data she has is a table of all the authors’ rating points and a table containing the bonuses earned by the 3 highest-scoring authors. We can easily view the top 3’s income in one table by merging the two tables:

# Create a table points (line 1)
points = data.frame(
    Author = c(
        "Karley Crooks MD",
        "Rene Jacobs",
        "Stephen Haag",
        "Prof. Juvenal Ritchie",
        "Sonia Koepp III"
    ),
    Points = c("35.87", "32", "28.87", "27.71", "27.42")
)

cat('points\n')
points

# Create the table bonus (line 16)
bonus = data.frame(
    Author = c("Karley Crooks MD", "Stephen Haag", "Rene Jacobs"), 
    Bonus = c(1000, 800, 500)
)

cat('bonus\n')
bonus

# Merge two tables (line 25)
statistic = merge(points, bonus)
cat('statistic\n')
statistic

Output

points
                 Author Points
1      Karley Crooks MD  35.87
2           Rene Jacobs     32
3          Stephen Haag  28.87
4 Prof. Juvenal Ritchie  27.71
5       Sonia Koepp III  27.42
bonus
            Author Bonus
1 Karley Crooks MD  1000
2     Stephen Haag   800
3      Rene Jacobs   500
statistic
            Author Points Bonus
1 Karley Crooks MD  35.87  1000
2      Rene Jacobs     32   500
3     Stephen Haag  28.87   800

Lines 2-11: We define a data frame named points that holds each author and their corresponding points.

Lines 17-20: We also define a table named bonus that contains the bonus money for the top 3 authors with the highest points.

Line 26: We use the merge function to build the table statistic through a natural join of the two previous datasets, that is, an inner join on their shared column Author.
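
The natural join works because by defaults to the column names the two data frames share. As a small sketch of that default, reusing the same points and bonus tables, the call without by and the call with by = "Author" spelled out should give the same result:

```r
points <- data.frame(
    Author = c("Karley Crooks MD", "Rene Jacobs", "Stephen Haag",
               "Prof. Juvenal Ritchie", "Sonia Koepp III"),
    Points = c("35.87", "32", "28.87", "27.71", "27.42")
)
bonus <- data.frame(
    Author = c("Karley Crooks MD", "Stephen Haag", "Rene Jacobs"),
    Bonus = c(1000, 800, 500)
)

# The only column name the two data frames have in common is "Author".
shared <- intersect(names(points), names(bonus))
print(shared)

# Leaving out `by` and naming the key column explicitly are equivalent here.
implicit <- merge(points, bonus)
explicit <- merge(points, bonus, by = "Author")
print(identical(implicit, explicit))
```

Naming the key explicitly is a reasonable habit once the tables grow, since an extra shared column would otherwise silently become part of the join key.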

As the tables show, ‘Prof. Juvenal Ritchie’ and ‘Sonia Koepp III’ appear only in points, not in bonus. As a result, they are missing from the output of merge(). If the accountant wants a table of all five authors, even though the last two have no bonus, the next example shows how to do it.

Example 2

Suppose the context is the same as in the previous example. This time we want a table covering all five authors, which we again get by merging the two tables:

# Create a table points (line 1)
points = data.frame(
    Author = c(
        "Karley Crooks MD",
        "Rene Jacobs",
        "Stephen Haag",
        "Prof. Juvenal Ritchie",
        "Sonia Koepp III"
    ),
    Points = c("35.87", "32", "28.87", "27.71", "27.42")
)

cat("points\n")
points

# Create the table bonus (line 16)
bonus = data.frame(
    Author = c("Karley Crooks MD", "Stephen Haag", "Rene Jacobs"), 
    Bonus = c(1000, 800, 500)
)

cat("bonus\n")
bonus

# Merge two tables (line 25)
statistic = merge(points, bonus, all = TRUE)
cat("statistic\n")
statistic

Output

points
                 Author Points
1      Karley Crooks MD  35.87
2           Rene Jacobs     32
3          Stephen Haag  28.87
4 Prof. Juvenal Ritchie  27.71
5       Sonia Koepp III  27.42
bonus
            Author Bonus
1 Karley Crooks MD  1000
2     Stephen Haag   800
3      Rene Jacobs   500
statistic
                 Author Points Bonus
1      Karley Crooks MD  35.87  1000
2 Prof. Juvenal Ritchie  27.71    NA
3           Rene Jacobs     32   500
4       Sonia Koepp III  27.42    NA
5          Stephen Haag  28.87   800

Have you spotted the difference between the previous output and this one? The change comes from line 26, where we call merge() with three arguments instead of two. The extra argument all = TRUE performs a full (outer) join of the two datasets.

As you can see, this approach still gives you the merged data, and the result also contains the rows that could not be matched because they are missing from the other table. Their missing values are filled with NA. Just remember to pass the argument all = TRUE when calling merge().
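
The all argument is shorthand for all.x and all.y, which can also be set separately to get a left or right join. The sketch below compares the four variants on a smaller version of the tables. "Guest Writer" is a hypothetical author added only so that bonus also has a row with no match in points.

```r
points <- data.frame(
    Author = c("Karley Crooks MD", "Rene Jacobs", "Sonia Koepp III"),
    Points = c("35.87", "32", "27.42")
)
# "Guest Writer" is a hypothetical author who has a bonus but no points.
bonus <- data.frame(
    Author = c("Karley Crooks MD", "Rene Jacobs", "Guest Writer"),
    Bonus = c(1000, 500, 300)
)

inner <- merge(points, bonus)                # only authors found in both tables
left  <- merge(points, bonus, all.x = TRUE)  # every row of points
right <- merge(points, bonus, all.y = TRUE)  # every row of bonus
full  <- merge(points, bonus, all = TRUE)    # every row of either table

# Number of rows returned by each kind of join.
print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), full = nrow(full)))
```

In Example 2 every author in bonus also appears in points, so all.x = TRUE alone would return the same five rows there.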

Summary

We have learned how to merge in R with the merge() function, so you can now do this task easily. If you have any questions, feel free to leave a comment below. We also have many more tutorials about R for you to explore.
