How To Join Tables In R

Join tables in r

The data of an object can be represented in many different tables. If you need to aggregate the data in the tables into one large data table, you need to join the tables. Read to the end of this article to learn about join tables in R.

Join Tables In R

Note: In this article, we use the term ‘table’ interchangeably with the term ‘dataframe’.

The condition to join two tables is that there is a common column between the two tables to be joined. Each row in this column must have a unique value. This ensures each row will have a unique identifier.

We build an automobile database of two tables. The first table contains data on the number of cylinders and fuel consumption (Miles/gallon (US)), and the second table contains data on cylinder displacement (cu.in.) and total horsepower. These two tables have one column in common, the car name.

Code:

# Create the first table
cat("Fuel consumption (Miles/gallon (US)) and number of cylinders:\n")
merc1 <-
  data.frame(
    mercName = c("Merc 230", "Merc 280", "Merc 280C", "Merc 450SE"),
    mpg = c(22.8, 19.2, 17.8, 16.4),
    cyl = c(4, 6, 6, 8)
  )
print(merc1)

# Create the second table
cat("\nCylinder displacement (cu.in.) and total horsepower:\n")
merc2 <-
  data.frame(
    mercName = c("Merc 240D", "Merc 230", "Merc 280", "Merc 280C"),
    disp = c(146.7, 140.8, 167.6, 167.6),
    hp = c(62, 95, 123, 123)
  )
print(merc2)

Output:

Fuel consumption (Miles/gallon (US)) and number of cylinders:
    mercName  mpg cyl
1   Merc 230 22.8   4
2   Merc 280 19.2   6
3  Merc 280C 17.8   6
4 Merc 450SE 16.4   8

Cylinder displacement (cu.in.) and total horsepower:
   mercName  disp  hp
1 Merc 240D 146.7  62
2  Merc 230 140.8  95
3  Merc 280 167.6 123
4 Merc 280C 167.6 123

We can use the merge() function in R to join two tables. We already have a detailed article about the syntax and usage of the merge() function in R. Click here to learn more.

Inner join

An inner join is when the merge() function returns only rows with matching keys of two tables.

Example:

cat("Inner join:\n")

merc <- merge(
    x = merc1, 
    y = merc2, 
    by = "mercName")

print(merc)

Output:

Inner join:
   mercName  mpg cyl  disp  hp
1  Merc 230 22.8   4 140.8  95
2  Merc 280 19.2   6 167.6 123
3 Merc 280C 17.8   6 167.6 123

Left join

A left join is when the merge() function returns all the rows of the first listed table and the rows with matching keys of the two tables in the latter listed.

Example:

cat("Left join:\n")

leftJoin <- merge(
    x = merc1,
    y = merc2,
    by = "mercName",
    all.x = T
)

print(leftJoin)

Output:

Left join:
    mercName  mpg cyl  disp  hp
1   Merc 230 22.8   4 140.8  95
2   Merc 280 19.2   6 167.6 123
3  Merc 280C 17.8   6 167.6 123
4 Merc 450SE 16.4   8    NA  NA

Right join

Right join is when the merge() function returns all rows of the following listed table and the rows with matching keys of the two tables in the first listed table.

Example:

cat("Right join:\n")

rightJoin <- merge(
    x = merc1,
    y = merc2,
    by = "mercName",
    all.y = T
)

print(rightJoin)

Output:

Right join:
   mercName  mpg cyl  disp  hp
1  Merc 230 22.8   4 140.8  95
2 Merc 240D   NA  NA 146.7  62
3  Merc 280 19.2   6 167.6 123
4 Merc 280C 17.8   6 167.6 123

Full join

A full join is when the merge() function returns all rows of two tables regardless of whether the keys match or not.

Example:

cat("Full join:\n")

fullJoin <- merge(
    x = merc1,
    y = merc2,
    by = "mercName",
    all.x = T,
    all.y = T
)

print(fullJoin)

Output:

Full join:
    mercName  mpg cyl  disp  hp
1   Merc 230 22.8   4 140.8  95
2  Merc 240D   NA  NA 146.7  62
3   Merc 280 19.2   6 167.6 123
4  Merc 280C 17.8   6 167.6 123
5 Merc 450SE 16.4   8    NA  NA

Using functions from the dplyr package

The dplyr package provides functions to perform the full range of joins. Those functions are inner_join(), left_join(), right_join(), and full_join().

Example:

library('dplyr')

# Inner join
cat("Inner join:\n")
inner <- inner_join(merc1, merc2, by = "mercName")
print(inner)

# Left join
cat("\nLeft join:\n")
left <- left_join(merc1, merc2, by = "mercName")
print(left)

# Right join
cat("\nRight join:\n")
right <- right_join(merc1, merc2, by = "mercName")
print(right)

# Full join
cat("\nFull join:\n")
full <- full_join(merc1, merc2, by = "mercName")
print(full)

Output:

Inner join:
   mercName  mpg cyl  disp  hp
1  Merc 230 22.8   4 140.8  95
2  Merc 280 19.2   6 167.6 123
3 Merc 280C 17.8   6 167.6 123

Left join:
    mercName  mpg cyl  disp  hp
1   Merc 230 22.8   4 140.8  95
2   Merc 280 19.2   6 167.6 123
3  Merc 280C 17.8   6 167.6 123
4 Merc 450SE 16.4   8    NA  NA

Right join:
   mercName  mpg cyl  disp  hp
1  Merc 230 22.8   4 140.8  95
2  Merc 280 19.2   6 167.6 123
3 Merc 280C 17.8   6 167.6 123
4 Merc 240D   NA  NA 146.7  62

Full join:
    mercName  mpg cyl  disp  hp
1   Merc 230 22.8   4 140.8  95
2   Merc 280 19.2   6 167.6 123
3  Merc 280C 17.8   6 167.6 123
4 Merc 450SE 16.4   8    NA  NA
5  Merc 240D   NA  NA 146.7  62

Summary

We have introduced ways to join tables in R. You can use the merge() function or the functions in the dplyr package to join tables in R. Choose the most suitable method for efficient data analysis using the R language.

Maybe you are interested:

Posted in R

Leave a Reply

Your email address will not be published. Required fields are marked *