Today, we will learn how to merge rows in R. This is a basic skill when you work with data frames in R. Read on to learn how it works.
Solutions to merge rows in R
Many built-in functions and packages can merge rows in R. We will show you two commonly used methods: fill() and cbind().
To merge rows in R, you can use the fill() function. This function comes from the tidyr package, so make sure the package is installed and loaded before use. Its syntax is:
fill(data, ..., .direction = c("down", "up", "downup", "updown"))
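- data: A data frame.
- ...: A selection of columns to fill. If empty, nothing happens.
- .direction: Direction in which to fill missing values: "down" (the default), "up", "downup" (down first, then up), or "updown" (up first, then down).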
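To make the .direction argument concrete, here is a minimal sketch on a hypothetical one-column table (the name toy and its values are made up for illustration and are not part of the example that follows). It assumes tidyr is installed.

```r
library(tidyr)

# Hypothetical toy data: gaps at the start, in the middle, and at the end
toy <- data.frame(x = c(NA, 1, NA, 4, NA))

down   <- fill(toy, x, .direction = "down")    # carry the last value forward
up     <- fill(toy, x, .direction = "up")      # carry the next value backward
downup <- fill(toy, x, .direction = "downup")  # fill down first, then up
updown <- fill(toy, x, .direction = "updown")  # fill up first, then down

down$x    # NA  1  1  4  4
up$x      #  1  1  4  4 NA
downup$x  #  1  1  1  4  4
updown$x  #  1  1  4  4  4
```

Only the two-way directions remove every NA here, which is why "downup" is a convenient choice when the missing values can sit on either side of the known value.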
Assume you have a table stored in df:
# Create a table
df <- data.frame(
  ID = c(11, 11, 22, 22, 33, 33, 44, 44),
  Salary = c(NA, 1000, NA, 2000, NA, 3000, NA, 4000),
  Name = c("Jack", "Jack", "Jane", "Jane", "John", "John", "Jay", "Jay"),
  Bonus = c(100, NA, 200, NA, 300, NA, 400, NA)
)

# Display the table
df
  ID Salary Name Bonus
1 11     NA Jack   100
2 11   1000 Jack    NA
3 22     NA Jane   200
4 22   2000 Jane    NA
5 33     NA John   300
6 33   3000 John    NA
7 44     NA  Jay   400
8 44   4000  Jay    NA
Suppose you want to merge the rows that share an ID and hold NA values in complementary columns, so that the result looks like this:
  ID Salary Name Bonus
1 11   1000 Jack   100
2 22   2000 Jane   200
3 33   3000 John   300
4 44   4000  Jay   400
In that case, use the group_by() function from dplyr first, and then the fill() function:
library(tidyr)
library(dplyr)

# Merge rows in table df
df %>%
  group_by(ID) %>%
  fill(everything(), .direction = "downup") %>%
  distinct()
# A tibble: 4 x 4
# Groups:   ID [4]
     ID Salary Name  Bonus
  <dbl>  <dbl> <chr> <dbl>
1    11   1000 Jack    100
2    22   2000 Jane    200
3    33   3000 John    300
4    44   4000 Jay     400
In the example above, we first grouped the rows by ID, then filled the missing values in every column within each group (down first, then up), and finally kept only the distinct records.
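To see why the final distinct() step is needed, it can help to look at the intermediate result. The sketch below is an illustration under a few assumptions: it uses a shortened copy of the table (two IDs only), names the columns to fill explicitly instead of using everything(), and stores the steps in the illustrative variables filled and merged.

```r
library(dplyr)
library(tidyr)

# Shortened copy of the example table (assumption: two IDs are enough to show the idea)
df <- data.frame(
  ID = c(11, 11, 22, 22),
  Salary = c(NA, 1000, NA, 2000),
  Name = c("Jack", "Jack", "Jane", "Jane"),
  Bonus = c(100, NA, 200, NA)
)

# Step 1: fill the gaps within each ID group
filled <- df %>%
  group_by(ID) %>%
  fill(Salary, Bonus, .direction = "downup") %>%
  ungroup()

nrow(filled)   # 4: still two rows per ID, but now they are identical copies

# Step 2: drop the duplicated rows
merged <- distinct(filled)

nrow(merged)   # 2: one row per ID
```

After the fill step, every group holds two identical rows, so distinct() is what actually reduces each group to a single record.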
The merged output matches the result we expected. However, to use this method, you must remember to install and load two packages, dplyr and tidyr. Please read the next solution if you don’t want to use packages.
There is another way to merge rows in R. With the cbind() function, you don’t have to install any packages. Here is its syntax:
cbind(df1, df2)
- df1: The first table or data frame
- df2: The second table or data frame
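As a quick illustration of this syntax, the sketch below binds two small hypothetical data frames (left and right are made-up names, not part of the article's data). Both need the same number of rows.

```r
# Two hypothetical data frames with the same number of rows
left <- data.frame(ID = c(1, 2), Name = c("A", "B"))
right <- data.frame(Score = c(90, 85))

# Place the columns of `right` next to the columns of `left`
combined <- cbind(left, right)

names(combined)  # "ID" "Name" "Score"
nrow(combined)   # 2
```

Note that cbind() matches rows purely by position, not by any key column.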
This function combines vectors, matrices, or data frames by columns. If your table has exactly the same structure as the table in the previous example and you only need a quick, one-off solution, you can follow this example:
# Take ID and Salary from the even rows, Name and Bonus from the odd rows
cbind(df[c(FALSE, TRUE), 1:2], df[c(TRUE, FALSE), 3:4])
  ID Salary Name Bonus
2 11   1000 Jack   100
4 22   2000 Jane   200
6 33   3000 John   300
8 44   4000  Jay   400
The logic behind this method is to take the even rows (2, 4, 6, 8) for columns 1 and 2 (ID and Salary) and bind them to the odd rows (1, 3, 5, 7) for columns 3 and 4 (Name and Bonus). However, this solution won’t work if an NA value in Salary falls outside the odd rows or an NA value in Bonus falls outside the even rows. Consider this approach only when you are in a hurry, know the structure of your data well, and do not want to use external packages.
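Because the cbind() approach depends entirely on this alternating pattern, it may be worth checking the pattern before relying on it. The following base R sketch is one possible check, not a required step. The names odd, even and pattern_ok are illustrative, and the table is a shortened copy of the example data.

```r
# Shortened copy of the example table (assumption: same layout as df above)
df <- data.frame(
  ID = c(11, 11, 22, 22),
  Salary = c(NA, 1000, NA, 2000),
  Name = c("Jack", "Jack", "Jane", "Jane"),
  Bonus = c(100, NA, 200, NA)
)

odd <- seq(1, nrow(df), by = 2)
even <- seq(2, nrow(df), by = 2)

# The pattern holds when Salary is missing only on odd rows, Bonus is
# missing only on even rows, and each odd/even pair shares the same ID
pattern_ok <- nrow(df) %% 2 == 0 &&
  all(is.na(df$Salary[odd])) && !anyNA(df$Salary[even]) &&
  all(is.na(df$Bonus[even])) && !anyNA(df$Bonus[odd]) &&
  all(df$ID[odd] == df$ID[even])

if (pattern_ok) {
  merged <- cbind(df[even, 1:2], df[odd, 3:4])
} else {
  stop("Rows do not follow the alternating NA pattern; cbind() would mix records.")
}

merged
```

If the check fails, the grouped fill() approach is the safer choice, since it does not depend on row positions.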
We have learned how to merge rows in R through two different approaches. We recommend the first solution because it does not depend on the order of the rows. More tutorials on R are available if you want to keep learning. Good luck!