How To Use The melt() Function In R

Melting and its counterpart, casting, let you reshape data between wide and long layouts in R. This guide shows how the melt() function works.

melt() Function In R

As data engineers and scientists, you are likely to be familiar with the wide format of data structures. In this format, data is represented across many columns, with each of them corresponding to a specific variable.

Many R functions expect the data to be stretched out instead, so that each subject (a participant, a car model, and so on) occupies several rows rather than one. This is called the long format, and melt() can help you transform your wide data into it.
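To make the difference concrete, here is a small sketch that uses only base R. The participants and measurements are made up for illustration, and the long version is written out by hand rather than produced by melt():

```r
# Hypothetical wide data: one row per participant, one column per measurement
wide <- data.frame(
  participant = c("A", "B"),
  height = c(170, 182),
  weight = c(65, 80)
)

# The same values in long format, typed out by hand:
# one row per participant-measurement pair
long <- data.frame(
  participant = c("A", "B", "A", "B"),
  variable = c("height", "height", "weight", "weight"),
  value = c(170, 182, 65, 80)
)

dim(wide)  # 2 rows, 3 columns
dim(long)  # 4 rows, 3 columns
```

Both data frames hold the same four measurements. Only the layout differs.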

melt() is provided by the reshape package and its successor, reshape2. Both were written by Hadley Wickham, who is also the author of the ggplot2 package.

You will need to install and load reshape2 before using melt():

install.packages("reshape2")

library(reshape2)

melt() is a generic function with melt.data.frame(), melt.array(), and melt.list() as its extended methods.
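As a rough illustration of the non-data-frame methods, the sketch below melts a small matrix and a small list. It assumes reshape2 is installed. The objects m and l are made-up examples, and the column names mentioned in the comments are the reshape2 defaults:

```r
library(reshape2)

# List the melt() methods provided by the loaded package
methods("melt")

# A matrix goes through the array method: one row per cell,
# indexed by its row and column labels (Var1 and Var2 by default)
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
melted_matrix <- melt(m)
melted_matrix

# A list goes through melt.list(): one row per element,
# with the name of the list component stored in column L1
l <- list(a = 1:2, b = 3)
melted_list <- melt(l)
melted_list
```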

Let’s get started by creating a sample data frame from the built-in mtcars dataset. In this example, we take the first six rows and keep only the first three variables:

df <- subset(head(mtcars), select = c(mpg, cyl, disp))
df

Output

                   mpg cyl disp
Mazda RX4         21.0   6  160
Mazda RX4 Wag     21.0   6  160
Datsun 710        22.8   4  108
Hornet 4 Drive    21.4   6  258
Hornet Sportabout 18.7   8  360
Valiant           18.1   6  225

In this data frame, each car model is represented by one row with three columns, which contain the miles per gallon, the number of cylinders, and the displacement.

You can “melt” this data frame into a longer, narrower one with the melt() function. The only argument it requires is your data:

melt(df)

Output

No id variables; using all as measure variables
   variable value
1       mpg  21.0
2       mpg  21.0
3       mpg  22.8
4       mpg  21.4
5       mpg  18.7
6       mpg  18.1
7       cyl   6.0
8       cyl   6.0
9       cyl   4.0
10      cyl   6.0
11      cyl   8.0
12      cyl   6.0
13     disp 160.0
14     disp 160.0
15     disp 108.0
16     disp 258.0
17     disp 360.0
18     disp 225.0

As you can see, the output is a data frame with only two columns, variable and value, in which the specifications of the car models are stacked on top of each other. Because no id variables were specified, melt() treats every column as a measure variable. The row labels (the car model names) are not carried over.

Each row now holds a single value. Your data has become longer, and a car model is no longer represented by a single row.
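Since the row labels are dropped, one way to keep track of which car each value belongs to is to copy the labels into an ordinary column before melting. The following is a sketch that assumes reshape2 is installed. The column name model is an arbitrary choice made for this example:

```r
library(reshape2)

df <- head(mtcars)[, c("mpg", "cyl", "disp")]

# melt() discards row names, so copy them into a regular column first
df$model <- rownames(df)

# Use the new column as the identifier; mpg, cyl and disp are melted
long <- melt(df, id.vars = "model")
head(long, 3)
```

Every row of the melted data frame now names the car it describes.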

If you want to keep certain columns as identifiers, you can name them in the id.vars argument (id also works, because R matches it as an abbreviation of id.vars). For instance, this command retains the mpg column while reshaping the other two:

melt(df, id.vars = "mpg")

Output

    mpg variable value
1  21.0      cyl     6
2  21.0      cyl     6
3  22.8      cyl     4
4  21.4      cyl     6
5  18.7      cyl     8
6  18.1      cyl     6
7  21.0     disp   160
8  21.0     disp   160
9  22.8     disp   108
10 21.4     disp   258
11 18.7     disp   360
12 18.1     disp   225
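The data frame method of melt() in reshape2 also accepts measure.vars, variable.name, and value.name, which control which columns are melted and what the output columns are called. Here is a short sketch, assuming reshape2 is installed. The names spec and reading are arbitrary labels picked for this example:

```r
library(reshape2)

df <- head(mtcars)[, c("mpg", "cyl", "disp")]

# Keep mpg as the identifier, melt only disp, and rename the output columns.
# Columns that are neither id nor measure variables (here cyl) are dropped.
long <- melt(
  df,
  id.vars = "mpg",
  measure.vars = "disp",
  variable.name = "spec",
  value.name = "reading"
)
long
```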

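After melting your data, you can reshape it back into wide format. In reshape2 this is done with the dcast() function (the older reshape package calls it cast()). It takes the long data frame created by melt(), a formula describing the layout, and an aggregation function. Because mpg is the only id variable here, rows that share an mpg value are combined with mean, so the two cars with 21.0 mpg collapse into a single row:

df2 <- melt(df, id.vars = "mpg")
dcast(df2, mpg ~ variable, mean)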

Output

   mpg cyl disp
1 18.1   6  225
2 18.7   8  360
3 21.0   6  160
4 21.4   6  258
5 22.8   4  108
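With mpg as the only identifier, cars that share an mpg value cannot be told apart when the data is cast back. If you need a lossless round trip, one option is to melt with an identifier that is unique per row. This is a sketch that assumes reshape2 is installed. The model column is a helper added for this example:

```r
library(reshape2)

df <- head(mtcars)[, c("mpg", "cyl", "disp")]
df$model <- rownames(df)

long <- melt(df, id.vars = "model")

# Each model-variable pair occurs exactly once,
# so no aggregation function is needed
wide <- dcast(long, model ~ variable)
wide
```

All six cars come back with their original values, although dcast() orders the rows by the identifier rather than preserving the original order.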

Summary

The melt() function reshapes wide data into a long format with fewer columns, which many R functions expect as input.
