How To Use The count() Function In R

You can use the count() function in R to get a quick overview of how values are distributed in a data frame. Read on to learn more about it.

count() Function In R

What Does The count() Function Do?

count() is a function from the dplyr package that counts the observations (rows) in a data frame for each unique value, or each unique combination of values, of the variables you choose.

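Install dplyr

count() is not a separate package: it ships with dplyr, which in turn belongs to the tidyverse package collection. You need to install dplyr, either on its own or together with the rest of the tidyverse, before you can use this function.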

From your R console:

install.packages("tidyverse")

Wait for the download and installation processes to complete, then load the dplyr package into your R environment:

library(dplyr)
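
If you only need count(), installing dplyr by itself is enough. The lines below are a minimal sketch, assuming a CRAN mirror is reachable from your machine; they install dplyr only when it is not already present and then load it.

```r
# Install dplyr only if it is missing, then attach it.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
```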

Syntax

The count() function has a fairly simple syntax:

count(x, ..., wt = NULL, sort = FALSE, name = NULL)

In this:

  • x is the data frame you want to process. It can be a regular R data frame, a tibble (a data frame extension), or a dtplyr/dbplyr lazy data frame.
  • ... stands for the variables you want to group by. List one or more column names, separated by commas.
  • wt is a data-masking argument that sets the frequency weights. By default, it is NULL, meaning count() returns the number of rows in each group. When you set it to a variable, count() computes sum(wt) for each group instead.
  • sort is FALSE by default. When it is TRUE, count() sorts the results by size with the biggest groups on top.
  • name is the name of the count column in the output. If you omit it, the column is called n. If your data frame already has a column called n, count() picks a different name (nn, then nnn, and so on) and prints a message about it. Set name yourself when you want a predictable column name.
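
To make the wt and name arguments more concrete, here is a small sketch. It uses the first 15 rows of the built-in mtcars data set, the same sample the examples in this article work with. The column name total_hp is our own choice for illustration, not something count() requires.

```r
library(dplyr)

# Sample data: the first 15 rows of the built-in mtcars data set.
df <- head(mtcars, 15)

# Default behaviour: number of rows per value of cyl, in a column called n.
count(df, cyl)

# wt = hp: sum the hp column within each cyl group instead of counting rows.
# name = "total_hp" gives the result column a descriptive name.
hp_by_cyl <- count(df, cyl, wt = hp, name = "total_hp")
hp_by_cyl
```

With wt set, the result column holds a sum rather than a row count, which is why renaming it through name usually makes the output easier to read.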

Examples

We will illustrate the capabilities of the count() function with a built-in data set, mtcars. It contains fuel consumption and design specifications of 32 car models (1973-74 models), extracted from the 1974 Motor Trend US magazine.

First, we use the head() function to get the first 15 rows. head() already returns a data frame when you give it one, so wrapping the result in data.frame() is optional here; we keep it to make the type explicit.

df <- data.frame(head(mtcars, 15))
df

This data frame consists of 15 observations of 11 variables, such as hp (gross horsepower) or gear (the number of forward gears).
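
If you want to confirm the shape of the sample yourself, a quick check with base R functions could look like this (nothing beyond base R is assumed):

```r
# Rebuild the sample so this snippet runs on its own.
df <- data.frame(head(mtcars, 15))

dim(df)    # number of rows, then number of columns
names(df)  # the variable names, including hp, gear, cyl and vs
```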

Let’s say we are interested in the number of cylinders, which is represented by the cyl variable. This is how you can use count() to find out how many cars share each cylinder count.

count(df, cyl)
  cyl n
1   4 3
2   6 6
3   8 6

As the output shows, there are 3 cars with 4 cylinders, 6 cars with 6 cylinders, and 6 cars with 8 cylinders in our data frame.
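
It can help to see what count() saves you from typing. The sketch below compares it with the longer group_by() plus summarise() spelling from dplyr, which gives the same counts. The object names by_count and by_summarise are ours, chosen for illustration.

```r
library(dplyr)

df <- head(mtcars, 15)

# The short form used in this article.
by_count <- count(df, cyl)

# The longer spelling: group the rows by cyl, then count rows per group with n().
by_summarise <- df %>%
  group_by(cyl) %>%
  summarise(n = n())

by_count
by_summarise
```

The second result is printed as a tibble rather than a plain data frame, but the cyl and n columns hold the same values.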

You can also use the forward pipe operator (%>%) to pass the data frame to count(). The operator comes from the magrittr package, which is also part of the tidyverse collection, and dplyr makes it available as soon as you load it.

Tidyverse code often uses this operator because it lets you read a chain of operations from left to right. This command yields the same result as the one above:

df %>% count(cyl)
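
If you are on R 4.1.0 or newer, the native pipe (|>) built into the language should work here as well. A short sketch that runs both forms and checks that they agree:

```r
library(dplyr)

df <- head(mtcars, 15)

with_magrittr <- df %>% count(cyl)  # pipe from magrittr, made available by dplyr
with_native   <- df |> count(cyl)   # native pipe, requires R >= 4.1.0

# Compare the two results.
identical(with_magrittr, with_native)
```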

Set the sort argument to TRUE when you need to sort your output:

df %>% count(cyl, sort = TRUE)
  cyl n
1   6 6
2   8 6
3   4 3
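
For comparison, you can get the same ordering by counting first and then sorting the n column yourself with dplyr's arrange() and desc(). This is a sketch of the equivalent two-step form, not a claim about how count() is implemented internally.

```r
library(dplyr)

df <- head(mtcars, 15)

# One step: let count() sort the groups by size.
sorted_arg <- df %>% count(cyl, sort = TRUE)

# Two steps: count, then sort by n in descending order.
sorted_manual <- df %>%
  count(cyl) %>%
  arrange(desc(n))

sorted_arg
sorted_manual
```

The two-step form is handy when you want a different ordering, for example ascending counts with arrange(n).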

You can also count observations by several variables at once. In mtcars, the vs variable describes the engine shape: 0 means V-shaped and 1 means straight. This is how you can count the cars for each combination of cylinder count and engine shape:

df %>% count(cyl, vs)
  cyl vs n
1   4  1 3
2   6  0 2
3   6  1 4
4   8  0 6

When you pass several variables, count() creates one row for every combination of values that actually occurs in the data. In the example above, we see that all three 4-cylinder cars have straight engines, while all six 8-cylinder cars have V-shaped engines. Meanwhile, the 6-cylinder cars are split between V-shaped and straight engines (2 and 4, respectively).
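
Reading 0 and 1 in the vs column takes some getting used to. Because the grouping variables of count() are data-masking, you can also pass a named expression and count by its result. The sketch below builds a readable label on the fly; the column name engine and the two label strings are our own choices for illustration.

```r
library(dplyr)

df <- head(mtcars, 15)

# Count by cyl and by a label derived from vs (0 = V-shaped, 1 = straight).
engine_counts <- df %>%
  count(cyl, engine = ifelse(vs == 0, "V-shaped", "straight"))

engine_counts
```

The counts are the same as in the cyl and vs table above; only the second column is easier to read.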

Summary

The count() function in R counts the observations in a data frame for each group defined by the values of one or more variables. Remember to install and load the dplyr package, which is part of the tidyverse collection, before you use this function.
