Pandas Groupby Two Columns With Examples

Pandas Groupby Two Columns

There are many ways to select data from a DataFrame and apply a function to it. For instance, you can split the dataset into groups based on certain criteria and then apply a function to each group.

Read on to learn how to group a Pandas DataFrame by two columns with the DataFrame.groupby() method.

You can use the DataFrame.groupby() method to split the rows of a DataFrame into groups based on the values of one or more columns (or other criteria). This groupby operation is extremely useful in data analysis because it lets you apply a function to each subset of data separately.

This is the full syntax of DataFrame.groupby() (note that the squeeze parameter was deprecated in pandas 1.1 and removed in pandas 2.0):

DataFrame.groupby(by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)

The most important parameter is by, which determines how the groups are formed. It can be a column label, a list of labels (which is how you group by two or more columns), a function, or a mapping object such as a dict or Series.
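Since by accepts a list of labels, grouping by two columns means passing both column names in a list. The sketch below assumes a flat DataFrame with Site and Month columns holding the same visitor counts used later in this article. Each (Site, Month) pair has only one row here, so the sums simply reproduce the values, but the result shows the two-level structure a two-column groupby produces. The names by_site_month and flat are our own.

```python
import pandas as pd

# Flat DataFrame with two key columns (Site, Month) and one value column.
df = pd.DataFrame({
    'Site': ['LearnShareIT', 'LearnShareIT', 'Quora', 'Quora'],
    'Month': ['August', 'September', 'August', 'September'],
    'Visitors': [135234, 356345, 22303, 37475]})

# Pass a list of two column labels to group by both columns at once.
# The result is a Series indexed by a (Site, Month) MultiIndex.
by_site_month = df.groupby(['Site', 'Month'])['Visitors'].sum()
print(by_site_month)

# as_index=False keeps Site and Month as regular columns instead.
flat = df.groupby(['Site', 'Month'], as_index=False)['Visitors'].sum()
print(flat)
```

With the default as_index=True, you can look up a single group with a tuple key, for example by_site_month.loc[('Quora', 'August')].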

If you provide a function, groupby() calls it on each label of the index, and rows whose labels produce the same return value end up in the same group. If you pass a Pandas Series or a Python dict, each index label is mapped through it, and the mapped values determine the groups.
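A minimal sketch of both options, using a hypothetical DataFrame of visitor counts indexed by page names (the page names, numbers, and the section mapping are made up for illustration):

```python
import pandas as pd

# Hypothetical visitor counts, indexed by page name.
df = pd.DataFrame(
    {'Visitors': [120, 80, 300, 50]},
    index=['home', 'about', 'blog-intro', 'blog-tips'])

# A function is called on every index label; labels that return the
# same value ('blog' for both blog pages) form one group.
by_prefix = df.groupby(lambda label: label.split('-')[0])['Visitors'].sum()
print(by_prefix)

# A dict maps every index label to a group key.
section = {'home': 'main', 'about': 'main',
           'blog-intro': 'blog', 'blog-tips': 'blog'}
by_section = df.groupby(section)['Visitors'].sum()
print(by_section)
```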

There are also other parameters you should keep in mind when you use the DataFrame.groupby() method:

  • axis: this controls the axis along which groupby() splits the DataFrame. The default value is 0 (split the rows), but you can split along the columns by setting it to 1. This parameter is deprecated since pandas 2.1; transpose the DataFrame with df.T instead.
  • level: if your DataFrame has a MultiIndex, use this to specify the level(s) by which you want to group data.
  • dropna: this boolean parameter defaults to True, meaning rows (or columns) whose group key is NA are dropped from the result. Set it to False if you want NA values to be treated as a group key of their own.
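The effect of dropna is easiest to see with a missing key. Here is a short sketch with hypothetical numbers, where one row has no Site value (default and with_na are our own names):

```python
import pandas as pd

# Hypothetical data; the second row has a missing (NA) Site key.
df = pd.DataFrame({
    'Site': ['LearnShareIT', None, 'Quora', 'LearnShareIT'],
    'Visitors': [100, 40, 60, 200]})

# Default dropna=True: the row with the NA key is left out entirely.
default = df.groupby('Site')['Visitors'].sum()
print(default)

# dropna=False: NA becomes a group key of its own.
with_na = df.groupby('Site', dropna=False)['Visitors'].sum()
print(with_na)
```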

Remember that DataFrame.groupby() returns a groupby object (a DataFrameGroupBy) that holds information about your data groups; no results are calculated until you apply a function to it.
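To see what that object holds, here is a short sketch using the same site data as the example below (grouped is our own variable name): ngroups, groups, and get_group() expose the group information, and iterating over the object yields (key, sub-DataFrame) pairs.

```python
import pandas as pd

df = pd.DataFrame({
    'Site': ['LearnShareIT', 'Quora', 'LearnShareIT', 'Quora'],
    'Visitors': [135234, 22303, 356345, 37475]})

grouped = df.groupby('Site')

# Number of groups, and the row labels that belong to each group.
print(grouped.ngroups)  # 2
print(grouped.groups)

# All rows of a single group, returned as a DataFrame.
print(grouped.get_group('Quora'))

# Iterating yields (group key, sub-DataFrame) pairs.
for site, rows in grouped:
    print(site, rows['Visitors'].tolist())
```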


This is a basic example of how to use the groupby() method to apply a function to data groups of a DataFrame.

First, we create a DataFrame that contains the number of visitors to two websites over two months:

import pandas as pd
df = pd.DataFrame({
    'Site': [
        'LearnShareIT', 'Quora',
        'LearnShareIT', 'Quora'],
    'Visitors': [
        135234, 22303,
        356345, 37475]})
print(df)


           Site  Visitors
0  LearnShareIT    135234
1         Quora     22303
2  LearnShareIT    356345
3         Quora     37475

In this constructor, we explicitly specified the labels of the two columns by using them as the keys of a dict. Next, we create a groupby object by grouping the rows on the values of the Site column.
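A minimal sketch of this grouping step, assuming the same data as above (the variable names grouped and avg are our own): it groups the rows by Site, prints the groupby object itself, and then averages the Visitors column within each group. The printed averages should match the values listed below.

```python
import pandas as pd

df = pd.DataFrame({
    'Site': ['LearnShareIT', 'Quora', 'LearnShareIT', 'Quora'],
    'Visitors': [135234, 22303, 356345, 37475]})

# Group rows by the value in the Site column.
grouped = df.groupby('Site')
print(grouped)  # shows only the object's type and memory address

# Average number of visitors per site.
avg = grouped['Visitors'].mean()
print(avg)
```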



<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1d0fe56e00>
LearnShareIT  245789.5
Quora          29889.0

As you can see, printing a groupby object only shows its type and memory address, because groupby objects are not meant for displaying data. Their most common use is applying functions, such as aggregations, to each group.

In the example above, we used the mean() method to calculate the average number of visitors to each website. It relies on the group information stored in the groupby object to compute one result per group.
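mean() is only one option. As a hedged sketch using the same two-site data, agg() accepts a list of function names and applies each of them to every group, returning one column per function (stats is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    'Site': ['LearnShareIT', 'Quora', 'LearnShareIT', 'Quora'],
    'Visitors': [135234, 22303, 356345, 37475]})

# Apply several aggregation functions to each group in one call.
stats = df.groupby('Site')['Visitors'].agg(['mean', 'sum', 'max'])
print(stats)
```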

Some DataFrames are more complex and use hierarchical indexes (a MultiIndex). With the level parameter, you can also group data by one or more of these index levels with the groupby() method.

For instance, this is a MultiIndex DataFrame in Pandas:

arr = [
    ['LearnShareIT', 'LearnShareIT', 'Quora', 'Quora'],
    ['August', 'September', 'August', 'September']]

index = pd.MultiIndex.from_arrays(arr, names=('Site', 'Month'))

df = pd.DataFrame({
    'Visitors': [135234, 356345, 22303, 37475]},
    index=index)
print(df)



                        Visitors
Site         Month              
LearnShareIT August       135234
             September    356345
Quora        August        22303
             September     37475

Compared to the previous DataFrame, we have added the names of the months as a second index level, adding another dimension to our data. To find each site's average number of visitors over the two months, set the level parameter to 0 (the outer Site level):
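A sketch of that call, assuming the MultiIndex df built above (rebuilt here so the snippet runs on its own; site_avg is our own name). Selecting the Visitors column before mean() returns a Series; calling mean() on the groupby object directly would return a one-column DataFrame with the same numbers.

```python
import pandas as pd

arr = [
    ['LearnShareIT', 'LearnShareIT', 'Quora', 'Quora'],
    ['August', 'September', 'August', 'September']]
index = pd.MultiIndex.from_arrays(arr, names=('Site', 'Month'))
df = pd.DataFrame({'Visitors': [135234, 356345, 22303, 37475]}, index=index)

# level=0 groups by the outer index level (Site); level='Site' is equivalent.
site_avg = df.groupby(level=0)['Visitors'].mean()
print(site_avg)
```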



LearnShareIT  245789.5
Quora          29889.0

You can also dive deeper into the data and apply functions to the inner index level. This is how you can find the combined number of visitors for each month:
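A similar sketch for the per-month total, again rebuilding the same MultiIndex df; level='Month' refers to the inner level by name, which is equivalent to level=1 here (month_total is our own name).

```python
import pandas as pd

arr = [
    ['LearnShareIT', 'LearnShareIT', 'Quora', 'Quora'],
    ['August', 'September', 'August', 'September']]
index = pd.MultiIndex.from_arrays(arr, names=('Site', 'Month'))
df = pd.DataFrame({'Visitors': [135234, 356345, 22303, 37475]}, index=index)

# Group by the inner index level and add up visitors across both sites.
month_total = df.groupby(level='Month')['Visitors'].sum()
print(month_total)
```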



August       157537
September    393820


You can group a Pandas DataFrame by two or more columns, or by several index levels, with DataFrame.groupby(). It splits the data into groups based on the rules you provide, allowing you to carry out complex data analysis with DataFrames.
