There are many ways to select data from a DataFrame and apply a function to it. For instance, you can split the dataset into groups using certain criteria.
Read on to learn how to group a pandas DataFrame by two columns with the DataFrame.groupby() method.
Pandas Groupby Two Columns
DataFrame.groupby()
You can use the DataFrame.groupby() method to split the rows of a DataFrame into groups based on the values of one or more keys, such as the categories in a column. This operation is extremely useful in data analysis because it lets you apply a function to each group separately.
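This is the full syntax of DataFrame.groupby() as documented for older pandas releases:
DataFrame.groupby(by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
Note that the squeeze parameter was removed in pandas 2.0, and axis has been deprecated in later releases, so check the signature of the version you have installed.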
The most important parameter is by, which determines how rows are assigned to groups. It can be a label, a list of labels, a function, or a mapping object. Passing a list of two column labels groups the data by two columns.
If you provide a function, the groupby() method invokes it on each value of the index and uses the return value as the group key. If a pandas Series or a Python dict is passed, the method uses its values to determine the groups.
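The function and mapping forms of by can be sketched as follows. This is a minimal illustration, assuming a small made-up DataFrame whose index holds the site names; the names visits, kind, by_length and by_kind are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical data: visitor counts indexed by site name.
visits = pd.DataFrame(
    {"Visitors": [135234, 22303, 356345, 37475]},
    index=["LearnShareIT", "Quora", "LearnShareIT", "Quora"],
)

# A function passed as `by` is called on each index value.
# Here the built-in len groups rows by the length of the site name.
by_length = visits.groupby(len).sum()

# A dict maps each index value to the name of its group.
kind = {"LearnShareIT": "tutorials", "Quora": "q_and_a"}
by_kind = visits.groupby(kind).sum()

print(by_length)
print(by_kind)
```

Both calls end up with the same two groups, because each site name has a distinct length and a distinct entry in the dict.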
There are also other parameters you should keep in mind when you use the DataFrame.groupby() method:
- axis: this controls the axis along which groupby() should split the DataFrame. The default value is 0 (along rows), but you can also split along columns when 1 is set.
- level: if you have a MultiIndex DataFrame, use this to specify the level(s) you want to group data by.
- dropna: this boolean parameter defaults to True, meaning that rows whose group key is NA are left out of the result. Set it to False if you want NA to be treated as a group key of its own.
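The effect of dropna is easiest to see on data with a missing key. The sketch below assumes a made-up DataFrame (df_na) in which one Site value is missing; the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing group key.
df_na = pd.DataFrame({
    "Site": ["LearnShareIT", np.nan, "LearnShareIT", "Quora"],
    "Visitors": [100, 20, 300, 40],
})

# Default (dropna=True): the row with the NA key is excluded.
kept_default = df_na.groupby("Site").sum()

# dropna=False: NA becomes a group key of its own.
kept_na = df_na.groupby("Site", dropna=False).sum()

print(kept_default)
print(kept_na)
```

With the default setting the result has two groups; with dropna=False it has three, and the row with the missing key is no longer lost from the totals.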
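Remember that DataFrame.groupby() does not return a new DataFrame. It returns a groupby object containing information about your data groups, and the actual computation happens when you call a method on that object.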
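As a rough sketch of what that object holds, the snippet below inspects a groupby object built from a small DataFrame. The variable names grouped and quora are hypothetical; ngroups, groups and get_group() are regular members of pandas groupby objects.

```python
import pandas as pd

df = pd.DataFrame({
    "Site": ["LearnShareIT", "Quora", "LearnShareIT", "Quora"],
    "Visitors": [135234, 22303, 356345, 37475],
})

grouped = df.groupby("Site")

# Number of groups and the row labels that belong to each key.
print(grouped.ngroups)
print(grouped.groups)

# Iterating yields (key, sub-DataFrame) pairs.
for name, frame in grouped:
    print(name, len(frame))

# Pull out the rows of a single group.
quora = grouped.get_group("Quora")
print(quora)
```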
Examples
This is a basic example of how to use the groupby() method to apply a function to data groups of a DataFrame.
First, we create a DataFrame containing the number of visitors to two websites over two months:
import pandas as pd

df = pd.DataFrame({
    'Site': ['LearnShareIT', 'Quora', 'LearnShareIT', 'Quora'],
    'Visitors': [135234, 22303, 356345, 37475]})
print(df)
Output:
           Site  Visitors
0  LearnShareIT    135234
1         Quora     22303
2  LearnShareIT    356345
3         Quora     37475
In this constructor, we have explicitly specified the labels of the two columns by using them as the keys of a dict. Next, we create a groupby object by grouping the rows on the Site column, referring to it by its label:
print(df.groupby(['Site']))
print(df.groupby(['Site']).mean())
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1d0fe56e00>
              Visitors
Site
LearnShareIT  245789.5
Quora          29889.0
As you can see, groupby objects are not intended for printing: printing one shows only its type and memory address, not the data. The most common application is to apply specific functions to these groups.
In the example above, we have used the mean() method to calculate the average number of visitors of each website. It uses information about the groups in the groupby object to do the math.
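Grouping by two columns works the same way: pass a list with both labels as by. The sketch below assumes a made-up DataFrame (df2) with an extra Month column and invented visitor counts, so that some Site and Month pairs have more than one row.

```python
import pandas as pd

# Hypothetical data: several records per site and month.
df2 = pd.DataFrame({
    "Site": ["LearnShareIT", "LearnShareIT", "LearnShareIT",
             "Quora", "Quora", "Quora"],
    "Month": ["August", "August", "September",
              "August", "September", "September"],
    "Visitors": [100, 50, 300, 20, 30, 10],
})

# Group by two columns; the result is indexed by (Site, Month).
totals = df2.groupby(["Site", "Month"])["Visitors"].sum()
print(totals)

# as_index=False keeps the two keys as ordinary columns instead.
flat = df2.groupby(["Site", "Month"], as_index=False)["Visitors"].sum()
print(flat)
```

Each unique combination of Site and Month becomes one group. The first form returns a Series with a two-level index, while the second returns a flat DataFrame, which is often more convenient for further filtering or merging.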
Some DataFrames are more complex and involve hierarchical indexes (MultiIndex). With the level parameter, you can group such data by one or more index levels with the groupby() method too.
For instance, this is a MultiIndex DataFrame in Pandas:
arr = [
    ['LearnShareIT', 'LearnShareIT', 'Quora', 'Quora'],
    ['August', 'September', 'August', 'September']]
index = pd.MultiIndex.from_arrays(arr, names=('Site', 'Month'))
df = pd.DataFrame(
    {'Visitors': [135234, 356345, 22303, 37475]},
    index=index)
print(df)
Output:
                        Visitors
Site         Month
LearnShareIT August       135234
             September    356345
Quora        August        22303
             September     37475
Compared to the previous DataFrame, we have added the names of the months as a second index level, adding another dimension to our data. To find the average number of visitors per site across the two months, we set the level parameter to 0, which refers to the outer Site level:
print(df.groupby(level=0).mean())
Output:
              Visitors
Site
LearnShareIT  245789.5
Quora          29889.0
You can also dive deeper into the data and apply functions to the inner index levels, referring to a level by its name instead of its position. This is how you can find the combined number of visitors each month:
print(df.groupby(level='Month').sum())
Output:
           Visitors
Month
August       157537
September    393820
Summary
You can group a pandas DataFrame by two or more columns with the DataFrame.groupby() method. It splits the data into groups based on the keys you provide, allowing you to carry out complicated data analysis with DataFrames.
My name is Robert. I have a degree in information technology and two years of expertise in software development. I’ve come to offer my understanding on programming languages. I hope you find my articles interesting.