One of the most widely used tools for analyzing and processing data quickly is the Pandas library, an open-source Python library that is extremely popular with people who work with data. In this tutorial, we will cover the basics of Pandas and why it matters, so that you can start using it.
What is Pandas?
Pandas is an open-source Python library. With Pandas, users can manipulate and analyze tabular data and time series easily and quickly.
Using Pandas brings many advantages: processing, manipulating, and splitting data all become accessible and fast.
Pandas is built on top of NumPy, so operations on whole columns run in optimized array code rather than in Python loops.
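As a small illustration of the NumPy connection, the sketch below applies one arithmetic operation to a whole column and then exposes the result as a NumPy array. The variable names and values are made up for this example.

```python
import numpy as np
import pandas as pd

# Example values, invented for this sketch
prices = pd.Series([10.0, 20.0, 30.0])

# Arithmetic is applied to the whole column at once, with no explicit loop
discounted = prices * 0.5

# to_numpy() returns the values as a NumPy array
values = discounted.to_numpy()
print(type(values))
print(values)
```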
Install Pandas
Installing Pandas is very easy. You can choose either of the following ways:
- Using pip (from a terminal or the Windows command prompt): pip install pandas
- Using Anaconda: conda install -c anaconda pandas
To use Pandas in your code, import it in one of two ways:
import pandas
or
import pandas as pd
Data Structures in Pandas
The two main data structures in Pandas are Series and DataFrame. Older versions also offered a 3-dimensional Panel, which was deprecated and then removed in Pandas 0.25.
In Pandas, a higher-dimensional data structure is a container for the lower-dimensional one. For example, a DataFrame is a container of Series.
The values in Pandas data structures can be changed. Their size can be changed too, except for a Series, whose length is fixed. The DataFrame is the most commonly used structure because of its data analysis capabilities.
Data Structure | Dimensions | Description |
Series | 1-dimensional | Labeled 1D array of homogeneous data; size-immutable. |
DataFrame | 2-dimensional | Labeled 2D tabular structure; resizable, with columns that can hold different data types. |
Panel (removed in Pandas 0.25) | 3-dimensional | Labeled 3D array; resizable. |
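The container relationship can be checked directly. In this minimal sketch (the column names and values are invented for illustration), selecting one column of a DataFrame returns a Series, and adding a column changes the shape of the DataFrame.

```python
import pandas as pd

# Example data, invented for this sketch
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# Selecting one column returns a Series
ages = df["age"]
print(type(ages).__name__)

# Adding a column changes the shape from (2, 2) to (2, 3)
df["city"] = ["Oslo", "Lima"]
print(df.shape)
```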
Pandas Series
A Series is a one-dimensional labeled array. It can hold values of different data types, such as integers, floats, and strings. Computations on a Series are applied to all of its values at once and return a one-dimensional result. A Series holds a single column of data; data with multiple columns belongs in a DataFrame.
A Pandas Series is created with the following constructor:
pandas.Series(data, index, dtype, copy)
Parameters:
- data: can take many different forms, such as an ndarray, a list, or a constant.
- index: the labels for the values. They must be hashable and have the same length as the data; duplicate labels are allowed. When no index is passed, it defaults to the integers 0 to n-1, as in np.arange(n).
- dtype: the data type. When set to None, the data type is inferred from the data.
- copy: whether to copy the input data; the default is False.
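The parameters above can be tried out in a few lines. This is a minimal sketch with made-up values: it shows the default index, an explicit index combined with a dtype, and a constant passed as data.

```python
import pandas as pd

# No index passed: labels default to 0, 1, 2
scores = pd.Series([90, 85, 70])
print(list(scores.index))

# Explicit index and dtype: the values are stored as floats
labeled = pd.Series([90, 85, 70], index=["a", "b", "c"], dtype="float64")
print(labeled["b"])
print(labeled.dtype)

# A constant is repeated for every index label
constant = pd.Series(5, index=["x", "y", "z"])
print(constant.tolist())
```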
For example:
# Create a Pandas Series with an explicit index
import pandas as pd
import numpy as np

student = np.array(['John', 'Jack', 'Rose', 'Jennie', 'Selena'])
mytable = pd.Series(student, index=[1, 2, 3, 4, 5])
print(mytable)
Output:
1 John
2 Jack
3 Rose
4 Jennie
5 Selena
dtype: object
Pandas DataFrame
A Pandas DataFrame is a data structure that contains:
- Data organized in 2 dimensions: rows and columns
- Labels that correspond to the rows and columns
The DataFrame allows very flexible access to rows (by index) and to columns (by column names).
You can start working with DataFrames by importing Pandas:
>>> import pandas as pd
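To make row and column access concrete, here is a minimal sketch. The row labels r1, r2, and r3 are invented for this example; loc selects by label and iloc selects by integer position.

```python
import pandas as pd

# Row labels r1..r3 are made up for this sketch
df = pd.DataFrame(
    {"myclass": ["A2", "A3", "A4"], "student": [50, 40, 45]},
    index=["r1", "r2", "r3"],
)

column = df["student"]          # column by name, returned as a Series
row_by_label = df.loc["r2"]     # row by index label, returned as a Series
row_by_position = df.iloc[0]    # row by integer position, returned as a Series
cell = df.loc["r3", "student"]  # single value by row label and column name

print(column.tolist())
print(row_by_label["myclass"])
print(cell)
```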
Ways to Create a DataFrame
There are many different ways to create a DataFrame. Here are some popular ones:
- Create a DataFrame from a Series
- Create a DataFrame from a list of dictionaries
- Create a DataFrame from a dictionary containing Series
- Create a DataFrame from a 2-dimensional NumPy array
For example:
import pandas as pd

mydata = {
    "myclass": ['A2', 'A3', 'A4'],
    "student": [50, 40, 45]
}

# Pass the data to DataFrame:
df = pd.DataFrame(mydata)
print(df)
Output:
myclass student
0 A2 50
1 A3 40
2 A4 45
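The example above builds a DataFrame from a dictionary of lists. The four creation routes named in the list earlier can be sketched in the same spirit. All values and labels here are invented, and to_frame() is only one possible way to go from a Series to a DataFrame.

```python
import numpy as np
import pandas as pd

# From a Series: to_frame() turns it into a one-column DataFrame
from_series = pd.Series([50, 40, 45], name="student").to_frame()

# From a list of dictionaries: each dictionary becomes one row
from_records = pd.DataFrame([
    {"myclass": "A2", "student": 50},
    {"myclass": "A3", "student": 40},
])

# From a dictionary of Series: each Series becomes one column,
# and rows are aligned on the Series index
from_series_dict = pd.DataFrame({
    "myclass": pd.Series(["A2", "A3"], index=["x", "y"]),
    "student": pd.Series([50, 40], index=["x", "y"]),
})

# From a 2-dimensional NumPy array: column names are passed separately
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

print(from_series.shape, from_records.shape)
print(from_series_dict)
print(from_array)
```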
Reading CSV files in Pandas
A CSV (comma-separated values) file is a plain-text file that stores tabular data and can hold large volumes of it. Pandas can read a CSV file into a DataFrame so that you can manipulate the data.
For example:
CSV file: mydata.csv
# Load a CSV file into a DataFrame
import pandas as pd

dt = pd.read_csv('mydata.csv')
print(dt.to_string())
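Output: the full contents of mydata.csv, printed as a table. Using to_string() prints every row instead of a shortened view.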
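If you do not have a CSV file at hand, the same call can be tried on in-memory text. In this sketch the CSV content is hypothetical and stands in for mydata.csv; read_csv accepts a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "myclass,student\nA2,50\nA3,40\nA4,45\n"

# read_csv accepts a file path or a file-like object
dt = pd.read_csv(io.StringIO(csv_text))
print(dt.to_string())

# Once loaded, the columns can be used like any other DataFrame columns
total = dt["student"].sum()
print(total)
```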
Pandas Reindexing
An Index object in Pandas is immutable: its labels cannot be modified in place. However, in some cases, you will need a different set of row or column labels. To get one, you can apply the reindexing operation.
Reindexing conforms the data to a new set of row labels or column labels.
Through reindexing, you can reorder the existing data so that it matches the new labels. Labels that have no data in the original are filled with missing values (NaN).
For example:
Create a DataFrame:
import pandas as pd

index = ['Peter', 'Tom', 'Jerry', 'Liza', 'Emi']
df = pd.DataFrame({'Year': [1986, 2006, 1987, 1982, 1981],
                   'Job': ['Programmer', 'Freelancer', 'Editor', 'Marketers', 'Seller']},
                  index=index)
print(df)

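Reindex the columns. Neither 'Income' nor 'Address' exists in df, so the result keeps the row labels and contains only NaN values:
print(df.reindex(columns=['Income', 'Address']))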
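Reindexing the rows works the same way. The sketch below uses a shortened version of the table, and the label 'Anna' is invented to show what happens to a label with no data. The fill_value argument replaces the default NaN.

```python
import pandas as pd

df = pd.DataFrame(
    {"Year": [1986, 2006, 1987]},
    index=["Peter", "Tom", "Jerry"],
)

# Reorder existing labels and request one that is not in the original index;
# "Anna" has no data, so her Year is NaN
reordered = df.reindex(["Jerry", "Peter", "Anna"])
print(reordered)

# fill_value replaces the default NaN for labels without data
filled = df.reindex(["Jerry", "Peter", "Anna"], fill_value=0)
print(filled)

# reindex returns a new object; the original DataFrame is unchanged
print(list(df.index))
```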

Handling missing data in Pandas
Real data sets often have entries with no value. These are called empty or null data and are represented as NaN in a DataFrame. There are several ways to handle them: they can be deleted, or filled in with other values, such as a constant or a similar value from the same data.
Pandas uses two null values that already exist in Python and NumPy, NaN and None, to represent missing data in the data sets it processes. Each choice has its own benefits and limitations.
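A short sketch of how the two null values behave, using made-up values: None in numeric data is converted to NaN, and isna() reports both as missing.

```python
import numpy as np
import pandas as pd

# None in numeric data is converted to NaN, so the Series is stored as floats
numbers = pd.Series([1, None, 3])
print(numbers.dtype)

# isna() flags both None and NaN as missing
mixed = pd.Series(["a", None, np.nan])
mask = mixed.isna()
print(mask.tolist())

# Counting the missing entries
missing_count = numbers.isna().sum()
print(missing_count)
```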
Several methods can be used to handle missing data:
- dropna(): deletes the rows or columns that contain NaN
- fillna(): fills in NaN entries with other values
For example:
import numpy as np
import pandas as pd

mynumber = pd.DataFrame([[2, np.nan, 5], [3, 7, 9], [4, 4, np.nan]])
print(mynumber)

# Can use dropna(axis=0) or dropna(axis='rows') instead
print("Delete rows containing NaN: \n", mynumber.dropna())

# Can use dropna(axis='columns') instead
print("\nDelete column containing NaN: \n", mynumber.dropna(axis=1))
Output:
#Delete row containing NaN
0 1 2
1 3 7.0 9.0
#Delete column containing NaN
0
0 2
1 3
2 4
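The example above covers dropna(). For fillna(), a minimal sketch on the same table could look like this. Filling with a constant or with the column mean are just two common choices, not the only ones.

```python
import numpy as np
import pandas as pd

mynumber = pd.DataFrame([[2, np.nan, 5], [3, 7, 9], [4, 4, np.nan]])

# Replace every NaN with a constant
zero_filled = mynumber.fillna(0)
print(zero_filled)

# Replace each NaN with the mean of its own column
mean_filled = mynumber.fillna(mynumber.mean())
print(mean_filled)
```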
Summary
In this tutorial, we have covered the basics of working with Pandas, a very efficient library for data processing. We hope you can quickly put this knowledge to use.
If you are interested in other programming languages, you can visit our LearnShareIT website. Thanks for reading!