Pandas Tutorials

Pandas is one of the most widely used tools for analyzing and processing data quickly. It is an open-source Python library that is extremely popular with people who work with data. In this tutorial, we introduce the basics of Pandas and explain why it matters, so that you can start using it.

What is Pandas?

Pandas is an open-source Python library. With Pandas, users can manipulate and analyze tabular data and time series easily and quickly.

Using Pandas to process data brings many advantages: processing, manipulating, and splitting data all become quick and accessible.

Pandas is built on top of NumPy, so many operations run on whole arrays at once rather than element by element, which keeps data analysis fast.
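As a small illustration of the NumPy foundation, the sketch below applies arithmetic to a whole Series at once and then extracts the underlying NumPy array. The variable names and prices are made up for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data: three prices
prices = pd.Series([10.0, 20.0, 30.0])

# Arithmetic is applied element-wise to the whole Series, without an explicit loop
discounted = prices * 0.9

# to_numpy() exposes the values as a NumPy array
values = discounted.to_numpy()
print(type(values).__name__)
print(values)
```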

Install Pandas

Installing Pandas is straightforward. Choose one of the following:

  • Using pip from a command prompt (for example, C:\Users\Your Name> on Windows): pip install pandas
  • Using Anaconda: conda install -c anaconda pandas

To use Pandas in a program, import it in one of two ways, either under its full name or under the conventional alias pd:

import pandas


import pandas as pd
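A quick way to confirm that the installation worked is to import the library and print its version. This is only a sanity check; the version string you see depends on your environment.

```python
import pandas as pd

# Print the installed version to confirm that the import works
print(pd.__version__)
```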

Data Structures in Pandas

The main data structures in Pandas are Series and DataFrame. Older versions also included a three-dimensional Panel, which was deprecated and has since been removed from the library.

In Pandas, a higher-dimensional data structure acts as a container for the lower-dimensional one. For example, a DataFrame is a container of Series.

The values in Pandas data structures can be changed. Their size can change too, except for Series, which Pandas treats as fixed in length. DataFrame is the most commonly used structure because of its data analysis capabilities.

Data Structure   Dimensions      Description
Series           1-dimensional   Labeled, homogeneous 1D array; not resizable.
DataFrame        2-dimensional   Labeled 2D tabular structure; resizable, with columns that may hold different types.
Panel            3-dimensional   Labeled 3D array; resizable (removed from current Pandas versions).
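To illustrate how a DataFrame can change, here is a minimal sketch. The data and the "teacher" column are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"myclass": ["A2", "A3"], "student": [50, 40]})

# Values can be changed in place
df.loc[0, "student"] = 55

# A DataFrame can also grow: assigning to a new name adds a column
df["teacher"] = ["Ann", "Bob"]

print(df.shape)
```

After these two steps the DataFrame still has two rows but now has three columns.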

Pandas Series

The Series data structure is one-dimensional. It accepts various kinds of values, such as integers, floats, and strings. Operations on a Series are applied to all of its elements at once and return a one-dimensional result. A Series holds a single column of values, so it cannot represent data with multiple columns.

The Pandas Series constructor takes the following parameters:

pandas.Series(data, index, dtype, copy)


  • data: can take many different forms, such as an ndarray, a list, or a constant.
  • index: the labels of the values. They must be hashable and the index must have the same length as the data; duplicate labels are allowed. When no index is passed, it defaults to np.arange(n).
  • dtype: the data type. When set to None, the type is inferred from the data.
  • copy: whether to copy the input data; the default is False.

For example:

# Create a Pandas Series with a custom index
import pandas as pd
import numpy as np

student = np.array(['John','Jack','Rose','Jennie','Selena'])
mytable = pd.Series(student, index=[1, 2, 3, 4, 5])
print(mytable)


1      John
2      Jack
3      Rose
4    Jennie
5    Selena
dtype: object
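The index and dtype parameters can be explored with a short sketch. The scores and names below are illustrative only.

```python
import pandas as pd

# No index passed: labels default to 0, 1, 2
scores = pd.Series([90, 85, 77])
print(list(scores.index))

# dtype forces a data type instead of letting Pandas infer it
scores_float = pd.Series([90, 85, 77], dtype="float64")
print(scores_float.dtype)

# With explicit labels, values are looked up by label
named = pd.Series([90, 85, 77], index=["John", "Jack", "Rose"])
print(named["Jack"])
```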

Pandas DataFrame

A Pandas DataFrame is a data structure that contains:

  • Data organized in two dimensions, rows and columns
  • Labels for the rows and for the columns

The DataFrame allows very flexible access to rows (by index) and to columns (by column names).

You can start working with DataFrames by importing Pandas:


>>> import pandas as pd
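Access to rows and columns can be sketched as follows. The row labels "x", "y", and "z" are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"myclass": ["A2", "A3", "A4"], "student": [50, 40, 45]},
    index=["x", "y", "z"],  # illustrative row labels
)

# Column access by name returns a Series
students = df["student"]

# Row access by label with .loc, or by position with .iloc
row_by_label = df.loc["y"]
row_by_position = df.iloc[0]

print(students.tolist())
print(row_by_label["myclass"])
print(row_by_position["student"])
```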

Ways to Create a DataFrame

There are many ways to create a DataFrame. Here are some popular ones:

  • Create a DataFrame from a Series
  • Create a DataFrame from a list of dictionaries
  • Create a DataFrame from a dictionary containing Series
  • Create a DataFrame from a 2-dimensional NumPy array

For example:

import pandas as pd
mydata = {
   "myclass": ['A2', 'A3', 'A4'],
   "student": [50, 40, 45]
}
# Pass the data to DataFrame:
df = pd.DataFrame(mydata)
print(df)


  myclass  student
0      A2       50
1      A3       40
2      A4       45
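The other approaches listed at the start of this section can be sketched in the same way. The variable names and values here are illustrative.

```python
import numpy as np
import pandas as pd

# From a Series: the Series becomes a single column
df_from_series = pd.Series([50, 40, 45], name="student").to_frame()

# From a list of dictionaries: each dictionary is one row
df_from_records = pd.DataFrame([
    {"myclass": "A2", "student": 50},
    {"myclass": "A3", "student": 40},
])

# From a dictionary of Series: each Series is one column
df_from_series_dict = pd.DataFrame({
    "myclass": pd.Series(["A2", "A3"]),
    "student": pd.Series([50, 40]),
})

# From a 2-dimensional NumPy array: column names are supplied separately
df_from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

for frame in (df_from_series, df_from_records, df_from_series_dict, df_from_array):
    print(frame.shape)
```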

Reading CSV files in Pandas

A CSV (comma-separated values) file is a plain-text file that stores tabular data, one row per line, and can hold large volumes of data. Pandas can read a CSV file into a DataFrame so that the data can be manipulated.

For example:

CSV file: mydata.csv

# Load a CSV file into a DataFrame
import pandas as pd
dt = pd.read_csv('mydata.csv')
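If you do not have a CSV file at hand, the following self-contained sketch reads CSV text from memory instead of from disk. The contents of csv_text are invented for the example and stand in for mydata.csv.

```python
import io
import pandas as pd

# Stand-in for the contents of mydata.csv (these rows are invented for the example)
csv_text = "myclass,student\nA2,50\nA3,40\nA4,45\n"

# read_csv accepts a file path or a file-like object; StringIO keeps the example self-contained
dt = pd.read_csv(io.StringIO(csv_text))

print(dt.head())   # first rows of the table
print(dt.shape)    # (number of rows, number of columns)
```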


Pandas Reindexing 

The index of a Pandas DataFrame is immutable, so its labels cannot be modified in place. However, in some cases you will need a different set or order of row or column labels. To achieve this, apply the reindexing operation.

Reindexing conforms the data to a new set of row labels or column labels and returns a new DataFrame.

Existing labels keep their data and are reordered to match the new labels. New labels that have no corresponding data are filled with missing values (NaN).

For example:

Create a DataFrame:

import pandas as pd
index = ['Peter', 'Tom', 'Jerry', 'Liza', 'Emi']
df = pd.DataFrame({'Year': [1986, 2006, 1987, 1982, 1981],
                 'Job': ['Programmer','Freelancer', 'Editor', 'Marketers', 'Seller']},
                   index = index)

Reindex the columns. Neither 'Income' nor 'Address' exists in df, so both new columns contain only NaN:

df.reindex(columns=['Income', 'Address'])
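Reindexing works on row labels as well. The sketch below rebuilds the same DataFrame, reorders its rows, and adds a label that does not exist. 'Anna' is a hypothetical label, and fill_value is used as an alternative to the default NaN.

```python
import pandas as pd

index = ['Peter', 'Tom', 'Jerry', 'Liza', 'Emi']
df = pd.DataFrame({'Year': [1986, 2006, 1987, 1982, 1981],
                   'Job': ['Programmer', 'Freelancer', 'Editor', 'Marketers', 'Seller']},
                  index=index)

# reindex returns a new DataFrame; df itself is unchanged
# 'Anna' is a hypothetical label that is not in df, so her row is filled with NaN
reordered = df.reindex(['Tom', 'Peter', 'Anna'])
print(reordered)

# Existing columns keep their data; the new 'Income' column is filled with 0
with_income = df.reindex(columns=['Year', 'Income'], fill_value=0)
print(with_income)
```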

Handling missing data in Pandas

Many data sets contain entries with no value, called empty or null data, which are represented as NaN in a DataFrame. There are several ways to handle them: the affected rows or columns can be deleted, or the gaps can be filled with a fixed value or with a value estimated from similar data.

Pandas uses two null values available in Python, NaN and None, to represent missing data in the data sets it processes. Each choice has its own benefits and limitations.
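The way Pandas treats the two null values can be seen in a short sketch. The Series contents are illustrative.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float64
numbers = pd.Series([1, None, 3])
print(numbers.dtype)

# isna() flags both kinds of missing value
mixed = pd.Series(["a", None, np.nan])
print(numbers.isna().tolist())
print(mixed.isna().tolist())
```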

Two methods are commonly used to handle missing data:

  • dropna() deletes rows or columns that contain NaN
  • fillna() fills NaN entries with a given value

For example:

import numpy as np
import pandas as pd

mynumber = pd.DataFrame([[2, np.nan, 5],
                  [3, 7, 9],
                  [4, 4, np.nan]])
print("Delete rows containing NaN: \n", mynumber.dropna()) # can use dropna(axis=0) or dropna(axis='rows') instead
print("\nDelete column containing NaN: \n", mynumber.dropna(axis=1)) # can use dropna(axis='columns') instead


#Delete rows containing NaN
   0    1    2
1  3  7.0  9.0
#Delete columns containing NaN
   0
0  2
1  3
2  4
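The fillna() method can be sketched on the same data. Filling with 0 and filling with the column mean are two common choices, shown here only as examples.

```python
import numpy as np
import pandas as pd

mynumber = pd.DataFrame([[2, np.nan, 5],
                         [3, 7, 9],
                         [4, 4, np.nan]])

# Replace every NaN with a fixed value
filled_zero = mynumber.fillna(0)

# Replace each NaN with the mean of its column
filled_mean = mynumber.fillna(mynumber.mean())

print(filled_zero)
print(filled_mean)
```

Both calls return a new DataFrame and leave mynumber unchanged.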


In this tutorial, we have covered the basics of working with Pandas: installing it, its data structures, reading CSV files, reindexing, and handling missing data. Pandas is a very efficient library for data processing, and we hope this overview helps you get started quickly.

If you are interested in other programming languages, you can visit our LearnShareIT website. Thanks for reading!
