Python – Introduction to the Pandas Library

Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It is one of the most widely used Python libraries for data analysis.

Key Features of Pandas:

Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
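A few of the features listed above can be sketched in a handful of lines. The table contents, column names and manager names below are made up for illustration only; they are assumptions, not data from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value (illustrative only).
sales = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south'],
    'units': [10, np.nan, 30, 40],
})

# Handling of missing data: replace NaN with 0.
sales['units'] = sales['units'].fillna(0)

# Group by a column and aggregate.
units_per_region = sales.groupby('region')['units'].sum()

# Merge (join) with a second, equally hypothetical table.
managers = pd.DataFrame({
    'region': ['north', 'south'],
    'manager': ['Asha', 'Ravi'],
})
merged = sales.merge(managers, on='region', how='left')

# Insert a column, then delete it again.
merged['double_units'] = merged['units'] * 2
merged = merged.drop(columns=['double_units'])

print(units_per_region)
print(merged)
```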

Install and import Pandas:

If you install the Anaconda Python distribution, Pandas is installed by default.

You can also install it from a terminal window with either of the following commands:

conda install pandas

Or via PyPI:

pip install pandas

Import the pandas library:

# Import the pandas library
import pandas as pd
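One quick way to confirm that the installation worked is to print the installed version. This is only a sanity check; the version you see depends on your environment.

```python
import pandas as pd

# Print the installed pandas version as a string.
print(pd.__version__)
```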

Data analysis with Pandas:

We can analyze data in pandas with the following two primary data structures:

Series
DataFrame

Series:

A Series is a one-dimensional (1-D) labeled array capable of holding any data type.
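As a small sketch of "labeled" and "any data type", a Series can be built from a dict (the keys become the labels) or from a list of mixed Python objects. The names and values here are illustrative assumptions.

```python
import pandas as pd

# A Series built from a dict: the keys become the index labels.
prices = pd.Series({'apple': 1.5, 'banana': 0.25, 'cherry': 3.0})

# A Series can also hold mixed Python objects.
mixed = pd.Series([1, 'two', 3.0])

print(prices)
print(mixed)
```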

Creating a Series from a basic data collection and an index:

# Numeric data collection
data = [1, 3, 4, 5, 6, 2, 9]

# Creating series with default index values
sr1 = pd.Series(data)

print("Series with default Index:")
print(sr1)

# Predefined index values
index = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Creating series with predefined index values
sr2 = pd.Series(data, index=index)

print("Series with predefined Index:")
print(sr2)


Creating a Series from a NumPy array:

# Import the NumPy library
import numpy as np

# NumPy array data collection
data = np.array(['abc', 'xyz', 'www', 'com'])

# Predefined index values
index = [1001, 1002, 1003, 1004]

# Creating series with NumPy array data and predefined index values.
sr3 = pd.Series(data, index=index)

print("Series with NumPy array data and predefined index:")
print(sr3)


Retrieving data from a Series object:

# Retrieve the first element.
print('First element: ', sr1[0])

# Retrieve the first three elements.
print('First three elements: ', sr1[:3])

# Retrieve a single element by its label.
print('Single element: ', sr2['a'])

# Retrieve multiple elements by their labels.
print('Multiple elements: ', sr2[['a','c','e']])

# Retrieve multiple elements.
print('Multiple elements: ', sr2[['a','c','e']])
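Plain square brackets can mean either a label or a position depending on the index, so it can help to be explicit. The sketch below rebuilds the same values and labels as sr2 above and uses `.iloc` for positions and `.loc` for labels; the variable names on the left are illustrative.

```python
import pandas as pd

# Same values and labels as sr2 in the article.
sr2 = pd.Series([1, 3, 4, 5, 6, 2, 9],
                index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

first_by_position = sr2.iloc[0]   # element at position 0
first_three = sr2.iloc[:3]        # positions 0, 1 and 2
by_label = sr2.loc['a']           # element with label 'a'
label_slice = sr2.loc['a':'c']    # label slices include the end label

print(first_by_position, by_label)
print(first_three)
print(label_slice)
```

Note that a label slice such as `'a':'c'` includes the end label, while a positional slice such as `:3` excludes the end position.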


DataFrame:

A DataFrame is a two-dimensional (2-D) data structure defined in pandas that consists of rows and columns.

In general, a pandas DataFrame consists of three main components: the data, the index, and the columns.
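The three components can be inspected directly on any DataFrame. A minimal sketch with a tiny made-up frame (the column and row names are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame(
    {'x': [1, 2], 'y': [3, 4]},   # the data, keyed by column name
    index=['row1', 'row2'],       # the index (row labels)
)

print(df.values)    # the data as a 2-D NumPy array
print(df.index)     # the row labels
print(df.columns)   # the column labels
```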

Creating a DataFrame from a simple data collection

Let’s create a data structure for employee records. The following table represents the employee data.

Creating a DataFrame based on the employee table above.

# Employee data collection.
data = {
    'EID': [1001, 1002, 1003, 1004, 1005], 
    'Name': ['Dhoni', 'Virat', 'Sachin', 'Laxman', 'Saurav'],
    'Salary': [45000, 55000, 60000, 35000, 40000]
}

# Create DataFrame based on the employee data collection with index. 
empDataFrame = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4', 'rank5'])

empDataFrame
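Once the DataFrame exists, columns, rows and filtered subsets can be pulled out of it. The sketch below rebuilds the same employee data so that it runs on its own; the names on the left-hand side (`salaries`, `rank3`, `high_paid`, `mean_salary`) and the 40000 threshold are illustrative choices.

```python
import pandas as pd

# Same employee data as in the article.
data = {
    'EID': [1001, 1002, 1003, 1004, 1005],
    'Name': ['Dhoni', 'Virat', 'Sachin', 'Laxman', 'Saurav'],
    'Salary': [45000, 55000, 60000, 35000, 40000],
}
empDataFrame = pd.DataFrame(
    data, index=['rank1', 'rank2', 'rank3', 'rank4', 'rank5'])

salaries = empDataFrame['Salary']      # one column, as a Series
rank3 = empDataFrame.loc['rank3']      # one row, selected by index label
# Rows where the salary is above the chosen threshold.
high_paid = empDataFrame[empDataFrame['Salary'] > 40000]
mean_salary = empDataFrame['Salary'].mean()

print(rank3)
print(high_paid)
print(mean_salary)
```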


Reading data from a CSV file

Creating a DataFrame from a CSV file. Let’s fetch the Nifty Index price sheet for June 2019 and draw a price chart.

Data Source: CSV file.

Reading the CSV file into a DataFrame.

import pandas as pd

# Input file name.
inputFilename = r'''F:\PURNA\AI_ML\Python\NumpyPandas\NiftyPriceSheetJune2019.csv'''

# Read the CSV file through read_csv() function
dframe = pd.read_csv(inputFilename) 

dframe
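If the original CSV file is not available, the same call can be tried on an in-memory sample. The sketch below assumes the file has `Date` and `Price` columns (the names used by the plotting code in this article); the rows themselves are made up and are not real Nifty prices. Passing `parse_dates` asks `read_csv` to convert the `Date` column to datetime values instead of leaving it as text.

```python
import io

import pandas as pd

# Made-up sample rows; NOT real Nifty prices.
csv_text = """Date,Price
2019-06-03,100.0
2019-06-04,101.5
2019-06-06,99.25
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(sample.dtypes)
print(sample.head())
```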


Creating a stock price line chart from the DataFrame above.

# Import Matplotlib Library 
from matplotlib import pyplot as plt

plt.style.use('ggplot')

# plot with matplotlib
plt.plot( 'Date', 'Price', data=dframe, marker='o', color='mediumvioletred')
plt.ylabel('Price')
plt.xlabel('Date')

plt.title('Nifty Index Price, June 2019', fontsize=18)
plt.legend(['Nifty, Jun 2019'], loc='upper right')

plt.show()


To be continued..


Purnananda Behera