Python Pandas Library for Data Science

Merging DataFrames

pandas provides various facilities for easily combining Series and DataFrame objects, with various kinds of set logic for the indexes and relational algebra functionality for join / merge-type operations. (Older releases also supported Panel objects, which have since been removed from pandas.)

 

         Cost  Item Purchased  Name
Store 1  22.5  Sponge          Chris
Store 1  2.5   Kitty Litter    Kevyn
Store 2  5     Spoon           Filip

         Cost  Item Purchased  Name   Date
Store 1  22.5  Sponge          Chris  1-Dec
Store 1  2.5   Kitty Litter    Kevyn  1-Jan
Store 2  5     Spoon           Filip  mid-May

         Cost  Item Purchased  Name   Date     Delivered
Store 1  22.5  Sponge          Chris  1-Dec    TRUE
Store 1  2.5   Kitty Litter    Kevyn  1-Jan    TRUE
Store 2  5     Spoon           Filip  mid-May  TRUE

         Cost  Item Purchased  Name   Date     Delivered  Feedback
Store 1  22.5  Sponge          Chris  1-Dec    TRUE       Positive
Store 1  2.5   Kitty Litter    Kevyn  1-Jan    TRUE       None
Store 2  5     Spoon           Filip  mid-May  TRUE       Negative

   index    Cost  Item Purchased  Name   Date     Delivered  Feedback
0  Store 1  22.5  Sponge          Chris  1-Dec    TRUE       Positive
1  Store 1  2.5   Kitty Litter    Kevyn  NaN      TRUE       None
2  Store 2  5     Spoon           Filip  mid-May  TRUE       Negative
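
The code that produced the tables above is not shown, so the following is only a sketch of one way to arrive at them. It assumes the columns were added one at a time (a list, a scalar, a list containing None) and that the last table comes from resetting the index and assigning a Series that has no entry for row 1. The variable names df and adf are assumptions.

```python
import pandas as pd

# Purchase records indexed by store, matching the first table above.
df = pd.DataFrame(
    {
        "Cost": [22.5, 2.5, 5.0],
        "Item Purchased": ["Sponge", "Kitty Litter", "Spoon"],
        "Name": ["Chris", "Kevyn", "Filip"],
    },
    index=["Store 1", "Store 1", "Store 2"],
)

# A list must supply one value per row, in row order.
df["Date"] = ["1-Dec", "1-Jan", "mid-May"]

# A scalar is broadcast to every row.
df["Delivered"] = True

# None marks a missing value.
df["Feedback"] = ["Positive", None, "Negative"]

# After reset_index() the rows are labelled 0, 1, 2, so a Series can be
# assigned by label; rows without a matching label (here, 1) become NaN.
adf = df.reset_index()
adf["Date"] = pd.Series({0: "1-Dec", 2: "mid-May"})
print(adf)
```

Assigning a Series aligns on index labels rather than on position, which is why the missing label produces NaN instead of an error.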

 

Name   Role
Kelly  Director of HR
Sally  Course liaison
James  Grader

Name   School
James  Business
Mike   Law
Sally  Engineering
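
The result tables that follow look like outer, inner, left, and right joins of these two frames on their Name index. A minimal sketch, assuming the frames are called staff_df and student_df (names not given in the text) and using a small hypothetical helper, join_on_index, to avoid repeating the arguments:

```python
import pandas as pd

staff_df = pd.DataFrame(
    {"Name": ["Kelly", "Sally", "James"],
     "Role": ["Director of HR", "Course liaison", "Grader"]}
).set_index("Name")

student_df = pd.DataFrame(
    {"Name": ["James", "Mike", "Sally"],
     "School": ["Business", "Law", "Engineering"]}
).set_index("Name")


def join_on_index(how):
    """Hypothetical helper: merge the two frames on their indexes."""
    return pd.merge(staff_df, student_df, how=how,
                    left_index=True, right_index=True)


outer = join_on_index("outer")  # everyone in either frame
inner = join_on_index("inner")  # only people in both frames
left = join_on_index("left")    # every staff member
right = join_on_index("right")  # every student
print(outer)
```

People missing from one side get NaN in that side's column, as with Kelly's School and Mike's Role in the outer join.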

 

       Role            School
Name
James  Grader          Business
Kelly  Director of HR  NaN
Mike   NaN             Law
Sally  Course liaison  Engineering

       Role            School
Name
Sally  Course liaison  Engineering
James  Grader          Business

       Role            School
Name
Kelly  Director of HR  NaN
Sally  Course liaison  Engineering
James  Grader          Business

       Role            School
Name
James  Grader          Business
Mike   NaN             Law
Sally  Course liaison  Engineering

   Name   Role            School
0  Kelly  Director of HR  NaN
1  Sally  Course liaison  Engineering
2  James  Grader          Business
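
In the last table above, Name is an ordinary column and the rows are numbered 0 to 2, which suggests a left merge on a column rather than on the index. A sketch under that assumption (the frame names are again assumed):

```python
import pandas as pd

# Name is kept as a regular column this time, not as the index.
staff_df = pd.DataFrame(
    {"Name": ["Kelly", "Sally", "James"],
     "Role": ["Director of HR", "Course liaison", "Grader"]}
)
student_df = pd.DataFrame(
    {"Name": ["James", "Mike", "Sally"],
     "School": ["Business", "Law", "Engineering"]}
)

# left_on / right_on name the key column in each frame; when both are
# the same, on="Name" is an equivalent shorthand.
merged = pd.merge(staff_df, student_df, how="left",
                  left_on="Name", right_on="Name")
print(merged)
```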

 

 

   Location_x         Name   Role            Location_y            School
0  State Street       Kelly  Director of HR  NaN                   NaN
1  Washington Avenue  Sally  Course liaison  512 Wilson Crescent   Engineering
2  Washington Avenue  James  Grader          1024 Billiard Avenue  Business
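
The Location_x and Location_y headers above are what pandas produces when both frames have a non-key column with the same name. A sketch assuming each frame carries its own Location column; Mike's address does not appear in the table, so the value used for him is a placeholder:

```python
import pandas as pd

staff_df = pd.DataFrame(
    {"Name": ["Kelly", "Sally", "James"],
     "Role": ["Director of HR", "Course liaison", "Grader"],
     "Location": ["State Street", "Washington Avenue", "Washington Avenue"]}
)
student_df = pd.DataFrame(
    {"Name": ["James", "Mike", "Sally"],
     "School": ["Business", "Law", "Engineering"],
     # "Unknown" is a placeholder; Mike is dropped by the left merge anyway.
     "Location": ["1024 Billiard Avenue", "Unknown", "512 Wilson Crescent"]}
)

# Overlapping non-key columns get the default suffixes _x (left) and
# _y (right); the suffixes= argument can override them.
merged = pd.merge(staff_df, student_df, how="left", on="Name")
print(merged)
```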

 

 

   First Name  Last Name  Role            School
0  Sally       Brooks     Course liaison  Engineering
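
A single matching row like the one above is what an inner merge on two key columns would give when only one person agrees on both first and last name. In this sketch only "Sally Brooks" comes from the table; every other last name is made up for illustration.

```python
import pandas as pd

staff_df = pd.DataFrame(
    {"First Name": ["Kelly", "Sally", "James"],
     "Last Name": ["Smith", "Brooks", "Jones"],      # Smith, Jones: illustrative
     "Role": ["Director of HR", "Course liaison", "Grader"]}
)
student_df = pd.DataFrame(
    {"First Name": ["James", "Mike", "Sally"],
     "Last Name": ["Taylor", "Lee", "Brooks"],       # Taylor, Lee: illustrative
     "School": ["Business", "Law", "Engineering"]}
)

# With a list of keys, a row matches only if every key column matches,
# so the two different Jameses are not joined.
merged = pd.merge(staff_df, student_df, how="inner",
                  on=["First Name", "Last Name"])
print(merged)
```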

 

Idiomatic Pandas: Making Code Pandorable

The output of the code in this section is not shown. Download the data and run the code to check the output.
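
Neither the code nor the dataset for this section is available here. One pattern often described as idiomatic pandas is method chaining, where each step returns a new DataFrame and the steps read top to bottom. The sketch below assumes that is the idea being illustrated and uses a small made-up table; all column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data, not the dataset the article refers to.
df = pd.DataFrame(
    {"level": [50, 40, 50, 50],
     "state": ["A", "A", "B", "B"],
     "county": ["a1", "A", "b1", "b2"],
     "pop_2015": [55000, 4850000, 298000, 18000]}
)

# Each method returns a new object, so the steps can be chained inside
# parentheses instead of being stored in intermediate variables.
summary = (
    df[df["level"] == 50]
    .dropna()
    .set_index(["state", "county"])
    .rename(columns={"pop_2015": "population"})
    .drop(columns="level")
)
print(summary)
```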


Scales

 

           Grades
excellent  A+
excellent  A
excellent  A-
good       B+
good       B
good       B-
ok         C+
ok         C
ok         C-
poor       D+
poor       D

 

excellent    A+
excellent    A
excellent    A-
good         B+
good         B
Name: Grades, dtype: category
Categories (11, object): [A, A+, A-, B, …, C+, C-, D, D+]

excellent    A+
excellent    A
excellent    A-
good         B+
good         B
Name: Grades, dtype: category
Categories (11, object): [D < D+ < C- < C … B+ < A- < A < A+]

excellent     True
excellent     True
excellent     True
good          True
good          True
good          True
ok            True
ok           False
ok           False
poor         False
poor         False
Name: Grades, dtype: bool
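
The three outputs above look like an unordered category conversion, an ordered one, and a comparison against a grade. The sketch below reconstructs them under that reading; the threshold 'C' is inferred from the boolean output (C+ and above are True), and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D"],
    index=["excellent"] * 3 + ["good"] * 3 + ["ok"] * 3 + ["poor"] * 2,
    columns=["Grades"],
)

# Nominal scale: categories exist but have no order.
unordered = df["Grades"].astype("category")

# Ordinal scale: list the categories from lowest to highest.
grade_scale = pd.CategoricalDtype(
    categories=["D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"],
    ordered=True,
)
grades = df["Grades"].astype(grade_scale)

# Comparisons follow the declared order, not alphabetical order.
above_c = grades > "C"
print(above_c)
```

With an unordered categorical the same comparison raises a TypeError, because pandas has no way to rank the categories.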

 

Pivot Tables

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table are stored in MultiIndex objects (hierarchical indexes) on the index and columns of the resulting DataFrame.
Download the data to perform the exercise below.
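
The exercise data is not included here, so this sketch uses a tiny made-up table to show the shape of a pivot_table call. The column names YEAR, Make, and kW and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the downloadable file.
cars = pd.DataFrame(
    {"YEAR": [2012, 2012, 2013, 2013, 2013],
     "Make": ["A", "B", "A", "A", "B"],
     "kW": [50, 80, 60, 70, 90]}
)

# Rows come from YEAR, columns from Make, and each cell is the mean kW
# of the matching rows.
table = cars.pivot_table(values="kW", index="YEAR", columns="Make",
                         aggfunc="mean")

# Passing several aggregation functions adds a level to the columns,
# producing a MultiIndex of (function, Make) pairs.
multi = cars.pivot_table(values="kW", index="YEAR", columns="Make",
                         aggfunc=["mean", "max"])
print(table)
print(multi)
```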


Date Functionality in Pandas

pandas has proven very successful as a tool for working with time series data, especially in the financial data analysis space. The 0.8 release improved the time series API by leaps and bounds, building on the then-new NumPy datetime64 dtype.

Timestamp

Timestamp('2016-09-01 10:05:00')

Period

Period('2016-01', 'M')

Period('2016-03-05', 'D')
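
A short sketch of calls that would produce values like the ones shown above. The exact input strings used originally are not shown, so the ones here are assumptions.

```python
import pandas as pd

# A Timestamp is a single point in time.
ts = pd.Timestamp("2016-09-01 10:05")

# A Period is a span of time with a frequency: a whole month or a whole day.
month = pd.Period("2016-01", freq="M")
day = pd.Period("2016-03-05", freq="D")

print(repr(ts))
print(month.start_time, month.end_time)
```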

DatetimeIndex

2016-09-01      a
2016-09-02      b
2016-09-03      c
dtype: object

pandas.core.indexes.datetimes.DatetimeIndex

PeriodIndex

2016-09    d
2016-10    e
2016-11    f
Freq: M, dtype: object

pandas.core.indexes.period.PeriodIndex
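
The two Series above can be rebuilt by using Timestamps or Periods as index labels; pandas then picks the index type on its own. A minimal sketch, with variable names chosen for illustration:

```python
import pandas as pd

# Timestamps as labels give a DatetimeIndex.
t1 = pd.Series(
    list("abc"),
    index=[pd.Timestamp("2016-09-01"),
           pd.Timestamp("2016-09-02"),
           pd.Timestamp("2016-09-03")],
)

# Periods as labels give a PeriodIndex (monthly frequency is inferred
# from the "YYYY-MM" strings).
t2 = pd.Series(
    list("def"),
    index=[pd.Period("2016-09"), pd.Period("2016-10"), pd.Period("2016-11")],
)

print(type(t1.index))
print(type(t2.index))
```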

Converting to DateTime

Below are some good references for Python datetime conversion.

Converting a string into DateTime

 

           a   b
2-Jun-13   51  19
29-Aug-14  32  68
6/26/2015  59  18
7/12/2016  54  58

            a   b
2013-06-02  51  19
2014-08-29  32  68
2015-06-26  59  18
2016-07-12  54  58

Timestamp('2012-07-04 00:00:00')

Timestamp(‘2012-07-04 00:00:00’)
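
The tables above show an index of date strings in mixed formats being turned into proper dates, followed by a single parsed Timestamp. The sketch below uses pd.to_datetime for both. The input spellings are assumptions chosen to be unambiguous, and each string is parsed on its own because the handling of mixed-format arrays differs between pandas versions.

```python
import pandas as pd

# Four spellings of the dates in the table (illustrative, not the originals).
raw_dates = ["2 June 2013", "Aug 29, 2014", "2015-06-26", "7/12/2016"]
ts = pd.DataFrame(
    {"a": [51, 32, 59, 54], "b": [19, 68, 18, 58]},
    index=raw_dates,
)

# Parse element by element, then install the result as a DatetimeIndex.
ts.index = pd.DatetimeIndex([pd.to_datetime(s) for s in raw_dates])

# dayfirst=True reads day/month/year instead of month/day/year.
european = pd.to_datetime("04/07/2012", dayfirst=True)

print(ts)
print(repr(european))
```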

Timedeltas

Timedeltas are differences in times, expressed in different units, e.g. days, hours, minutes, seconds. They can be both positive and negative.

Timedelta('2 days 00:00:00')

Timestamp('2016-09-14 11:10:00')
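
One way to obtain the two values above: subtracting two Timestamps gives a Timedelta, and adding a Timedelta to a Timestamp gives a new Timestamp. The specific operands are assumptions picked so the results match.

```python
import pandas as pd

# Timestamp - Timestamp -> Timedelta
gap = pd.Timestamp("2016-09-03") - pd.Timestamp("2016-09-01")

# Timestamp + Timedelta -> Timestamp
later = pd.Timestamp("2016-09-02 08:10") + pd.Timedelta(days=12, hours=3)

# Reversing the subtraction gives a negative Timedelta.
negative_gap = pd.Timestamp("2016-09-01") - pd.Timestamp("2016-09-03")

print(repr(gap))
print(repr(later))
```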

Working with Dates in a Dataframe

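A DatetimeIndex can serve as the index of a DataFrame. The outputs below start from nine biweekly Sunday dates, then show the day name of each date, the row-to-row differences, the monthly means, and selections by year, by month, and from a month onward.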

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')

            Count 1  Count 2
10/2/2016   107      116
10/16/2016  115      120
10/30/2016  110      125
11/13/2016  112      118
11/27/2016  111      117
12/11/2016  117      115
12/25/2016  113      121
1/8/2017    110      129
1/22/2017   117      120

Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday'],
      dtype='object')

            Count 1  Count 2
10/2/2016   NaN      NaN
10/16/2016  8        4
10/30/2016  -5       5
11/13/2016  2        -7
11/27/2016  -1       -1
12/11/2016  6        -2
12/25/2016  -4       6
1/8/2017    -3       8
1/22/2017   7        -9

 

            Count 1     Count 2
10/31/2016  110.666667  120.333333
11/30/2016  111.5       117.5
12/31/2016  115         118
1/31/2017   113.5       124.5

           Count 1  Count 2
1/8/2017   110      129
1/22/2017  117      120

            Count 1  Count 2
12/11/2016  117      115
12/25/2016  113      121

            Count 1  Count 2
12/11/2016  117      115
12/25/2016  113      121
1/8/2017    110      129
1/22/2017   117      120
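
The outputs above can be reproduced roughly as follows, using the counts from the first table. The calls are a reconstruction, not the original code. The monthly means are computed by grouping on the month period, because the resample alias for month end is spelled differently across pandas versions; the numbers are the same, only the row labels differ (2016-10 instead of 10/31/2016).

```python
import pandas as pd

# Nine dates, every second Sunday, starting at the first Sunday on or
# after 2016-10-01.
dates = pd.date_range("2016-10-01", periods=9, freq="2W-SUN")

df = pd.DataFrame(
    {"Count 1": [107, 115, 110, 112, 111, 117, 113, 110, 117],
     "Count 2": [116, 120, 125, 118, 117, 115, 121, 129, 120]},
    index=dates,
)

day_names = df.index.day_name()   # weekday of each date
changes = df.diff()               # difference from the previous row
monthly_mean = df.groupby(df.index.to_period("M")).mean()

# Partial string indexing on a DatetimeIndex.
in_2017 = df.loc["2017"]          # all rows in 2017
in_dec_2016 = df.loc["2016-12"]   # all rows in December 2016
from_dec_2016 = df.loc["2016-12":]  # December 2016 onward

print(monthly_mean)
```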

 

