Applied Data Science with Python – Part 2

This article introduces the basics of the Python programming environment and applied data science with Python, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library.

The Series Data Structure

We will start with a quick, non-comprehensive overview of the fundamental data structures in pandas. The fundamental behaviour of data types, indexing, and axis labelling/alignment applies across all of the objects. To get started, import NumPy and load pandas into your namespace.
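
A minimal sketch of those imports, assuming the conventional aliases np and pd (the aliases are a common convention, not something the text above prescribes):

```python
import numpy as np
import pandas as pd

# Print the installed versions; some behaviour differs between pandas releases.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```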

We’ll create a variable animals holding a list of strings and convert it into a pandas Series.
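
One way this might look is sketched below. The three animal names are example values chosen for illustration, not taken from the original code.

```python
import pandas as pd

# Example values, assumed for illustration.
animals = ['Tiger', 'Bear', 'Moose']

# pandas assigns a default integer index (0, 1, 2) to the values.
animal_series = pd.Series(animals)
print(animal_series)
```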

We’ll create a pandas Series of integer type.
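
A comparable sketch for integers, again with example values that are only assumptions:

```python
import pandas as pd

# Example values, assumed for illustration.
numbers = [1, 2, 3]

# pandas infers an integer dtype from the list.
number_series = pd.Series(numbers)
print(number_series)
print(number_series.dtype)
```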

To construct a DataFrame with missing data, use np.nan for those values which are missing (this requires importing NumPy). Alternatively, you may pass a numpy.ma.MaskedArray as the data argument to the DataFrame constructor, and its masked entries will be considered missing.
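
Both routes can be sketched as follows. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Route 1: mark the missing cell explicitly with np.nan.
df_nan = pd.DataFrame({'Cost': [22.5, np.nan, 5.0],
                       'Item': ['Dog Food', 'Kitty Litter', 'Bird Seed']})

# Route 2: pass a masked array; the masked entry (row 0, column 'b')
# is treated as missing in the resulting DataFrame.
masked = np.ma.masked_array([[1.0, 2.0], [3.0, 4.0]],
                            mask=[[False, True], [False, False]])
df_masked = pd.DataFrame(masked, columns=['a', 'b'])

print(df_nan)
print(df_masked)
```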

NaN (not a number) is the standard missing data marker used in pandas.

To make detecting missing values easier (and across different array dtypes), pandas provides the isna() and notna() functions, which are also methods on Series and DataFrame objects.
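
A short sketch of both forms, the top-level function and the method, on an example Series (values assumed for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Top-level function: True where the value is missing.
missing = pd.isna(s)

# Method form of the complement: True where a value is present.
present = s.notna()

print(missing)
print(present)
```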

Querying a Series

Indexing In Python
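
The values printed below look like lookups on a Series of sports and countries. A sketch that would produce them, assuming the Series is built from a dict (the variable names are illustrative): iloc selects by position and loc selects by label.

```python
import pandas as pd

sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)

by_position = s.iloc[3]     # fourth entry, counting from zero
by_label = s.loc['Golf']    # entry whose index label is 'Golf'
by_plain_label = s['Golf']  # plain indexing with a label behaves like loc here

print(by_position)
print(by_label)
```

Using iloc and loc explicitly avoids ambiguity when the index itself contains integers.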

'South Korea'

'Scotland'

'South Korea'

'Scotland'
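
The next output shows a small float Series and its total computed twice. A sketch of two ways to get that total, assuming a plain Python loop for the first and NumPy's vectorized sum for the second:

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 120.0, 101.0, 3.0])
print(s)

# Approach 1: iterate over the values and accumulate.
total_loop = 0.0
for item in s:
    total_loop += item
print(total_loop)

# Approach 2: let NumPy sum the underlying array in one call.
total_vectorized = np.sum(s)
print(total_vectorized)
```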

0    100.0
1    120.0
2    101.0
3      3.0
dtype: float64

324.0

324.0
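
To compare the two approaches on something larger, a Series of random integers is useful. A sketch, assuming 10,000 values drawn from the range 0 to 999 (the printed values will differ on every run):

```python
import numpy as np
import pandas as pd

# 10,000 random integers in [0, 1000).
s = pd.Series(np.random.randint(0, 1000, 10000))

print(s.head())  # first five values
print(len(s))
```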

0    486
1    951
2    111
3    142
4    457
dtype: int64

10000
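
The timings that follow compare a Python loop with a vectorized sum. The sketch below uses the standard timeit module instead of notebook magics; absolute numbers depend on the machine, so none are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 1000, 10000))

def loop_sum(series):
    """Sum the values one at a time in Python."""
    total = 0
    for item in series:
        total += item
    return total

# Run each approach 10 times and report the total elapsed seconds.
loop_time = timeit.timeit(lambda: loop_sum(s), number=10)
vector_time = timeit.timeit(lambda: np.sum(s), number=10)

print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vector_time:.4f} s")
```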

100 loops, best of 3: 1.31 ms per loop

100 loops, best of 3: 74.3 µs per loop
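
Arithmetic on a Series is broadcast to every element. A sketch, assuming the update was an in-place addition of 2 and reusing the five values shown earlier as the starting point:

```python
import pandas as pd

s = pd.Series([486, 951, 111, 142, 457])

# The scalar 2 is added to every element in a single vectorized operation.
s += 2
print(s)
```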

0    488
1    953
2    113
3    144
4    459
dtype: int64
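
The same update can be written as an explicit loop over labels and values. This is a sketch of the slower, element-by-element style, assuming items() for iteration and loc for assignment:

```python
import pandas as pd

s = pd.Series([488, 953, 113, 144, 459])

# Visit each (label, value) pair and write the new value back by label.
for label, value in s.items():
    s.loc[label] = value + 2

print(s)
```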

0    490
1    955
2    115
3    146
4    461
dtype: int64
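
The timings below contrast the loop with the broadcast form. A hedged sketch using timeit on a smaller Series (1,000 values, an assumption made to keep the run short); the helper names are hypothetical.

```python
import timeit

import numpy as np
import pandas as pd

def add_two_with_loop(series):
    """Add 2 to every element, one label at a time."""
    result = series.copy()
    for label, value in result.items():
        result.loc[label] = value + 2
    return result

def add_two_vectorized(series):
    """Add 2 to every element with a single broadcast operation."""
    return series + 2

s = pd.Series(np.random.randint(0, 1000, 1000))

loop_time = timeit.timeit(lambda: add_two_with_loop(s), number=3)
vector_time = timeit.timeit(lambda: add_two_vectorized(s), number=3)

print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vector_time:.4f} s")
```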

10 loops, best of 3: 966 ms per loop

10 loops, best of 3: 317 µs per loop
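
Assigning to a label that does not exist yet appends a new entry, and the types may be mixed. A sketch that matches the output below, assuming the new entry was added with loc:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# 'Animal' is not in the index, so this adds a new element.
# Mixing integers and a string makes the Series dtype object.
s.loc['Animal'] = 'Bears'
print(s)
```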

0             1
1             2
2             3
Animal    Bears
dtype: object
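
Index labels do not have to be unique. The outputs that follow can be reproduced by combining two Series, one of which repeats the label Cricket. The sketch uses pd.concat, which is an assumption about how the Series were joined; the variable names are illustrative.

```python
import pandas as pd

original_sports = pd.Series({'Archery': 'Bhutan',
                             'Golf': 'Scotland',
                             'Sumo': 'Japan',
                             'Taekwondo': 'South Korea'})

# Four countries that share the same index label.
cricket_loving_countries = pd.Series(
    ['Australia', 'Barbados', 'Pakistan', 'England'],
    index=['Cricket', 'Cricket', 'Cricket', 'Cricket'])

# concat returns a new Series; original_sports is left unchanged.
all_countries = pd.concat([original_sports, cricket_loving_countries])

print(original_sports)
print(cricket_loving_countries)
print(all_countries)

# A repeated label returns every matching row as a Series.
print(all_countries.loc['Cricket'])
```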

 

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

Cricket    Australia
Cricket     Barbados
Cricket     Pakistan
Cricket      England
dtype: object

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
Cricket        Australia
Cricket         Barbados
Cricket         Pakistan
Cricket          England
dtype: object

Cricket    Australia
Cricket     Barbados
Cricket     Pakistan
Cricket      England
dtype: object

The DataFrame Data Structure

A DataFrame is a 2-dimensional labelled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. Below are some examples covering different scenarios.
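
A sketch of how the purchases table shown next could be built, assuming one Series per purchase and store names as the row index. The variable names are illustrative.

```python
import pandas as pd

purchase_1 = pd.Series({'Cost': 22.5, 'Item Purchased': 'Dog Food', 'Name': 'Chris'})
purchase_2 = pd.Series({'Cost': 2.5, 'Item Purchased': 'Kitty Litter', 'Name': 'Kevyn'})
purchase_3 = pd.Series({'Cost': 5.0, 'Item Purchased': 'Bird Seed', 'Name': 'Vinod'})

# Each Series becomes one row; the index labels may repeat.
df = pd.DataFrame([purchase_1, purchase_2, purchase_3],
                  index=['Store 1', 'Store 1', 'Store 2'])
print(df)
```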

         Cost  Item Purchased   Name
Store 1  22.5        Dog Food  Chris
Store 1   2.5    Kitty Litter  Kevyn
Store 2     5       Bird Seed  Vinod
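
Selecting a row label that occurs once returns a Series. A sketch consistent with the next two outputs, assuming loc was used for the lookup:

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# 'Store 2' appears once, so the result is a single row as a Series.
store_2 = df.loc['Store 2']
print(store_2)
print(type(store_2))
```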

Cost                      5
Item Purchased    Bird Seed
Name                  Vinod
Name: Store 2, dtype: object

pandas.core.series.Series
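
When a label repeats, loc returns a DataFrame with every matching row, and a second argument narrows the result to one column. A sketch under the same assumptions as before:

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# 'Store 1' appears twice, so two rows come back as a DataFrame.
store_1 = df.loc['Store 1']
print(store_1)

# Row label and column label together give the costs for Store 1.
store_1_costs = df.loc['Store 1', 'Cost']
print(store_1_costs)
```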

         Cost  Item Purchased   Name
Store 1  22.5        Dog Food  Chris
Store 1   2.5    Kitty Litter  Kevyn

Store 1    22.5
Store 1     2.5
Name: Cost, dtype: float64

DataFrame.T transposes the DataFrame, swapping its rows and columns.
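
A sketch of the transpose and of selecting what used to be a column from the transposed frame:

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# After transposing, the old column names are the row labels.
transposed = df.T
print(transposed)

# 'Cost' is now a row; the mixed column types give it dtype object.
cost_row = transposed.loc['Cost']
print(cost_row)
```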

                 Store 1       Store 1    Store 2
Cost                22.5           2.5          5
Item Purchased  Dog Food  Kitty Litter  Bird Seed
Name               Chris         Kevyn      Vinod

Store 1    22.5
Store 1     2.5
Store 2       5
Name: Cost, dtype: object
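
Columns can be selected directly with square brackets, and loc accepts a slice for rows together with a list of columns. A sketch that matches the next three outputs, with the selection styles assumed:

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# Column selection keeps the numeric dtype.
costs = df['Cost']
print(costs)

# Chained selection: rows first, then a column. It works for reading,
# but a single loc call is preferable when assigning values.
store_1_costs = df.loc['Store 1']['Cost']
print(store_1_costs)

# All rows (the ':' slice) and two chosen columns.
name_and_cost = df.loc[:, ['Name', 'Cost']]
print(name_and_cost)
```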

Store 1    22.5
Store 1     2.5
Store 2     5.0
Name: Cost, dtype: float64

Store 1    22.5
Store 1     2.5
Name: Cost, dtype: float64

          Name  Cost
Store 1  Chris  22.5
Store 1  Kevyn   2.5
Store 2  Vinod     5
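
Dropping rows returns a new DataFrame by default and leaves the original untouched. A sketch of the sequence the following outputs suggest, assuming a copy is modified in place and a column is removed with del:

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# drop returns a new frame without the 'Store 1' rows.
print(df.drop('Store 1'))

# The original still has all three rows.
print(df)

# Work on an explicit copy when changing data in place.
copy_df = df.copy()
copy_df.drop('Store 1', inplace=True)
print(copy_df)

# del removes a column from the copy immediately.
del copy_df['Name']
print(copy_df)
```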

         Cost  Item Purchased   Name
Store 2     5       Bird Seed  Vinod

         Cost  Item Purchased   Name
Store 1  22.5        Dog Food  Chris
Store 1   2.5    Kitty Litter  Kevyn
Store 2     5       Bird Seed  Vinod

         Cost  Item Purchased   Name
Store 2     5       Bird Seed  Vinod

         Cost  Item Purchased
Store 2     5       Bird Seed
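
Assigning to a column name that does not exist adds the column. A sketch, assuming the Location column was filled with None for every row:

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# A scalar on the right-hand side is broadcast to every row.
df['Location'] = None
print(df)
```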

         Cost  Item Purchased   Name  Location
Store 1  22.5        Dog Food  Chris      None
Store 1   2.5    Kitty Litter  Kevyn      None
Store 2     5       Bird Seed  Vinod      None

DataFrame Indexing and Loading
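
The table below shows every cost raised by 2. A sketch of one way to get there, assuming an explicit assignment back to the column. Whether an in-place change on a column pulled out of a DataFrame also changes the DataFrame depends on the pandas version and its copy-on-write setting, so the explicit form is used here.

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod'],
                   'Location': [None, None, None]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# Compute the new costs and write them back to the same column.
df['Cost'] = df['Cost'] + 2
print(df)
```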

         Cost  Item Purchased   Name  Location
Store 1  24.5        Dog Food  Chris      None
Store 1   4.5    Kitty Litter  Kevyn      None
Store 2     7       Bird Seed  Vinod      None

 

Download the dataset and run the commands below to check the output.
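
Since the dataset itself is not named here, the sketch below loads a small made-up CSV from an in-memory string. The column names and contents are placeholders; with a real file, the path would replace the StringIO object.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the downloaded file.
csv_text = """store,cost,item
Store 1,22.5,Dog Food
Store 1,2.5,Kitty Litter
Store 2,5.0,Bird Seed
"""

# index_col=0 uses the first column as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.head())
print(df.columns.tolist())
```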


Querying a DataFrame
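
A common way to query a DataFrame is with a Boolean mask. The sketch below is an illustration on the small purchases table from earlier, not the original example for this section.

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# A comparison on a column yields a Boolean Series, one value per row.
mask = df['Cost'] > 3
print(mask)

# Indexing with the mask keeps only the rows where it is True.
expensive = df[mask]
print(expensive)

# Conditions combine with & (and) and | (or); each needs parentheses.
expensive_not_chris = df[(df['Cost'] > 3) & (df['Name'] != 'Chris')]
print(expensive_not_chris)
```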


Indexing DataFrames
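
The index of a DataFrame can be moved into a column and replaced by another column. A sketch on the purchases table, offered as an illustration only; the axis name Store is an assumption introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

# Name the index, then turn it into an ordinary column.
df_reset = df.rename_axis('Store').reset_index()
print(df_reset)

# Promote the 'Name' column to be the new index.
df_by_name = df_reset.set_index('Name')
print(df_by_name)

# Rows can now be looked up by customer name.
print(df_by_name.loc['Kevyn'])
```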

 

 


Missing Values

In this section, we will discuss missing (also referred to as NA) values in pandas.
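
A brief sketch of three common operations on missing values: counting them, filling them, and dropping the rows that contain them. The data is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, np.nan, 5.0],
                   'Name': ['Chris', None, 'Vinod']})

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Replace missing costs with 0.0; other columns are left as they are.
filled = df.fillna({'Cost': 0.0})
print(filled)

# Drop every row that has at least one missing value.
dropped = df.dropna()
print(dropped)
```

Both fillna and dropna return new objects here, so df itself keeps its missing values.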
