Saturday, July 29, 2023

Python Pandas Library

 Pandas

  • Pandas is a Python library for working with data sets.
  • It has functions for analyzing, cleaning, exploring, manipulating, and plotting data.
  • The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.


Main Features

  1. Exploring Data
  2. Preprocessing / Cleaning Data 
  3. Analyzing Data 
  4. Visualization of Data

Coding Editors

  • Jupyter
  • Google Colab

Pandas Installation

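# Install from a terminal (in Jupyter or Colab, run it in a cell as: %pip install pandas)
pip install pandas
# Then import the library in your Python code
import pandas as pd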

Exploring Data

1. Reading a CSV file into a DataFrame
df = pd.read_csv('sinovac.csv')

2. head() and tail() functions
head() displays the first rows of the DataFrame, and tail() displays the last rows.
By default, each shows 5 rows; pass a number, e.g. df.head(10), to change that.

3. info()
To display summary information about the DataFrame (column names, data types, non-null counts, memory usage), use the info() method.

4. Display the number of rows and columns.
df.shape

5. Display column names
df.columns

6. Display one column only
df['col name'].head(3)

7. Display multiple columns
df[['Age', 'Transaction_date', 'Gender']].head(4)
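The exploration steps above can be tried without a real file. The following is a minimal sketch, assuming a small made-up CSV held in a string; the rows and values are hypothetical and only stand in for a file such as 'sinovac.csv'.

```python
import io
import pandas as pd

# Hypothetical CSV content (made up for illustration, not real data).
csv_text = """Transaction_ID,Age,Gender,Transaction_date,Amount_spent
1,25,Female,2023-01-05,120.5
2,41,Male,2023-01-06,80.0
3,33,Female,2023-01-07,45.25
4,58,Male,2023-01-08,300.0
5,19,Female,2023-01-09,15.75
6,47,Male,2023-01-10,210.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()           # first 5 rows by default
last_rows = df.tail(2)           # last 2 rows
n_rows, n_cols = df.shape        # shape is a (rows, columns) tuple
column_names = list(df.columns)  # column labels as a plain list

one_column = df['Age'].head(3)                    # single brackets give a Series
several_columns = df[['Age', 'Gender']].head(4)   # double brackets give a DataFrame

df.info()  # prints dtypes and non-null counts for each column
print(n_rows, n_cols)
```

Note that shape and columns are attributes, so they are written without parentheses, while head(), tail() and info() are methods.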


Data Cleaning

1. Delete a column
df.drop(['Transaction_ID'], axis=1, inplace=True)

2. Rename columns
df.rename(columns={"Transaction_date": "Date", "Gender": "Sex"}, inplace=True)

3. Remove duplicates
# Count the duplicated rows
df.duplicated().sum()
# Drop duplicated rows (the first occurrence of each is kept by default)
df.drop_duplicates(inplace=True)

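4. Display and remove missing values
# Count the missing values in each column, largest first
df.isna().sum().sort_values(ascending=False)
# Drop every row that contains at least one NaN (pass subset=[...] to check only specific columns)
df.dropna(inplace=True)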
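The cleaning steps above can be sketched end to end as follows. This is an illustration on a tiny made-up DataFrame (all values are hypothetical), and it reassigns the result of each step instead of using inplace=True, which is an equivalent alternative style.

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 are identical, and two cells are missing.
df = pd.DataFrame({
    'Transaction_ID': [1, 2, 2, 3, 4],
    'Transaction_date': ['2023-01-05', '2023-01-06', '2023-01-06',
                         '2023-01-07', '2023-01-08'],
    'Gender': ['Female', 'Male', 'Male', None, 'Female'],
    'Amount_spent': [120.5, 80.0, 80.0, 45.25, float('nan')],
})

# 1. Delete a column.
df = df.drop(columns=['Transaction_ID'])

# 2. Rename columns.
df = df.rename(columns={'Transaction_date': 'Date', 'Gender': 'Sex'})

# 3. Count duplicated rows, then drop them (first occurrence is kept).
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates()

# 4. Count missing values per column, then drop incomplete rows.
missing = df.isna().sum()
cleaned = df.dropna()                   # drops rows with any missing value
sex_known = df.dropna(subset=['Sex'])   # drops rows only where 'Sex' is missing

print(n_duplicates, len(cleaned), len(sex_known))
```

The subset argument is useful when a missing value in one column matters but gaps in other columns are acceptable.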


Data Analysis

1. df.describe()
This function displays basic summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns.

2. df['State_names'].unique()
Returns all unique values in the column.

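3. df['Gender'].value_counts()
Counts the occurrences of each unique value.
# Calculate the proportion of each category (multiply by 100 for a percentage)
df['Gender'].value_counts(normalize=True)
# value_counts() already sorts in descending order, so this shows the 20 most frequent states
df['State_names'].value_counts().head(20)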

4. Sort values by State_names
df.sort_values(by=['State_names']).head(3)
# Sort by Amount_spent in descending order
df.sort_values(by=['Amount_spent'], ascending=False).head(3)

5. nlargest() and nsmallest() functions
nlargest(n, column) and nsmallest(n, column) return the n rows with the largest or smallest values in the given column.
df.nlargest(4, 'Amount_spent')
df.nsmallest(3, 'Age')

6. Filtering: show only PayPal users
condition = df['Payment_method'] == 'PayPal'
df[condition].head(4)
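The analysis calls above can be compared side by side on a small example. The sketch below uses a made-up DataFrame (hypothetical values) that reuses the column names from this post.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'Age': [25, 41, 33, 58, 19, 47],
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Female', 'Female'],
    'Payment_method': ['PayPal', 'Card', 'PayPal', 'Cash', 'Card', 'PayPal'],
    'Amount_spent': [120.5, 80.0, 45.25, 300.0, 15.75, 210.0],
})

summary = df.describe()                          # statistics for numeric columns
unique_methods = df['Payment_method'].unique()   # values in order of first appearance

gender_counts = df['Gender'].value_counts()                # absolute counts
gender_share = df['Gender'].value_counts(normalize=True)   # proportions that sum to 1

# Three highest spenders via sorting.
top_spenders = df.sort_values(by='Amount_spent', ascending=False).head(3)

# The same idea with nlargest / nsmallest, which take n directly.
top2 = df.nlargest(2, 'Amount_spent')
youngest = df.nsmallest(2, 'Age')

# Boolean filtering: keep only the rows where the condition is True.
paypal_users = df[df['Payment_method'] == 'PayPal']

print(gender_counts)
print(paypal_users)
```

Because nlargest() and nsmallest() already limit the number of rows, chaining head() after them is only needed when fewer rows than n should be shown.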


Data Visualization

1. Line plot
# Use the plot() method on the DataFrame: x column first, then y column (plotting requires matplotlib)
df.plot('year', 'price');


2. Bar plot
df['Employees_status'].value_counts().plot(kind='bar');
For horizontal bars:
df['Employees_status'].value_counts().plot(kind='barh');

3. Pie plot
df['Segment'].value_counts().plot(kind='pie');
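The three plot types above can be drawn next to each other in one figure. This is a minimal sketch, assuming matplotlib is installed (pandas uses it for plotting) and using made-up values; the 'Agg' backend line is only there so the script runs without a display and can be omitted in Jupyter or Colab.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit in Jupyter or Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration only.
prices = pd.DataFrame({'year': [2019, 2020, 2021, 2022],
                       'price': [10.0, 12.5, 11.0, 15.0]})
status = pd.Series(['Employed', 'Student', 'Employed',
                    'Retired', 'Employed', 'Student'],
                   name='Employees_status')

# One row of three axes, one per plot type.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Line plot: x and y name the columns to use.
prices.plot(x='year', y='price', ax=axes[0], title='Line')

# Bar and pie plots are drawn from the category counts.
counts = status.value_counts()
counts.plot(kind='barh', ax=axes[1], title='Horizontal bar')
counts.plot(kind='pie', ax=axes[2], title='Pie')

fig.tight_layout()
# In a script, call plt.show() or fig.savefig(...) here to view the result.
```

Passing ax=... tells pandas which matplotlib axes to draw on; without it, each call to plot() creates its own figure.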







