Python Core Libraries

Overview

Teaching: 130 min
Exercises: 130 min
Questions
  • What are the basics of Numpy?

  • What are the basics of Pandas?

  • What are the basics of Matplotlib?

Objectives
  • Learn the basics of Numpy.

  • Learn the basics of Pandas

  • Learn the basics of Matplotlib.

Python Core Libraries


NumPy

Agenda

Introduction to NumPy

Import the NumPy library

import numpy as np

# Check the version of NumPy
print(np.__version__)

Creating NumPy Arrays

Creating Arrays from Lists
Create a 1D array
arr = np.array([1, 2, 3, 4])
print(arr)
Create a 2D array
arr_2d = np.array([[1, 2], [3, 4]])
print(arr_2d)
Create a 3D Arrays
c = np.array([[[1,1,2], [2,3,3]], [[1,1,1], [1,1,1]], [[1,1,1], [1,1,1]]])
Creating Arrays with NumPy Functions
Create an array of zeros
zeros = np.zeros((2, 3))  # 2 rows, 3 columns
print(zeros)
Create an array of ones
ones = np.ones((3, 2))  # 3 rows, 2 columns
print(ones)
Random numbers
a = np.random.rand(4)       # uniform in [0, 1]

b = np.random.randn(4)      # Gaussian
Create an array with a range of numbers
range_arr = np.arange(0, 10, 2)  # Start at 0, stop before 10, step by 2
print(range_arr)
Create an array with evenly spaced numbers
linspace_arr = np.linspace(0, 1, 5)  # Start at 0, stop at 1, 5 evenly spaced values
print(linspace_arr)

Basic Array Operations

Mathematical Operations
Perform basic math operations
arr = np.array([1, 2, 3, 4])
print(arr + 2)  # Add 2 to each element
print(arr * 3)  # Multiply each element by 3
Perform element-wise operations with another array
arr2 = np.array([5, 6, 7, 8])
print(arr + arr2)  # Element-wise addition
print(arr * arr2)  # Element-wise multiplication
Array Properties
Get properties of an array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # Shape of the array (rows, columns)
print(arr.size)   # Total number of elements
print(arr.dtype)  # Data type of elements

Indexing and Slicing

Indexing
Access elements of a 1D array
arr = np.array([10, 20, 30, 40])
print(arr[0])  # First element
print(arr[-1])  # Last element
Access elements of a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d[0, 1])  # Element in first row, second column
print(arr_2d[1, 2])  # Element in second row, third column
Slicing
Slice a 1D array
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])  # Elements from index 1 to 3
print(arr[:3])   # First three elements
Slice a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[:2, 1:])  # First two rows, columns 2 and 3
print(arr_2d[1:, :2])  # Last two rows, first two columns

Reshaping Arrays

Changing the Shape of an Array
Reshape a 1D array to a 2D array
arr = np.arange(1, 10)
reshaped = arr.reshape(3, 3)  # 3 rows, 3 columns
print(reshaped)
Flatten a 2D array back to 1D
flattened = reshaped.flatten()
print(flattened)

Broadcasting

Add a scalar to a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d + 10)
Add a 1D array to a 2D array
row = np.array([1, 2, 3])
print(arr_2d + row)

Asking For Help

np.add?
np.array?
np.arr*?
np.con*?

Key Points:


Pandas

Agenda

Introduction to Pandas

Import the Pandas library

import pandas as pd

# Check the version of Pandas
print(pd.__version__)

Pandas Series

# Create a Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data, index=["A", "B", "C", "D"])
print(series)

# Accessing elements in a Series
print(series["B"])  # Output: 20

Pandas DataFrame

Create a DataFrame from a dictionary

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
print(df)

Viewing DataFrame

df.head()
df.tail()

Display DataFrame information

df.info()

Display the data types

df.dtypes

Display statistical summary of DataFrame

df.describe()

Display the DataFrame index and columns

Get the DataFrame index
df.index
Get the DataFrame columns
df.columns

Accessing a DataFrame

# Accessing a column
print(df["Name"])

# Accessing rows using loc
print(df.loc[1])  

# Accessing rows using iloc
print(df.iloc[2]) 

Adding New Columns

df['Experience'] = np.random.randint(1, 5, size=len(df))

df.head()

Dropping Columns

# Dropping a Single Column
df_dropped = df.drop("City", axis=1)
print(df_dropped)

# Alternatively, drop in place (modifies the original DataFrame)
df.drop("City", axis=1, inplace=True)
print(df)

# Dropping Multiple Columns
df = pd.DataFrame(data)  # Recreate the original DataFrame
df_dropped = df.drop(["Age", "City"], axis=1)
print(df_dropped)

Dropping Rows

# Drop the row with index 1
df_dropped = df.drop(1, axis=0)
print(df_dropped)

# Alternatively, drop in place
df.drop(1, axis=0, inplace=True)
print(df)
# Drop multiple rows
df = pd.DataFrame(data)  # Recreate the original DataFrame
df_dropped = df.drop([0, 2], axis=0)
print(df_dropped)

Reading and Writing Data

GDP data

Read data from a CSV file

df = pd.read_csv("example.csv")
print(df)

Write data to a CSV file

df.to_csv("output.csv", index=False)

Reading Excel Files

# Read data from an Excel file
df = pd.read_excel("example.xlsx", sheet_name="Sheet1")  # Specify the sheet name

# Read the first 5 rows (head) of the DataFrame
print(df.head())

Writing Excel Files

# Save DataFrame to an Excel file
df.to_excel("output.xlsx", index=False)  # Save without including the index
Writing Multiple Sheets
# Create multiple DataFrames
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"X": [7, 8, 9], "Y": [10, 11, 12]})

# Write to multiple sheets
with pd.ExcelWriter("multi_sheet_output.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Sheet1", index=False)
    df2.to_excel(writer, sheet_name="Sheet2", index=False)

Reading JSON Files

# Read data from a JSON file
df = pd.read_json("example.json")

# Display the first 5 rows
print(df.head())

Writing JSON Files

# Save DataFrame in a column-oriented JSON format
df.to_json("column_oriented.json", orient="columns")

Data Manipulation

Filtering

# Filter rows where Age > 25
filtered_df = df[df["Age"] > 25]
print(filtered_df)
# Filter rows where Age > 25 and City is "Chicago"
filtered_df = df[(df["Age"] > 25) & (df["City"] == "Chicago")]
print(filtered_df)

Sort by Age in descending order

sorted_df = df.sort_values("Age", ascending=False)
print(sorted_df)

Aggregations

# Calculate the mean age
mean_age = df["Age"].mean()
print(f"Mean Age: {mean_age}")

Selecting Columns and Rows in Pandas

Selecting Columns

Selecting a Single Column
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)

# Select a single column
age_column = df["Age"]
print(age_column)

# Alternatively, use dot notation
city_column = df.City
print(city_column)
Selecting Multiple Columns
subset = df[["Name", "City"]]
print(subset)

Renaming Columns

df.rename(columns={"Age": "Years"}, inplace=True)
print(df)

Selecting Rows

Selecting Rows by Index Label (.loc[])

# Select a single row by index label
row_1 = df.loc[1]
print(row_1)

# Select multiple rows by index labels
rows = df.loc[[0, 2]]
print(rows)

Selecting Rows by Index Position (.iloc[])

# Select a single row by index position
row_0 = df.iloc[0]
print(row_0)

# Select multiple rows by index positions
rows = df.iloc[[0, 2]]
print(rows)

Selecting Specific Rows and Columns

Using .loc[] for Specific Rows and Columns
# Select rows 0 and 2, and columns "Name" and "City"
subset = df.loc[[0, 2], ["Name", "City"]]
print(subset)
Using .iloc[] for Specific Rows and Columns
# Select rows 0 and 2, and columns at positions 0 and 2
subset = df.iloc[[0, 2], [0, 2]]
print(subset)

Handling Missing Data

Handling Missing Data

Create a DataFrame with missing values

data = {
    "Name": ["Alice", "Bob", None],
    "Age": [25, None, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)

Check for missing values

print(df.isna())

Fill missing values

filled_df = df.fillna("Unknown")
print(filled_df)

Drop rows with missing values

cleaned_df = df.dropna()
print(cleaned_df)

Summary of Pandas Selection Methods

Selection Type Syntax Example Notes
Single Column df["column_name"] Access column as a Series.
Multiple Columns df[["col1", "col2"]] Returns a new DataFrame.
Single Row (label) df.loc[label] Label-based row selection.
Single Row (position) df.iloc[position] Position-based row selection.
Specific Rows and Columns df.loc[[rows], [columns]] Label-based row/column selection.
Filtering Rows df[condition] Filter rows based on conditions.

Matplotlib

Agenda

Introduction to Matplotlib

Importing Matplotlib

# Import Matplotlib's pyplot module
import matplotlib.pyplot as plt

# Check the Matplotlib version
print(plt.__version__)

Line Plot

Creating a Simple Line Plot
# Data for the plot
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Create a line plot
plt.plot(x, y)
plt.title("Simple Line Plot")  # Add a title
plt.xlabel("X-axis")           # Add an X-axis label
plt.ylabel("Y-axis")           # Add a Y-axis label
plt.show()                     # Display the plot

Line Plot one

Customizing the Line Plot
# Customize line style, color, and markers
plt.plot(x, y, color="red", linestyle="--", marker="o", label="Data")
plt.title("Customized Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()  # Add a legend
plt.grid(True)  # Add grid lines
plt.show()

Line Plot two

Bar Chart

Creating a Bar Chart
# Data for the bar chart
categories = ["A", "B", "C", "D"]
values = [5, 7, 8, 6]

# Create a bar chart
plt.bar(categories, values, color="blue")
plt.title("Bar Chart")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()

Bar Plot one

Horizontal Bar Chart
# Create a horizontal bar chart
plt.barh(categories, values, color="green")
plt.title("Horizontal Bar Chart")
plt.xlabel("Values")
plt.ylabel("Categories")
plt.show()

Bar Plot two

Scatter Plot

Creating a Scatter Plot
# Data for the scatter plot
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

# Create a scatter plot
plt.scatter(x, y, color="purple", marker="x")
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

Scatter Plot two

Histogram

Creating a Histogram
# Data for the histogram
data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]

# Create a histogram
plt.hist(data, bins=5, color="orange", edgecolor="black")
plt.title("Histogram")
plt.xlabel("Data")
plt.ylabel("Frequency")
plt.show()

Histogram Plot two

Pie Chart

Creating a Pie Chart
# Data for the pie chart
sizes = [30, 20, 35, 15]
labels = ["Apples", "Bananas", "Cherries", "Dates"]

# Create a pie chart
plt.pie(sizes, labels=labels, autopct="%1.1f%%", startangle=90)
plt.title("Pie Chart")
plt.show()

Pie Plot two

Subplots

Creating Simple Subplots
import matplotlib.pyplot as plt
import numpy as np

# Data for plots
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create subplot
plt.subplot(2, 1, 1)  # 2 rows, 1 column, 1st subplot
plt.plot(x, y1, label="sin(x)", color="blue")
plt.title("Sine Wave")
plt.legend()

plt.subplot(2, 1, 2)  # 2 rows, 1 column, 2nd subplot
plt.plot(x, y2, label="cos(x)", color="red")
plt.title("Cosine Wave")
plt.legend()

plt.tight_layout()  # Adjust layout to avoid overlapping
plt.show()

Subplots Plot two

Using plt.subplots() for More Control
# Create a figure with multiple subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Sine wave
axes[0, 0].plot(x, y1, color="blue")
axes[0, 0].set_title("Sine Wave")

# Cosine wave
axes[0, 1].plot(x, y2, color="red")
axes[0, 1].set_title("Cosine Wave")

# Exponential
y3 = np.exp(x / 10)
axes[1, 0].plot(x, y3, color="green")
axes[1, 0].set_title("Exponential Function")

# Logarithmic
y4 = np.log(x + 1)
axes[1, 1].plot(x, y4, color="purple")
axes[1, 1].set_title("Logarithmic Function")

# Add overall titles and adjust layout
fig.suptitle("Multiple Plots in One Figure", fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.95])  # Adjust space for the title
plt.show()

Subplots Plot two

Generate and plot synthetic GDP data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic GDP data
np.random.seed(42)  # For reproducibility
years = np.arange(2000, 2023)  # 2000 to 2022
gdp = np.cumsum(np.random.uniform(0.5, 5, len(years))) + 100  # Cumulative GDP growth starting at 100

# Create a DataFrame
gdp_data = pd.DataFrame({"Year": years, "GDP": gdp})

# Plot the data
plt.figure(figsize=(10, 6))
plt.plot(gdp_data["Year"], gdp_data["GDP"], marker="o", linestyle="-", color="blue", label="GDP Growth")

# Adding plot details
plt.title("Annual GDP Growth (2000-2022)", fontsize=16)
plt.xlabel("Year", fontsize=12)
plt.ylabel("GDP (in Billion $)", fontsize=12)
plt.grid(True, linestyle="--", alpha=0.6)
plt.legend(fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()

# Show the plot
plt.show()

Display the GDP Plot

GDP Plot

3D Plots (Optional)

3D Line Plot
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

# Data for a 3D line
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

# Create a 3D plot
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")

ax.plot(x, y, z, label="3D Spiral", color="blue")
ax.set_title("3D Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis")
ax.legend()

plt.show()

3D Line Plot

3D Surface Plot
# Create data for a surface plot
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))

# Create a 3D surface plot
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")

surf = ax.plot_surface(x, y, z, cmap="viridis")
fig.colorbar(surf)  # Add a color bar for reference
ax.set_title("3D Surface Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis")

plt.show()

3D Surface Plot

3D Scatter Plot
# Create random 3D data
np.random.seed(42)
x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)

# Create a 3D scatter plot
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")

ax.scatter(x, y, z, color="red", marker="o", label="Data Points")
ax.set_title("3D Scatter Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis")
ax.legend()

plt.show()

3D Surface Plot

Saving a Plot

Saving a Plot as an Image
# Create a plot
plt.plot(x, y)
plt.title("Line Plot")

# Save the plot as a PNG file
plt.savefig("line_plot.png")

# Save the plot as a high-quality image
plt.savefig("line_plot_high_res.png", dpi=300)

# Display the plot
plt.show()

Key Points:


Exercise

Numpy

1) Create a 1D array of integers from 5 to 15.
2) Create a 2D array of shape (3, 3) filled with ones.
3) Multiply all elements in the array [2, 4, 6, 8] by 3.
4) Add the arrays [1, 2, 3] and [4, 5, 6].
5) Access the last row of the array [[1, 2, 3], [4, 5, 6], [7, 8, 9]].
6) Slice the array np.arange(10) to get only even numbers.
7) Reshape the array np.arange(12) into a (3, 4) array.
8) Flatten the reshaped array back into 1D.

Pandas

1) Create a Pandas Series with the following data: [100, 200, 300]. Assign the indices as [“X”, “Y”, “Z”].
2) Create a DataFrame with columns Product, Price, and Stock using the following data:

Products: [“Laptop”, “Tablet”, “Smartphone”] Prices: [1000, 500, 800] Stock: [50, 100, 200] Filtering and Sorting

3) Filter rows in the DataFrame where Price > 600.
4) Sort the DataFrame by Stock in ascending order. 5) Create a DataFrame with some missing values and write code to:

Matplotlib

1) Create a line plot for x = [1, 2, 3, 4, 5] and y = [10, 20, 25, 30, 35]. Customize the plot by changing the line color, adding markers, and a legend.
2) Create a bar chart for categories [“Math”, “Science”, “History”, “English”] and values [85, 90, 75, 80]. Add a title, and X and Y axis labels.
3) Plot x = [10, 20, 30, 40, 50] and y = [5, 15, 25, 35, 45] as a scatter plot. Change the marker style and color.
4) Create a histogram for the dataset [5, 7, 8, 9, 5, 3, 4, 5, 7, 8, 9, 10] with 4 bins.
5) Create a pie chart for sizes = [40, 30, 20, 10] with labels [“A”, “B”, “C”, “D”]
6) Import your annual GDP data into a Pandas DataFrame and visualize the data using Matplotlib.

Key Points

  • NumPy provides fast, efficient, and flexible array operations.

  • Pandas is a powerful library for data manipulation and analysis.

  • Matplotlib is a python 2D and 3D plotting library which produces scientific figures and publication quality figures.

Copyright © 2024 UNECA-ACS

Contact