Machine Learning Algorithms and Basic Implementation

Overview

Teaching: 130 min
Exercises: 130 min
Questions
  • What are the fundamental concepts in ML?

  • Which frameworks and libraries are used in ML?

  • How do I train and test ML models?

Objectives
  • Gain an understanding of fundamental machine learning concepts.

  • Learn and apply best practices for training, evaluating, and interpreting machine learning models.

WHAT DO THE WORDS MEAN?

[Figure: the nested relationship between Artificial Intelligence, Machine Learning, Deep Learning, and Generative AI]

Artificial Intelligence: The ability of computing systems to achieve human-like performance on complex tasks.

Machine Learning: A subset of AI in which systems learn patterns from data rather than following explicitly programmed rules.

Deep Learning: A subset of machine learning based on multi-layered neural networks.

Generative AI: Deep learning models that generate new content such as text, images, or audio.

Machine Learning Algorithms

Machine learning is a subset of artificial intelligence (AI) that focuses on developing computer systems capable of learning and improving from data without being explicitly programmed.


Machine learning is widely used in various fields and applications, such as:

  • Face Recognition
  • Object Detection
  • Chatbots
  • Recommendation Systems
  • Autonomous Vehicles
  • Disease Diagnosis
  • Fraud Detection

It is nowadays used in almost every sector, including economics, statistics, healthcare, agriculture, education, business, construction, and astronomy.

Types of Machine Learning

[Figure: the main types of machine learning]

Machine learning approaches are commonly grouped into supervised learning (learning from labelled examples, as in the regression tasks below), unsupervised learning (finding structure in unlabelled data), and reinforcement learning (learning from interaction and feedback).

Linear Models

Linear Regression

Assumptions: A linear relationship between the features and the target; errors that are independent, roughly normally distributed, and of constant variance; limited multicollinearity among features.

How It Works: Finds the coefficients that minimize the sum of squared residuals between predictions and observed values (ordinary least squares).

Pros: Simple, fast to train, and easy to interpret.

Cons: Underfits nonlinear relationships and is sensitive to outliers and strongly correlated features.

Install Machine Learning Frameworks and Libraries

pip install -U scikit-learn
pip install xgboost
pip install lightgbm
pip install ipywidgets

Importing Libraries


# Import the NumPy library for numerical operations
import numpy as np

# Import the Pandas library for data manipulation and analysis
import pandas as pd

# Import Matplotlib for plotting and visualization
import matplotlib.pyplot as plt

# Import train_test_split from scikit-learn for splitting data into training and testing sets
from sklearn.model_selection import train_test_split

# Import various regression models from scikit-learn
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet  # Linear models
from sklearn.tree import DecisionTreeRegressor  # Decision tree model
from sklearn.neighbors import KNeighborsRegressor  # K-nearest neighbors model
from sklearn.svm import SVR  # Support vector regression model

# Import ensemble models from scikit-learn
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor

# Import XGBoost library for gradient boosting
import xgboost as xgb

# Import LightGBM library for gradient boosting
import lightgbm as lgb

# Import r2_score from scikit-learn for evaluating model performance
from sklearn.metrics import r2_score

# Import StandardScaler from scikit-learn for feature scaling
from sklearn.preprocessing import StandardScaler

# Interactive Widgets
import ipywidgets as widgets

# display() for rendering widgets (available by default in Jupyter notebooks)
from IPython.display import display

Download data

Download the gdp_data.csv file and place it in your working directory.

Load the data from the csv file

# Load the data from the csv file
data = pd.read_csv('gdp_data.csv', index_col='Year')
data.head()

Plot the timeseries of the data

# Create a function to plot the time series
def plot_time_series(column):
    plt.figure(figsize=(10, 6))
    data[column].plot()
    plt.title(f'{column}')
    plt.xlabel('Year')
    plt.ylabel(column)
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Create a dropdown widget for selecting the column
column_selector = widgets.Dropdown(
    options=data.columns,
    description='Column:',
    disabled=False,
)

# Link the dropdown widget to the plot_time_series function
interactive_plot = widgets.interactive_output(plot_time_series, {'column': column_selector})

# Display the widget and the interactive plot
display(column_selector, interactive_plot)

Split data

# Features: every column except the target "gdp"
X = data.drop("gdp", axis=1)
y = data["gdp"]

# shuffle=False preserves chronological order, so the models train on earlier
# years and are tested on later ones (appropriate for time-series data)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

Define and Train the Linear Regression Model

lr = LinearRegression()
lr.fit(X_train, y_train)

Make predictions

predictions_lr = lr.predict(X_test)
predictions_lr

Evaluate the model

r2_lr = r2_score(y_test, predictions_lr)
print(f"  R^2: {r2_lr:.4f}")
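R² on a small test set can be flattering or harsh on its own. A minimal sketch, reusing the linear regression predictions above, that also reports MAE and RMSE (both in the units of GDP):

from sklearn.metrics import mean_absolute_error, mean_squared_error

# Mean absolute error and root mean squared error complement R^2
mae_lr = mean_absolute_error(y_test, predictions_lr)
rmse_lr = np.sqrt(mean_squared_error(y_test, predictions_lr))
print(f"  MAE:  {mae_lr:.4f}")
print(f"  RMSE: {rmse_lr:.4f}")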

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_lr, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Linear Regression')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
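A fitted linear model is easy to interpret: each coefficient is the change in predicted GDP per unit change in that feature, holding the others fixed. A quick sketch, assuming the feature columns of X:

# Pair each coefficient with its feature name for readability
coefs = pd.Series(lr.coef_, index=X.columns).sort_values()
print(coefs)
print(f"Intercept: {lr.intercept_:.4f}")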

Ridge Regression

Assumptions: Same as linear regression, but better suited to correlated (multicollinear) features.

How It Works: Adds an L2 penalty (alpha times the sum of squared coefficients) to the least-squares objective, shrinking coefficients toward zero without eliminating them.

Pros: Reduces overfitting and stabilizes estimates when features are correlated.

Cons: Does not perform feature selection, and alpha must be tuned.

Define and train the Ridge Regression Model

ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

Make predictions

predictions_ridge = ridge.predict(X_test)
predictions_ridge

Evaluate the model

r2_ridge = r2_score(y_test, predictions_ridge)
print(f"  R^2: {r2_ridge:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_ridge, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Ridge Regression')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
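The alpha parameter controls how strongly the L2 penalty shrinks the coefficients. A rough sketch (the alpha values are chosen arbitrarily) showing the overall coefficient magnitude falling as alpha grows:

# Larger alpha -> stronger shrinkage -> smaller coefficient norm
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>7}: ||coef|| = {np.linalg.norm(model.coef_):.4f}")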

Lasso Regression

Assumptions: Same as linear regression, plus the expectation that only a subset of features is truly relevant (sparsity).

How It Works: Adds an L1 penalty (alpha times the sum of absolute coefficient values), which can drive some coefficients exactly to zero.

Pros: Performs automatic feature selection and yields sparse, interpretable models.

Cons: Can behave erratically with strongly correlated features (it tends to keep one and drop the rest), and alpha must be tuned.

Define and train Lasso Regression Model

lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)

Make predictions

predictions_lasso = lasso.predict(X_test)
predictions_lasso

Evaluate the model

r2_lasso = r2_score(y_test, predictions_lasso)
print(f"  R^2: {r2_lasso:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_lasso, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Lasso Regression')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
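Unlike Ridge, Lasso can set coefficients exactly to zero. A short sketch to see which features the fitted model kept and which it dropped:

# Coefficients that are exactly zero correspond to dropped features
lasso_coefs = pd.Series(lasso.coef_, index=X.columns)
print("Dropped features:", list(lasso_coefs[lasso_coefs == 0].index))
print("Kept features:")
print(lasso_coefs[lasso_coefs != 0])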

Elastic Net Regression

Assumptions: Same as linear regression, with some irrelevant features and possibly correlated groups of relevant ones.

How It Works: Combines the L1 and L2 penalties of Lasso and Ridge; l1_ratio controls the mix between the two.

Pros: Balances feature selection with stability on correlated features.

Cons: Two hyperparameters (alpha and l1_ratio) to tune instead of one.

Define and train ElasticNet Regression Model

elastic = ElasticNet(alpha=1.0, l1_ratio=0.5)
elastic.fit(X_train, y_train)

Make predictions

predictions_elastic = elastic.predict(X_test)
predictions_elastic

Evaluate the model

r2_elastic = r2_score(y_test, predictions_elastic)
print(f"  R^2: {r2_elastic:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_elastic, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Elastic Net Regression')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
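l1_ratio interpolates between Ridge-like behaviour (near 0) and Lasso-like behaviour (near 1). A sketch, with arbitrarily chosen ratios, showing how the mix affects sparsity:

# Higher l1_ratio -> more L1 penalty -> more coefficients driven to zero
for ratio in [0.1, 0.5, 0.9]:
    model = ElasticNet(alpha=1.0, l1_ratio=ratio).fit(X_train, y_train)
    n_zero = int((model.coef_ == 0).sum())
    print(f"l1_ratio={ratio}: {n_zero} of {len(model.coef_)} coefficients are zero")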

Tree-Based Models

Decision Tree Regressor

Assumptions: Very few; no linearity or feature-scaling assumptions.

How It Works: Recursively splits the feature space on the thresholds that most reduce prediction error, then predicts the mean of the training targets in each leaf.

Pros: Easy to visualize and interpret, captures nonlinear relationships, and needs no feature scaling.

Cons: Prone to overfitting, and cannot extrapolate beyond the range of the training targets, which matters for a trending series like GDP.

Define and Train Decision Tree Regressor

dt = DecisionTreeRegressor(max_depth=5, random_state=42)
dt.fit(X_train, y_train)

Make predictions

predictions_dt = dt.predict(X_test)
predictions_dt

Evaluate the model

r2_dt = r2_score(y_test, predictions_dt)
print(f"  R^2: {r2_dt:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_dt, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Decision Tree Regressor')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
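One advantage of trees is that they are inspectable. A sketch that lists the fitted tree's feature importances and draws its top splits (plot_tree comes from sklearn.tree):

from sklearn.tree import plot_tree

# Which features the tree actually used for its splits
importances = pd.Series(dt.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))

# Draw only the top two levels to keep the figure readable
plt.figure(figsize=(14, 6))
plot_tree(dt, feature_names=list(X.columns), filled=True, max_depth=2)
plt.show()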

Instance-Based Learning

K-Nearest Neighbors (KNN) Regressor

Assumptions: Nearby points have similar targets, and features are on comparable scales (hence the standardization below).

How It Works: For each query point, finds the k closest training points and predicts the average of their targets.

Pros: Simple, with no real training phase, and it adapts to local structure in the data.

Cons: Sensitive to feature scaling and irrelevant features, slow at prediction time on large datasets, and unable to extrapolate.

Define and Train K-Nearest Neighbors (KNN) Regressor

# Step 1: Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Step 2: Train the KNN model
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

Make predictions

# Step 3: Standardize the test data
X_test_scaled = scaler.transform(X_test)

# Step 4: Make predictions
predictions_knn = knn.predict(X_test_scaled)
predictions_knn

Evaluate the model

r2_knn = r2_score(y_test, predictions_knn)
print(f"  R^2: {r2_knn:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_knn, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with K-Nearest Neighbors (KNN) Regressor')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
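The number of neighbors k is the main hyperparameter. A rough sketch (the useful range depends entirely on the dataset) scanning a few values on the scaled data:

# Small k follows the data closely; large k smooths more aggressively
for k in range(1, 11):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train_scaled, y_train)
    print(f"k={k:2d}: R^2 = {r2_score(y_test, model.predict(X_test_scaled)):.4f}")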

Support Vector Regressor (SVR)

Assumptions: Standardized features, and a kernel appropriate to the shape of the relationship.

How It Works: Fits a function that keeps as many points as possible within an epsilon-wide tube, penalizing points that fall outside it (controlled by C); kernels such as 'rbf' allow nonlinear fits.

Pros: Robust to small errors inside the tube and effective for nonlinear relationships.

Cons: Sensitive to hyperparameters (C, epsilon, kernel), scales poorly to large datasets, and is hard to interpret.

Define and Train Support Vector Regressor (SVR)

# Step 1: Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Step 2: Train the SVR model
svr = SVR(C=1.0, epsilon=0.1, kernel='rbf')
svr.fit(X_train_scaled, y_train)

Make predictions

# Step 3: Standardize the test data
X_test_scaled = scaler.transform(X_test)

# Step 4: Make predictions
predictions_svr = svr.predict(X_test_scaled)
predictions_svr

Evaluate the model

r2_svr = r2_score(y_test, predictions_svr)
print(f"  R^2: {r2_svr:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_svr, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Support Vector Regressor (SVR)')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
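The kernel determines the shape of the fitted function. A sketch comparing a few standard kernels, leaving the other settings at their defaults:

# 'linear' fits a hyperplane; 'rbf' and 'poly' allow nonlinear fits
for kernel in ['linear', 'rbf', 'poly']:
    model = SVR(kernel=kernel).fit(X_train_scaled, y_train)
    print(f"kernel={kernel:>6}: R^2 = {r2_score(y_test, model.predict(X_test_scaled)):.4f}")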

Boosting Methods

Gradient Boosting Machines (GBM)

Assumptions: Few; inherits the flexibility of decision trees.

How It Works: Builds shallow trees sequentially, with each new tree fitted to the residual errors of the ensemble so far, scaled by the learning rate.

Pros: Often highly accurate on tabular data and handles nonlinear interactions well.

Cons: Sequential training is slower, the hyperparameters interact and need tuning, and it can overfit.

Define and Train Gradient Boosting Machines

gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbr.fit(X_train, y_train)

Make predictions

# Make predictions
predictions_gbr = gbr.predict(X_test)
predictions_gbr

Evaluate the model

r2_gbr = r2_score(y_test, predictions_gbr)
print(f"  R^2: {r2_gbr:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_gbr, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Gradient Boosting Machines')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
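Because boosting adds trees one at a time, you can watch test performance evolve as the ensemble grows using staged_predict:

# Test R^2 after each additional tree
test_scores = [r2_score(y_test, pred) for pred in gbr.staged_predict(X_test)]
plt.figure(figsize=(8, 4))
plt.plot(range(1, len(test_scores) + 1), test_scores)
plt.xlabel('Number of trees')
plt.ylabel('Test R^2')
plt.title('GBM test performance vs. ensemble size')
plt.grid(True)
plt.tight_layout()
plt.show()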

XGBoost (Extreme Gradient Boosting)

Assumptions: Few; like other tree ensembles.

How It Works: An optimized, regularized implementation of gradient boosting that adds L1/L2 penalties on the trees and uses efficient, parallelized tree construction.

Pros: Fast, regularization reduces overfitting, and it handles missing values natively.

Cons: Many hyperparameters, and the resulting models are hard to interpret.

Define and Train Extreme Gradient Boosting

xgbr = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
xgbr.fit(X_train, y_train)

Make predictions

# Make predictions
predictions_xgbr = xgbr.predict(X_test)  # the model was trained on unscaled features
predictions_xgbr

Evaluate the model

r2_xgbr = r2_score(y_test, predictions_xgbr)
print(f"  R^2: {r2_xgbr:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_xgbr, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Extreme Gradient Boosting')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
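XGBoost's scikit-learn wrapper also exposes feature importances, aggregated across the boosted trees:

# Higher scores indicate features that contributed more to the trees' splits
xgb_importances = pd.Series(xgbr.feature_importances_, index=X.columns)
print(xgb_importances.sort_values(ascending=False))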

LightGBM (Light Gradient Boosting Machine)

Assumptions: Few; like other tree ensembles.

How It Works: Gradient boosting using histogram-based splits and leaf-wise (rather than level-wise) tree growth for speed.

Pros: Very fast and memory-efficient, especially on large datasets.

Cons: Leaf-wise growth can overfit small datasets, and there are many hyperparameters.

Define and Train Light Gradient Boosting Machine

# Use a distinct variable name so we don't shadow the imported lgb module
lgbm = lgb.LGBMRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
lgbm.fit(X_train, y_train)

Make predictions

# Make predictions
predictions_lgb = lgbm.predict(X_test)
predictions_lgb

Evaluate the model

r2_lgb = r2_score(y_test, predictions_lgb)
print(f"  R^2: {r2_lgb:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_lgb, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Light Gradient Boosting Machine')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Bagging

Random Forest Regressor

Assumptions: Few; like individual decision trees.

How It Works: Trains many trees on bootstrap samples of the data, each considering a random subset of features at every split, and averages their predictions (bagging).

Pros: Much more robust to overfitting than a single tree, needs little tuning, and provides feature importances.

Cons: Larger and slower than a single tree, less interpretable, and still unable to extrapolate beyond the training range.

Define and Train Random Forest Regressor

rf = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
rf.fit(X_train, y_train)

Make predictions

# Make predictions
predictions_rf = rf.predict(X_test)
predictions_rf

Evaluate the model

r2_rf = r2_score(y_test, predictions_rf)
print(f"  R^2: {r2_rf:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_rf, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Random Forest Regressor')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
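Random forests also come with a built-in validation estimate: with oob_score=True, each tree is scored on the training rows it never saw (its out-of-bag samples). A brief sketch:

# oob_score_ is the R^2 of out-of-bag predictions on the training data
rf_oob = RandomForestRegressor(n_estimators=100, max_depth=5, oob_score=True, random_state=42)
rf_oob.fit(X_train, y_train)
print(f"OOB R^2: {rf_oob.oob_score_:.4f}")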

Stacking (Stacked Regression)

Assumptions: The base models make usefully different errors; StackingRegressor internally uses cross-validated predictions to train the final estimator.

How It Works: Fits several base models, then trains a final meta-model (here Ridge) on their predictions to learn how best to combine them.

Pros: Can outperform any single base model by combining complementary strengths.

Cons: More complex and slower to train, harder to interpret, and it can overfit if the base models are too similar.

Define and Train Stacked Regression

base_models = [
    ('lr', LinearRegression()),
    ('rf', RandomForestRegressor(n_estimators=50, max_depth=5, random_state=42)),
    ('gbr', GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, max_depth=3, random_state=42))
]

stack = StackingRegressor(estimators=base_models, final_estimator=Ridge())
stack.fit(X_train, y_train)

Make predictions

# Make predictions
predictions_stack = stack.predict(X_test)
predictions_stack

Evaluate the model

r2_stack = r2_score(y_test, predictions_stack)
print(f"  R^2: {r2_stack:.4f}")

Plot the predictions

plt.figure(figsize=(10, 6))
plt.plot(data.index, data.gdp, label='Actual GDP', marker='o', color='blue')
plt.plot(y_test.index, predictions_stack, label='Predicted GDP', linestyle='--', color='red', marker='o')
plt.title('GDP Prediction with Stacked Regression')
plt.xlabel('Year')
plt.ylabel('GDP')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
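A useful sanity check is whether the stack actually beats its ingredients. A sketch fitting each base model on its own and comparing its test R² with the stacked result:

# Fit each base model individually and score it on the test set
for name, model in base_models:
    score = r2_score(y_test, model.fit(X_train, y_train).predict(X_test))
    print(f"{name:>5}: R^2 = {score:.4f}")
print(f"stack: R^2 = {r2_stack:.4f}")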

Machine Learning Model Comparison

r2_scores = pd.DataFrame({
    'Model': ['Linear Regression', 'Ridge Regression', 'Lasso Regression', 'Elastic Net Regression',
              'Decision Tree Regressor', 'K-Nearest Neighbors Regressor', 'Support Vector Regressor',
              'Gradient Boosting Machines', 'XGBoost Regressor', 'Light Gradient Boosting Machine',
              'Random Forest Regressor', 'Stacked Regression'],
    'R^2 Score': [r2_lr, r2_ridge, r2_lasso, r2_elastic, r2_dt, r2_knn, r2_svr, r2_gbr, r2_xgbr, r2_lgb, r2_rf, r2_stack]
})

# Sort so the table (and the bar chart below) run from best to worst
r2_scores = r2_scores.sort_values(by='R^2 Score', ascending=False)
r2_scores

Plot the R^2 scores

plt.figure(figsize=(12, 6))

plt.bar(r2_scores['Model'], r2_scores['R^2 Score'], color='skyblue')
plt.ylabel('R^2 Score')

plt.title('Model Comparison')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

Key Points

  • Machine learning algorithms such as linear regression, k-nearest neighbors, support vector machines, XGBoost, and random forests are essential tools, each with its own assumptions, strengths, and weaknesses.

Copyright © 2024 UNECA-ACS
