import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler

# KMeans
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hierarchical clustering
from scipy.cluster.hierarchy import linkage, dendrogram, cut_tree

sns.set_style('darkgrid')

# Load the country data
dat = pd.read_csv('Country-data.csv')
dat.head()

Steps for the assignment

  1. Data quality check
  2. EDA: univariate and bivariate analysis
  3. Outlier treatment
  4. Check the cluster tendency: Hopkins statistic
  5. Scaling
  6. Find the best value of k: silhouette score and SSD (elbow method)
  7. Perform k-means clustering using the final value of k
  8. Visualize the clusters with scatter plots
  9. Cluster profiling: GDPP, child_mort, income
  10. Hierarchical clustering
     • Single linkage: dendrogram
     • Complete linkage: dendrogram
     • Use one of them for the final clusters
     • Visualize the clusters with a scatter plot
     • Cluster profiling: GDPP, child_mort, income
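Steps 4 and 6 can be sketched as follows. This is a minimal illustration, not the assignment's prescribed code: `hopkins_statistic` and `score_k_range` are hypothetical helper names, the Hopkins sample size (10% of rows) and the candidate range of k (2 to 8) are assumptions, and `X` is assumed to be a numeric feature matrix with one row per country. Under the convention used here, H near 0.5 suggests no cluster structure and H near 1 suggests a strong clustering tendency; conventions differ between sources, so check which one a given implementation uses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def hopkins_statistic(X, sample_frac=0.1, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)).

    u: distances from m uniform random points to their nearest real point.
    w: distances from m sampled real points to their nearest other real point.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = max(1, int(sample_frac * n))

    # m synthetic points drawn uniformly inside the bounding box of X
    U = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # m real points sampled without replacement
    idx = rng.choice(n, size=m, replace=False)
    D = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    D[np.arange(m), idx] = np.inf  # ignore each point's zero distance to itself
    w = D.min(axis=1)

    return u.sum() / (u.sum() + w.sum())


def score_k_range(X, k_values=range(2, 9), seed=42):
    """Return {k: (ssd, silhouette)} for each candidate number of clusters."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_ = sum of squared distances to the closest centroid (SSD)
        results[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return results
```

The distances are computed by brute force, which is fine for a dataset with a few hundred countries but not for very large data. For the elbow method, plot the SSD values against k and look for the bend; for the silhouette method, prefer the k with the highest score.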

Finally, using the results of both k-means and hierarchical clustering, identify the countries in direst need of aid.
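One way to combine the two methods is sketched below, assuming the DataFrame has the columns `country`, `child_mort`, `income` and `gdpp`, and that k has already been chosen. The helper name `neediest_countries` and the rule for picking the "neediest" cluster (highest mean `child_mort`) are illustrative assumptions, outlier treatment is omitted for brevity, and clustering on only these three columns is a simplification.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, cut_tree


def neediest_countries(df, k, features=("child_mort", "income", "gdpp"),
                       id_col="country", seed=42):
    """Countries placed in the neediest cluster by BOTH k-means and
    complete-linkage hierarchical clustering.

    'Neediest' is assumed to mean the cluster with the highest mean
    child_mort; features must include child_mort, income and gdpp.
    """
    X = StandardScaler().fit_transform(df[list(features)])

    out = df[[id_col, *features]].copy()
    out["km"] = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    Z = linkage(X, method="complete")
    # cut_tree returns an (n_samples, 1) array for a single n_clusters value
    out["hc"] = cut_tree(Z, n_clusters=k).reshape(-1)

    # Profile each labelling and pick the cluster with the worst child mortality
    km_needy = out.groupby("km")["child_mort"].mean().idxmax()
    hc_needy = out.groupby("hc")["child_mort"].mean().idxmax()

    both = out[(out["km"] == km_needy) & (out["hc"] == hc_needy)]
    # Within the overlap, list the poorest first (lowest gdpp, then income)
    return both.sort_values(["gdpp", "income"])[id_col].tolist()
```

Usage, for example if k = 3 was chosen in step 6: `neediest_countries(dat, k=3)`. The same linkage matrix `Z` can be passed to `dendrogram(Z)` to inspect the tree before choosing where to cut it.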

1. Data quality check
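A minimal sketch of this check is shown below. `data_quality_report` is a hypothetical helper, not part of pandas; it reports each column's dtype, missing-value count and percentage, and number of distinct values, and prints the number of fully duplicated rows. Deciding what counts as a problem is left to judgement.

```python
import pandas as pd


def data_quality_report(df):
    """Summarise dtype, missing values and cardinality for each column."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),  # NaN is not counted as a value
    })
    # Fully duplicated rows usually signal a data-entry or merge problem
    print(f"shape={df.shape}, duplicate rows={df.duplicated().sum()}")
    return report
```

Usage: `data_quality_report(dat)`, followed by `dat.describe()` to check that value ranges are plausible.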