Updated 3 years ago
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# KMeans
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
# Hierarchical clustering
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import dendrogram
from scipy.cluster.hierarchy import cut_tree
sns.set_style('darkgrid')
dat = pd.read_csv('Country-data.csv')
dat.head()
Steps for assignment
- Data quality check
- EDA: univariate and bivariate analysis
- Outlier treatment
- Check the cluster tendency: Hopkins test
- Scaling
- Find the best value of k: silhouette score and SSD (elbow)
- Run k-means with the final value of k
- Visualize the clusters with a scatter plot
- Cluster profiling: GDPP, child_mort, income
- Hierarchical clustering
  - Single linkage: dendrogram
  - Complete linkage: dendrogram
  - Use one of them for final clusters
  - Visualize scatterplot
  - Cluster profiling: GDPP, child_mort, income
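
Of the steps above, the Hopkins test is the only one that the imported libraries do not provide directly, so here is a minimal sketch of one common form of the statistic, H = sum(u) / (sum(u) + sum(w)). The function name `hopkins_statistic`, its parameters, and the synthetic demo data are assumptions made for illustration; they are not part of the original notebook. Other write-ups raise the distances to the power of the dimension or flip the ratio, so check which convention you are comparing against.

```python
import numpy as np

def hopkins_statistic(X, sample_frac=0.1, seed=0):
    """Hopkins statistic, H = sum(u) / (sum(u) + sum(w)).

    w: distance from a sampled real point to its nearest other real point.
    u: distance from a uniform random point (drawn inside the bounding box
       of the data) to its nearest real point.
    With this convention, H near 0.5 suggests no cluster structure and
    H near 1 suggests a strong cluster tendency.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = max(1, int(sample_frac * n))      # number of probe points (assumed 10%)
    rng = np.random.default_rng(seed)

    # w: nearest-neighbour distances for m real points sampled without replacement
    idx = rng.choice(n, size=m, replace=False)
    dist_real = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dist_real[np.arange(m), idx] = np.inf  # ignore each point's distance to itself
    w = dist_real.min(axis=1)

    # u: nearest-neighbour distances for m uniform points in the data's bounding box
    uniform_pts = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    dist_unif = np.linalg.norm(uniform_pts[:, None, :] - X[None, :, :], axis=2)
    u = dist_unif.min(axis=1)

    return u.sum() / (u.sum() + w.sum())

# Synthetic demo data (not the country data set): two tight blobs vs. uniform noise
rng = np.random.default_rng(42)
clustered = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(100, 2)),
])
uniform_data = rng.uniform(0, 1, size=(200, 2))

h_clustered = hopkins_statistic(clustered)
h_uniform = hopkins_statistic(uniform_data)
print(f"clustered: {h_clustered:.2f}, uniform: {h_uniform:.2f}")
```

To use it in this assignment, pass only the numeric columns of `dat`. Because the statistic is distance-based, results on unscaled columns are dominated by the features with the largest ranges, so it may be worth computing it on the scaled data as well. The result also depends on the random sample, so averaging over a few seeds gives a steadier estimate.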