
Heart Disease Prediction using KNN, Logistic Regression, and Decision Tree

Part A & B

In [1]:
import pandas as pd
In [2]:
df = pd.read_csv("https://github.com/mpourhoma/CS4661/raw/master/Heart_s.csv")
In [3]:
df.head()
Out[3]:

Part C

Feature Matrix
In [4]:
feature_columns = ['Age', 'RestBP', 'Chol', 'RestECG', 'MaxHR', 'Oldpeak']
In [5]:
cols = df[feature_columns]   # feature matrix
ahd = df['AHD']              # target labels (presence of heart disease)

Part D

In [6]:
from sklearn.model_selection import train_test_split as t_t_s
In [7]:
x_train, x_test, y_train, y_test = t_t_s(cols, ahd, test_size=0.25, random_state=6)
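As a quick sanity check (not part of the original notebook), the shapes of the four pieces can be printed to confirm the 75/25 split:

# Sketch: confirm the split sizes
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)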

Part E

In [8]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
KNN Classifier
In [9]:
k = 3
KNN = KNeighborsClassifier(n_neighbors=k)   # k-nearest-neighbors with k = 3
KNN.fit(x_train, y_train)
y_predict = KNN.predict(x_test)
accuracy = accuracy_score(y_test, y_predict)

print(accuracy)
0.6447368421052632
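k = 3 is an arbitrary choice here. A small scan over candidate values of k (a sketch, not part of the original run, reusing the split from Part D) shows how sensitive the result is to the neighborhood size:

# Sketch: test accuracy for several values of k on the same split
for k_val in [1, 3, 5, 7, 9, 11]:
    knn_k = KNeighborsClassifier(n_neighbors=k_val)
    knn_k.fit(x_train, y_train)
    print(k_val, accuracy_score(y_test, knn_k.predict(x_test)))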
Decision Tree Classifier
In [10]:
d_tree = DecisionTreeClassifier(random_state=5)
d_tree.fit(x_train, y_train)
y_predict = d_tree.predict(x_test)
accuracy = accuracy_score(y_test, y_predict)

print(accuracy)
0.618421052631579
Logistic Regression
In [11]:
# x_train,x_test, y_train, y_test = t_t_s(cols, ahd, test_size = 0.25, random_state = 6)
logistic_reg = LogisticRegression(solver='lbfgs')   # note: lbfgs may need max_iter > 100 to converge on unscaled features
logistic_reg.fit(x_train, y_train)
y_predict = logistic_reg.predict(x_test)
accuracy = accuracy_score(y_test, y_predict)

print(accuracy)
0.6710526315789473
Logistic Regression performs best here, although its accuracy is only slightly higher than KNN's; the Decision Tree has the lowest accuracy of the three.
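The same comparison can be generated in one loop over the fitted models (a sketch using the objects defined above):

# Sketch: test accuracy of each fitted model on the same split
for name, model in [('KNN', KNN), ('Decision Tree', d_tree),
                    ('Logistic Regression', logistic_reg)]:
    print(name, accuracy_score(y_test, model.predict(x_test)))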

Part F

In [12]:
# One-hot encode the categorical columns into a new DataFrame

new_df = pd.get_dummies(df, columns=["Gender", "ChestPain", "Thal"])
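To verify what get_dummies produced, the newly created dummy columns can be listed (a quick check, not in the original notebook); the names should match those used in Part G below:

# Sketch: columns added by one-hot encoding
print([c for c in new_df.columns if c not in df.columns])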

Part G

In [13]:
# Keep the original numeric features plus the new dummy columns
features_Kept = ['Age','RestBP','Chol','RestECG','MaxHR','Oldpeak',
                 'Gender_f','Gender_m',
                 'ChestPain_asymptomatic','ChestPain_nonanginal','ChestPain_nontypical','ChestPain_typical',
                 'Thal_fixed','Thal_normal','Thal_reversable']
nX = new_df[features_Kept]
ny = new_df['AHD']
nX_train, nX_test, ny_train, ny_test = t_t_s(nX, ny, test_size=0.25, random_state=6)
KNN Classifier
In [14]:
# k = 3
# KNN = KNeighborsClassifier(n_neighbors=k)   # reusing the classifier from Part E
KNN.fit(nX_train, ny_train)
ny_predict = KNN.predict(nX_test)
accuracy = accuracy_score(ny_test, ny_predict)

print(accuracy)
0.6447368421052632
Decision Tree Classifier
In [15]:
# d_tree = DecisionTreeClassifier(random_state=5)   # reusing the tree from Part E
d_tree.fit(nX_train, ny_train)
ny_predict = d_tree.predict(nX_test)
accuracy = accuracy_score(ny_test, ny_predict)

print(accuracy)
0.7368421052631579
Logistic Regression
In [16]:
# logistic_reg = LogisticRegression(solver='lbfgs')   # reusing the model from Part E
logistic_reg.fit(nX_train, ny_train)
ny_predict = logistic_reg.predict(nX_test)
accuracy = accuracy_score(ny_test, ny_predict)

print(accuracy)
0.7763157894736842
/Users/Mayank/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.
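The warning means lbfgs hit its default iteration limit (100) on these unscaled features. A common remedy, sketched below rather than taken from the original notebook, is to standardize the features and/or raise max_iter:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Sketch: scale features before fitting; max_iter raised as a safety margin
scaled_logreg = make_pipeline(StandardScaler(),
                              LogisticRegression(solver='lbfgs', max_iter=1000))
scaled_logreg.fit(nX_train, ny_train)
print(accuracy_score(ny_test, scaled_logreg.predict(nX_test)))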

Part H

In [17]:
from sklearn.model_selection import cross_val_score
Applying 10-fold Cross Validation for logistic regression classifier
In [18]:
accuracy = cross_val_score(logistic_reg, nX, ny, cv=10, scoring='accuracy')

print(accuracy)
print("------------------------------------------------------------------------------------------------------------")
print("We use the mean accuracy across the 10 folds")
print("------------------------------------------------------------------------------------------------------------")
accuracy_cv = accuracy.mean()
log_reg_accuracy = accuracy_cv
print("Mean accuracy: " + str(accuracy_cv))
/Users/Mayank/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.
(the same warning is emitted for each of the 10 folds)
[0.77419355 0.77419355 0.80645161 0.93333333 0.93333333 0.63333333 0.8 0.83333333 0.82758621 0.82758621]
------------------------------------------------------------------------------------------------------------
We use the mean accuracy across the 10 folds
------------------------------------------------------------------------------------------------------------
Mean accuracy: 0.8143344456803856
Applying 10-fold Cross Validation for KNN classifier
In [19]:
accuracy = cross_val_score(KNN, nX, ny, cv=10)

print(accuracy)
print("------------------------------------------------------------------------------------------------------------")
print("We use the mean accuracy for Cross Validating")
print("------------------------------------------------------------------------------------------------------------")
accuracy_cv = accuracy.mean()
print("Mean accuracy: " + str(accuracy_cv))
[0.70967742 0.64516129 0.51612903 0.66666667 0.63333333 0.46666667 0.66666667 0.66666667 0.55172414 0.72413793]
------------------------------------------------------------------------------------------------------------
We use the mean accuracy across the 10 folds
------------------------------------------------------------------------------------------------------------
Mean accuracy: 0.6246829810901001
Applying 10-fold Cross Validation for decision tree classifier
In [20]:
accuracy = cross_val_score(d_tree, nX, ny, cv=10)
print(accuracy)
print("------------------------------------------------------------------------------------------------------------")
print("We use the mean accuracy for Cross Validating")
print("------------------------------------------------------------------------------------------------------------")
accuracy_cv = accuracy.mean()
print("Mean accuracy: " + str(accuracy_cv))
[0.77419355 0.77419355 0.77419355 0.76666667 0.8 0.63333333 0.63333333 0.63333333 0.68965517 0.82758621]
------------------------------------------------------------------------------------------------------------
We use the mean accuracy across the 10 folds
------------------------------------------------------------------------------------------------------------
Mean accuracy: 0.7306488691138302
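The three cross-validated means can also be collected side by side (a sketch reusing the estimators defined above):

# Sketch: 10-fold CV mean accuracy for each model on the full encoded data
for name, model in [('Logistic Regression', logistic_reg),
                    ('KNN', KNN), ('Decision Tree', d_tree)]:
    scores = cross_val_score(model, nX, ny, cv=10, scoring='accuracy')
    print(name, scores.mean())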

The best accuracy is obtained by Logistic Regression using 10-fold cross-validation:

In [21]:
print(log_reg_accuracy)
0.8143344456803856