
Project Title - Cardiac Patients Medical Report Analysis

Image source: istockphoto.com


Heart attack is one of the most common causes of premature death worldwide. In India alone, more than 3 million people die of heart attack and stroke every year. Many factors affect the health and efficiency of the heart. With proper knowledge and precautions, the resulting challenges and complications can be managed.

In this project, the medical reports of several heart patients are analyzed. The exploratory analysis lets us check some facts, and the processed dataset can also be used in ML models to predict the chances of heart attack in new patients.

The vital part of Data Analysis is understanding the dataset. Asking relevant questions is the primary key to good analytics. So, here we will understand the dataset by going through the columns thoroughly.


Analysis and visualization of the data will be done in four steps:

  1. Interaction: We will take a closer look at the dataset and its components.
  2. Prepare: Cleaning and preparation of the data for visualization.
  3. Analyze/Ask: In this step, relevant questions will be asked.
  4. Visualization: We will find the answers to the questions from the Analyze/Ask step.

  • The dataset is available at Kaggle.
  • This project is part of an excellent course, Data Analysis with Python: Zero to Pandas, offered by Jovian. The course covers beginner to advanced level data analysis using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

Import required libraries

!pip install jovian opendatasets --upgrade --quiet
# Import required libraries

import matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import opendatasets as od
import os
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Let us save and upload our work to Jovian before continuing.

Download the Dataset


The dataset contains data on admitted patients, with the following columns:

  • age : Age in years
  • sex : Gender

1: Male, 0: Female

  • cp : Chest pain type

0: Typical angina, 1: Atypical angina, 2: Non-anginal pain, 3: Asymptomatic

  • trestbps : Resting blood pressure (mmHg) on admission to hospital
  • chol : Serum cholesterol (mg/dl)
  • fbs : Fasting blood sugar > 120 mg/dl

1: True, 0: False

  • restecg : Resting electrocardiographic results

0: Normal, 1: Having ST-T wave abnormality, 2: Probable or definite left ventricular hypertrophy

  • thalach : Maximum heart rate achieved
  • exang : Exercise induced angina

1: True, 0: False

  • oldpeak : ST depression induced by exercise relative to rest
  • slope : Slope of the peak exercise ST segment

1: Upsloping, 2: Flat, 3: Downsloping

  • ca : Number of major vessels 0-3 colored by fluoroscopy
  • thal : Thalassemia

1: Normal, 2: Fixed defect, 3: Reversible defect

  • target : Chances of heart attack

0: Less, 1: More

# url link to the dataset

dataset_url = 'https://www.kaggle.com/johnsmith88/heart-disease-dataset' 

od.download(dataset_url)
print('The dataset has been downloaded and extracted.')
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ashrulochansahoo
Your Kaggle Key: ········
Downloading heart-disease-dataset.zip to ./heart-disease-dataset
100%|██████████| 6.18k/6.18k [00:00<00:00, 4.77MB/s]
The dataset has been downloaded and extracted.
# Directory of the data 
data_dir = './heart-disease-dataset'
# checking the system  directory for the downloaded dataset
os.listdir(data_dir)
['heart.csv']
# Create dataframe

heart_disease = pd.read_csv('./heart-disease-dataset/heart.csv')
print('Dataset imported...')
Dataset imported...

Stage 1: Let's have a look at the dataset:


In this phase, we will interact with the dataset to know the various fields that have been recorded.

# Check number of rows and columns present in the dataset

print(f'The dataset contains {heart_disease.shape[1]} number of columns; \nAnd {heart_disease.shape[0]} Number of rows.')
The dataset contains 14 number of columns; And 1025 Number of rows.
# the dataset

heart_disease.head(10)

Stage 2: Data Preparation and Cleaning:

  • First, we will check for missing values in the dataset and drop them if any are present.
  • Second, we will check for duplicate rows and drop them.
  • Finally, we will check for invalid data with the help of the statistical description.
# check for missing values
heart_disease.isnull()
heart_disease.isnull().sum()
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64
  • isnull().sum() counts the missing values in each column.
  • The dataset looks clean: no missing values are present.
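To make the behaviour of `isnull().sum()` concrete, here is a minimal sketch on a made-up three-row frame. The frame `toy` and its values are hypothetical and are not taken from heart.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration only.
toy = pd.DataFrame({
    'age': [52, np.nan, 61],
    'chol': [212, 203, np.nan],
    'target': [0, 1, 1],
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Total number of missing cells in the whole frame.
total_missing = int(missing_per_column.sum())
print(total_missing)
```

A column with a count of 0, as in the heart disease output above, has no missing cells.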
# Check for duplicate values

heart_disease.duplicated().any()
True
  • The dataset has some duplicate rows.
# Drop the duplicate values

heart_disease.drop_duplicates(inplace=True)
heart_disease.shape
(302, 14)
  • The dataset now looks clean: there were no missing values, and all duplicate rows have been dropped.
  • The number of rows dropped from 1025 to 302.
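It can help to count the duplicates before dropping them. The sketch below does this on a hypothetical four-row frame (`toy` is made up for illustration). On the real data, the same count would be expected to equal 1025 - 302 = 723, given the shapes reported above.

```python
import pandas as pd

# Hypothetical toy frame with one repeated row.
toy = pd.DataFrame({
    'age': [52, 53, 52, 70],
    'chol': [212, 203, 212, 174],
})

# duplicated() flags every repeat of an earlier row, so its sum is the
# number of rows that drop_duplicates() will remove.
n_duplicates = int(toy.duplicated().sum())

deduplicated = toy.drop_duplicates()
print(n_duplicates, len(toy), len(deduplicated))
```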
heart_disease.describe().round(2)
Note
  • Everything looks fine, except the thal column.
  • As per the documentation, the thalassemia column should contain three values: 1 for normal, 2 for fixed defect and 3 for reversible defect.
  • The statistical description table shows a minimum value of 0.
  • This would eventually misguide the analysis and visualization.
  • Let's check the column.
heart_disease.thal.value_counts()
2    165
3    117
1     18
0      2
Name: thal, dtype: int64

These two zeros are effectively missing values, since 0 is not a documented code for thal. We need to get rid of them.
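An alternative way to treat such placeholder codes, sketched here on a hypothetical column (`toy` is made up for illustration), is to convert them to NaN first and then use the regular missing-value tools. This is only one possible approach, not the one used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column using the same coding as thal (valid codes: 1, 2, 3).
toy = pd.DataFrame({'thal': [2, 3, 0, 1, 0, 2]})

# Mark the undocumented code 0 as missing, then drop rows with a missing thal.
toy['thal'] = toy['thal'].replace(0, np.nan)
cleaned = toy.dropna(subset=['thal'])

print(cleaned['thal'].value_counts().sort_index())
```

Note that introducing NaN turns the integer column into a float column, so a cast back with `astype(int)` may be wanted after the rows are dropped.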

# Drop the extra missing values

heart_disease.drop(heart_disease.index[heart_disease.thal==0], inplace=True)
print('Dropped the missing values from thal column')
Dropped the missing values from thal column
# check info for data types and other information

heart_disease.info() 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 300 entries, 0 to 878
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       300 non-null    int64
 1   sex       300 non-null    int64
 2   cp        300 non-null    int64
 3   trestbps  300 non-null    int64
 4   chol      300 non-null    int64
 5   fbs       300 non-null    int64
 6   restecg   300 non-null    int64
 7   thalach   300 non-null    int64
 8   exang     300 non-null    int64
 9   oldpeak   300 non-null    float64
 10  slope     300 non-null    int64
 11  ca        300 non-null    int64
 12  thal      300 non-null    int64
 13  target    300 non-null    int64
dtypes: float64(1), int64(13)
memory usage: 35.2 KB
  • All the columns are of int64 type except the oldpeak column, which is float64.
  • The dataset has 14 columns; however, we don't need all of them.
  • We will keep the columns that are essential for our analysis and drop the rest.
  • We will perform the analysis on the age, sex, chest pain (cp), cholesterol (chol), fasting blood sugar (fbs), thalassemia (thal), and target columns.
# Drop columns that we don't need in this analysis

heart_disease.drop(['trestbps', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca'], axis='columns', inplace=True)
heart_disease.head()
heart_disease.shape
(300, 7)
  • We have cleaned and processed the dataset.
  • Now, we have 300 rows and 7 columns.

Stage 3: Let's interact with the dataset:


  • In this stage we will interact with the processed dataset.
  • We will ask some relevant questions.
# overall statistics

heart_disease.describe().round(2)
# Correlation matrix table

heart_disease.corr().round(2)
# Correlation matrix plot

sns.heatmap(heart_disease.corr().round(2), annot=True, cmap='mako')
plt.title('Fig-1: Correlation matrix of heart disease dataset\n', loc='left')
plt.show();
Notebook Image

Insights from the data


  1. The statistical description shows that the admitted patients are between 29 and 77 years old.
  2. The mean of target is 0.54, which means roughly 160-165 of the 300 patients show a higher risk of heart attack.
  3. There is a negative correlation between age and target; yet, from the statistics, patients in the 54+ age group have shown strong signs of heart disease.
  4. As per the data here, thalassemia is negatively correlated with the chances of heart disease. However, individuals between 47 and 56 years old show fixed defect, which means blood does not flow in some parts of the heart, whereas older individuals show reversible defect, a sign of higher chances of severe anemia. The younger patients are at ease.
  5. Cholesterol shows a positive correlation with age and a minor positive correlation with thalassemia, which means the cholesterol level increases with age, and so do the chances of blockage of arteries.
  6. The original dataset has one float64 column (oldpeak); the rest are of int64 data type.

Related questions


  1. Which age group is most vulnerable, i.e. has the largest number of patients with a higher risk of heart attack?
  2. Are men or women more prone to heart attack?
  3. Which chest pain types pose a severe risk of heart attack?
  4. How is fasting blood sugar related to heart attack?
  5. Which type of thalassemia most often leads to heart attack?
  6. How many patients are at higher risk due to cholesterol?

Now that we have checked the dataset and found some insights and questions, it's time to visualize.

Stage 4: Data Visualization:


In this section, we will use exploratory data analysis techniques to answer the questions that we prepared in Stage 3.

# Check patient data 

mlabel = [heart_disease.target.value_counts()[1], heart_disease.target.value_counts()[0]]
plt.figure(figsize=(8,6))
heart_disease.target.value_counts().plot(kind='pie', labels=mlabel, autopct='%1.1f%%', colors=['#ff7722', '#2E86C1'])
plt.title('Fig-2: Patient\'s condition\n', loc='left')
plt.legend(['Heart attack', 'No-Heart attack'], loc='best')
plt.show()
Notebook Image

The pie chart shows that

  • 54.3% of the patients are at risk of heart attack.
  • Out of 300 people 163 are prone to heart attack.
  • While 137 are safe with no sign of heart attack.
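The percentages on the pie chart can also be obtained without plotting. The sketch below rebuilds the target column from the two counts reported above (163 and 137) and uses `value_counts(normalize=True)`; the name `target` is just a stand-in for `heart_disease.target`.

```python
import pandas as pd

# Rebuild the target column from the counts reported above (163 and 137).
target = pd.Series([1] * 163 + [0] * 137, name='target')

counts = target.value_counts()                       # absolute counts per class
shares = target.value_counts(normalize=True) * 100   # percentages per class

print(counts)
print(shares.round(1))
```

Because target is coded 0/1, `target.mean()` gives the same share for class 1, which is where the 0.54 in the statistical description comes from.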

Q.1: Which age group has shown higher signs of heart attack?


# Age distribution along the dataset

heart_disease.age.hist(bins=20, edgecolor='white')
plt.xlabel('Age')
plt.title('Fig-3: Age distribution along the dataset\n', loc='left')
plt.show();
Notebook Image
  • The histogram shows that patients aged 55-60 are the most frequent in the dataset.
  • Most of the patients are in the 50-70 age group.
  • We just checked the age distribution, but we don't have the actual numbers.
  • To find those numbers, we will bin the age column into decades.
def age_group(row):
    if row.age >= 70:
        return '70s'
    elif row.age >= 60:
        return '60s'
    elif row.age >= 50:
        return '50s'
    elif row.age >= 40:
        return '40s'
    elif row.age >= 30:
        return '30s'
    elif row.age >=20:
        return '20s'

heart_disease['age_group'] = heart_disease.apply(age_group, axis=1)
heart_disease.head()
  • Here, we created a new column 'age_group', which gives a better picture of the patients' age data.
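The same decade binning can be sketched with `pd.cut`, which avoids the row-wise `apply`. The ages below are illustrative values within the 29 to 77 range seen in the data, and the bin edges are an assumption chosen to mirror the `>=` comparisons in `age_group()`.

```python
import pandas as pd

# A few ages within the range seen in the data (29 to 77); the list is illustrative.
ages = pd.Series([29, 35, 40, 49, 50, 59, 60, 69, 70, 77], name='age')

# Left-closed bins [20, 30), [30, 40), ... reproduce the >= comparisons in age_group().
bins = [20, 30, 40, 50, 60, 70, 80]
labels = ['20s', '30s', '40s', '50s', '60s', '70s']
age_groups = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.concat([ages, age_groups.rename('age_group')], axis=1))
```

One difference from `age_group()`: with these edges an age of 80 or more would become NaN rather than '70s', so the last edge would need to be raised if older patients were present.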
# Create a dataframe with the total number of patients of all age groups

df1 = heart_disease.groupby(['age_group']).target.count().to_frame() # to_frame() keeps the series name 'target' as the column name

df1.reset_index(inplace=True) # convert the index column to column

df1.rename(columns={'target':'total_patients'}, inplace=True) # rename the column name

df1
  • First we created a Series by grouping on the age groups, and converted it to a DataFrame with the help of .to_frame().
  • Then we converted the index to a regular column and renamed the count column.
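The group, convert, reset and rename steps can be condensed into a single chain. This is a minimal sketch on a hypothetical six-row frame (`toy` is made up for illustration), not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical toy frame with the two columns this step needs.
toy = pd.DataFrame({
    'age_group': ['50s', '40s', '50s', '60s', '50s', '40s'],
    'target': [1, 1, 0, 0, 1, 0],
})

# size() counts rows per group; reset_index(name=...) turns the result into a
# two-column DataFrame and names the count column in the same call.
totals = toy.groupby('age_group').size().reset_index(name='total_patients')
print(totals)
```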
fig = plt.figure(figsize=(12,8))
df1.total_patients.plot(kind='pie', labels=df1.total_patients)
plt.legend(['20s', '30s', '40s', '50s', '60s', '70s'])
plt.title('Number of patients from each age group')
plt.show()
Notebook Image
  • The above pie chart shows the age groups of the admitted patients.
  • 123 patients were in their 50s, followed by patients in their 60s and 40s.
# Create another dataframe based on the target

df2 = heart_disease.groupby(['age_group', 'target']).target.count()
df2
age_group  target
20s        1          1
30s        0          4
           1         10
40s        0         22
           1         50
50s        0         59
           1         64
60s        0         48
           1         32
70s        0          4
           1          6
Name: target, dtype: int64
df2 = df2.to_frame() # convert the Series to a DataFrame (keeps the name 'target')
df2
# Rename the target column

df2.rename(columns={'target':'targetwise_total'}, inplace=True)
df2
  • From the above series we can say that:
  1. Most of the admitted patients are in their 50s.
  2. Heart issues start in the 30s, and after 40 they become severe.
  3. The df2 data frame shows how many patients in each age group are at risk.
  4. The largest numbers of high-risk patients are in their 40s and 50s.
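Absolute counts favour the larger age groups, so it is worth also looking at the share of high-risk patients within each group. The sketch below uses the counts copied from the df2 output above; the names `counts`, `table` and `high_risk_share` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the df2 output above: (age_group, target) -> number of patients.
counts = pd.Series({
    ('20s', 1): 1,
    ('30s', 0): 4, ('30s', 1): 10,
    ('40s', 0): 22, ('40s', 1): 50,
    ('50s', 0): 59, ('50s', 1): 64,
    ('60s', 0): 48, ('60s', 1): 32,
    ('70s', 0): 4, ('70s', 1): 6,
})
counts.index.names = ['age_group', 'target']

# One row per age group, one column per target value; missing cells become 0.
table = counts.unstack('target', fill_value=0)

# Share of each age group with target == 1, in percent.
high_risk_share = (table[1] / table.sum(axis=1) * 100).round(1)
print(high_risk_share)
```

On these counts the share is highest in the 30s and 40s (setting aside the single patient in the 20s), while the 50s contribute the largest absolute number of high-risk patients.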

Q.2: Which gender is at higher risk of heart attack?

# Divide dataset to higher risk and lower risk dataset

high_risk_patients = heart_disease.loc[(heart_disease.target==1)]
low_risk_patients = heart_disease.loc[heart_disease.target==0]
# Patients data who are at higher risk of heart attack
high_risk_patients.head()
  • We divided the whole dataset into two parts: patients with higher risk and patients with lower risk.
  • For that we used the .loc indexer with a boolean condition.
  • This lets us examine the distribution of the chances of heart attack within each part.
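When a subset selected with `.loc` is going to be modified later, taking an explicit copy keeps it independent of the original frame. A minimal sketch, assuming a hypothetical four-row frame `toy` in place of `heart_disease`:

```python
import pandas as pd

# Hypothetical toy frame standing in for heart_disease.
toy = pd.DataFrame({'sex': [1, 0, 1, 0], 'target': [1, 1, 0, 1]})

# The boolean mask selects the rows; .copy() makes the subset independent of toy,
# so later column assignments do not trigger SettingWithCopyWarning.
high_risk = toy.loc[toy['target'] == 1].copy()
high_risk['sex'] = high_risk['sex'].map({0: 'Female', 1: 'Male'})

print(high_risk)
print(toy)
```

The original frame keeps its numeric coding, which matters here because later cells still group `heart_disease` by the 0/1 values of sex.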
# Histogram of Age distribution in higher risk patients of both genders

# sns.set(rc={'figure.figsize':(10,8)})
g = sns.FacetGrid(high_risk_patients, col='sex', margin_titles=True, height=6)
g.map(sns.histplot, 'age', color='#c04000')
g.add_legend()
g.fig.suptitle('Fig-4:Age distribution in higher risk patients')
plt.show()
Notebook Image

In the above histograms, sex = 0 is Female and sex = 1 is Male.

The above histograms suggest that:

  • Females around the age of 50 are at high risk of heart disease.
  • Among males, heart disease is common around 40 and near 60.
# Replacing gender values with names in the high-risk DataFrame
# (work on an explicit copy so that pandas does not raise SettingWithCopyWarning)

high_risk_patients = high_risk_patients.copy()
high_risk_patients['sex'] = high_risk_patients['sex'].replace([0,1], ['Female', 'Male'])
high_risk_patients.rename(columns={'target':'high_risk'}, inplace=True)
high_risk_patients.head()
# Number of patients (gender-wise) at high risk

d3_1 = high_risk_patients.groupby(['sex']).high_risk.count().to_frame()

d3_1.reset_index(inplace=True)
d3_1
# Group the higher risk patients based on gender

df4 = high_risk_patients.groupby(['sex','age_group']).high_risk.count().to_frame()
df4 = df4.transpose()

print('Table showing the number of male and female patients with high risk of heart attack')
df4
Table showing the number of male and female patients with high risk of heart attack
  • Most of the female patients with higher risk of heart attack were in their 50s, followed by those in their 60s.
  • Females aged 70+ were more at risk than males of the same age group.
# Total number of patients

d3 = heart_disease.groupby(['sex']).sex.count().to_frame()
d3.rename(columns={'sex':'total'}, inplace=True) # change column name
d3.reset_index(inplace=True) # convert the index to a column
d3['sex'] = d3['sex'].replace([0,1], ['Female','Male']) # replace codes with names
d3
  • There were 95 female patients and 205 male patients in total.
# Find percentage

percent_female = d3_1.high_risk.iloc[0]/d3.total.iloc[0] * 100 # female percentage

percent_male = d3_1.high_risk.iloc[1]/d3.total.iloc[1]*100 # male percentage


print(f'{round(percent_female,2)}% of female patients were at high risk of heart attack.')
print(f'Also, {round(percent_male,2)}% of male patients were at high risk.')
74.74% of female patients were at high risk of heart attack.
Also, 44.88% of male patients were at high risk.
# Bar graph of gender and target correlation

sns.countplot(x='sex', hue='target', data=heart_disease)
plt.xticks([1,0], ['Male', 'Female'])
plt.legend(labels=['No-Heart attack', 'Heart attack'])
plt.title('Gender Distribution',loc='left')
plt.show();
Notebook Image

The above DataFrame and bar chart show that:

  • Comparing male and female patients, females show a higher rate of risk of heart attack.
  • The bar plot here shows the counts of male and female patients.
  • There are 95 female and 205 male patients.
  • 74.7% of female patients are at risk of heart attack, i.e. 71 out of 95.
  • 44.9% of male patients show risk of heart attack, i.e. 92 out of 205.
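Because target is coded 0/1, the per-gender percentages can also be read off a grouped mean. The sketch below rebuilds the sex/target pairs from the counts reported above (71 of 95 female and 92 of 205 male patients at high risk); the frame `patients` is a reconstruction for illustration, not the original rows.

```python
import pandas as pd

# Rebuild sex/target pairs from the counts reported above:
# 95 female patients (71 high risk) and 205 male patients (92 high risk).
rows = (
    [('Female', 1)] * 71 + [('Female', 0)] * (95 - 71)
    + [('Male', 1)] * 92 + [('Male', 0)] * (205 - 92)
)
patients = pd.DataFrame(rows, columns=['sex', 'target'])

# The mean of a 0/1 column within each group is that group's high-risk share.
risk_by_sex = (patients.groupby('sex')['target'].mean() * 100).round(2)
print(risk_by_sex)
```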

Q3. Which chest pain types result in heart attack?

# Types of chest pain

cp_type = heart_disease.groupby(['cp']).cp.count()
cp_type

cp
0    142
1     50
2     85
3     23
Name: cp, dtype: int64
# plot Types of chest pain

sns.countplot(x='cp', data=heart_disease) # pass the column by keyword to avoid the seaborn FutureWarning
plt.xticks([0,1,2,3], ['Typical angina', 'Atypical angina', 'Non-anginal', 'Asymptomatic'])
plt.title('Common types of Chest Pain', loc='left')
plt.xlabel('Chest Pain')
plt.show()
Notebook Image
  • There are 4 types of chest pain:
  1. Typical angina: also called angina pectoris, a condition where the heart does not get enough oxygen or blood due to coronary artery blockage.
  2. Atypical angina: chest pain without the typical symptoms of angina; it happens when the heart doesn't get enough oxygenated blood.
  3. Non-anginal: also called non-cardiac chest pain (NCCP); it feels like chest pain but is caused by acidity, i.e. gastroesophageal reflux disease (GERD).
  4. Asymptomatic: also called silent myocardial infarction (SMI); it happens due to stress, high cholesterol, high blood pressure, diabetes, and other physical or mental issues. It is one of the causes of premature death in India and is mostly seen in middle-aged people.
  • Here, 142 patients have typical angina, 85 have non-anginal pain, 50 have atypical angina and the remaining 23 are asymptomatic.
# Divide dataset based on chest pain types

typical_angina_patients = heart_disease.loc[heart_disease.cp==0]
atypical_angina_patients = heart_disease.loc[heart_disease.cp==1]
non_angina_patients = heart_disease.loc[heart_disease.cp==2]
asymptotic_patients = heart_disease.loc[heart_disease.cp==3]
typical_angina_patients.head(3)
typical_cp = typical_angina_patients.groupby(['target']).target.count()
typical_cp
target
0    103
1     39
Name: target, dtype: int64
# percent of typical angina high risk patients

cp0_highrisk_prct = typical_cp[1]/cp_type[0]*100

print(f'Among people with typical angina, only {round(cp0_highrisk_prct)}% are at high risk of heart attack.')
Among people with typical angina, only 27% are at high risk of heart attack.
# Similarly

atypical_cp = atypical_angina_patients.groupby(['target']).target.count()
non_anginal_cp = non_angina_patients.groupby(['target']).target.count()
asymptotic_cp = asymptotic_patients.groupby(['target']).target.count()


# percentage (each count is divided by the total for the same chest pain type)

cp1_risk = atypical_cp[1]/cp_type[1]*100
cp2_risk = non_anginal_cp[1]/cp_type[2]*100
cp3_risk = asymptotic_cp[1]/cp_type[3]*100

print('NOTE:')
print(f'Among patients with atypical angina, {round(cp1_risk,2)}% were at risk of heart attack.')
print(f'Among non-anginal patients the rate is {round(cp2_risk,2)}%, and {round(cp3_risk,2)}% of patients with asymptomatic chest pain are vulnerable.')
NOTE:
Among patients with atypical angina, 82.0% were at risk of heart attack.
Among non-anginal patients the rate is 78.82%, and 69.57% of patients with asymptomatic chest pain are vulnerable.
# Bargraph to check heart attack risk due to different type chest pains

sns.countplot(x='cp',hue='target', data=heart_disease)
plt.legend(labels=['No-Heart attack', 'Heart attack'])
plt.xticks([0,1,2,3],['Typical angina', 'Atypical angina', 'Non-anginal', 'Asymptomatic'])
plt.title('Chest pain leading to Heart Attack', loc='left')
plt.xlabel('Chest pain')
plt.show()
Notebook Image
  • 142 patients have typical angina, among whom only 27.5% are at risk of heart attack.
  • Among patients with atypical angina, 82% are at risk of heart attack.
  • Among non-anginal patients, 78.8% are vulnerable to heart attack.
  • Among the 23 asymptomatic patients, 16 (about 69.6%) are at risk of heart attack.

Note:

  • Most chest pains are stress related. However, NCCP and atypical angina are conditions with a high vulnerability to heart attack.
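The per-type rates depend on dividing each high-risk count by the total for the same chest pain type. The sketch below shows that arithmetic in plain Python. The totals and the count of 39 for typical angina are copied from the outputs above; the other three high-risk counts (41, 67, 16) are back-calculated from the printed percentages of 142 and should be read as assumptions.

```python
# Patients per chest pain type, copied from the cp_type output above.
total_by_cp = {0: 142, 1: 50, 2: 85, 3: 23}

# High-risk patients per type. 39 is printed above for type 0; the other three
# are back-calculated from the printed percentages of 142 and are assumptions.
high_risk_by_cp = {0: 39, 1: 41, 2: 67, 3: 16}

cp_names = {0: 'Typical angina', 1: 'Atypical angina',
            2: 'Non-anginal pain', 3: 'Asymptomatic'}

# Divide each high-risk count by the total for the same chest pain type.
risk_rate = {
    cp: round(high_risk_by_cp[cp] / total_by_cp[cp] * 100, 2)
    for cp in total_by_cp
}

for cp, rate in risk_rate.items():
    print(f'{cp_names[cp]}: {rate}%')
```

As a consistency check, the four high-risk counts add up to 163, the number of high-risk patients in the whole dataset.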

Q4. How is fasting blood sugar level related to heart attack?


# Barchart to compare the fbs level

sns.countplot(x='fbs', hue='target', data=heart_disease)
plt.legend(labels=['No-Heart attack', 'Heart attack'])
plt.title('Relationship between Fasting blood sugar level and Heart disease', loc='left')
plt.xticks([0,1],['<120 mg/dL', '>120 mg/dL'])
plt.xlabel('Fasting blood sugar level (mg/dL)')
plt.show()
Notebook Image
# catplot to show the relationship b/w fbs and age

sns.catplot(x='fbs',y='age', data=heart_disease)
plt.title('Fig-10: Fasting blood sugar level vs Age\n', loc='left')
plt.xlabel('Fasting blood sugar level (mg/dl)')
plt.xticks([0,1],['<120', '>120'])
plt.show();
Notebook Image
# distribution plot of fbs on both the genders

sns.FacetGrid(heart_disease, hue='sex', aspect=4).map(sns.kdeplot,'fbs', fill=True) # fill replaces the deprecated shade argument
plt.legend(labels=['Male', 'Female'])
plt.title('Fig-11: Fasting blood sugar level based on gender\n', loc='left')
plt.show();
Notebook Image
  • The above plots show that fasting blood sugar level is not much affected by sex and age.
  • However, the fbs level in female patients is higher than in male patients (as per the KDE plot).

Q.5. What type of thalassemia leads to heart attack?


  • Thalassemia is a hereditary blood disorder in which the body is less capable of producing hemoglobin.
  • As the severity increases, it leads to anemia caused by red blood cell deficiency.
  • In the normal case, blood flow into the heart is normal. Hence, the chance of heart attack is very low.
  • In fixed defect thalassemia, blood doesn't flow into the heart or rarely flows. This is a severe case of low hemoglobin density in the body and can eventually lead to heart attack.
  • In reversible defect, some blood flows, and treatment in this phase can stop complications, though it is the most severe case of thalassemia.
  • Reversible defect thalassemia is of two types: alpha, which depends on the number of genes inherited from the parents, and beta, which depends on the affected part of hemoglobin.
# Types of thalassemia

thal_types = heart_disease.groupby(['thal']).thal.count()
thal_types
thal
1     18
2    165
3    117
Name: thal, dtype: int64
# Countplot of thal vs target

sns.countplot(x='thal',hue='target', data=heart_disease)
plt.title('Fig-12: Types of Thalassemia vs risk of heart disease\n', loc='left')
plt.legend(["No-Heart attack", 'Heart attack'])
plt.xticks([0,1,2],['Normal', 'Fixed defect', 'Reversible defect'], rotation=70)
plt.xlabel('Thalassemia')
plt.show()
Notebook Image
  • 165 patients had fixed defect thalassemia.
  • Whereas 117 had reversible defect, and only 18 had normal type.
# Thalassemia patients with high risk of heart attack

highrisk_thal = high_risk_patients.groupby(['thal']).thal.count()
highrisk_thal
thal
1      6
2    129
3     28
Name: thal, dtype: int64
# percent of thalassemia patients with high risk of heart attack

thal2_prcnt = highrisk_thal[2]/thal_types[2]*100
thal2_prcnt
78.18181818181819
# Gender wise thalassemia patients

genderwise_thaltypes = heart_disease.groupby(['thal', 'sex']).sex.count()
genderwise_thaltypes
thal  sex
1     0        1
      1       17
2     0       79
      1       86
3     0       15
      1      102
Name: sex, dtype: int64
# Number of patients with high risk of heart attack and thalassemia

genderbased_thal= high_risk_patients.groupby(['thal', 'sex']).sex.count()
genderbased_thal
thal  sex   
1     Male       6
2     Female    69
      Male      60
3     Female     2
      Male      26
Name: sex, dtype: int64
# Percentage of male patients with higher risk of heart attack and thalassemia fixed defect
# (.loc with explicit (thal, sex) labels avoids positional lookups on a labelled index)

thal2_prct_male = genderbased_thal.loc[(2, 'Male')]/genderwise_thaltypes.loc[(2, 1)]* 100
thal2_prct_male
69.76744186046511
# Percent of female patients with fixed defect thalassemia and at high risk 

thal2_prct_female = genderbased_thal.loc[(2, 'Female')]/genderwise_thaltypes.loc[(2, 0)]*100
thal2_prct_female
87.34177215189874
# Percentage of female patients with reversible defect thalassemia and at high risk

thal3_prct_female = genderbased_thal.loc[(3, 'Female')]/genderwise_thaltypes.loc[(3, 0)]*100
thal3_prct_female
13.333333333333334
# catplot of thalassemia and risk of heart attack based on gender

sns.catplot(x='sex', y='target', hue='thal', kind='bar', data=heart_disease, height=6)
plt.xticks([0,1], ['Female', 'Male'])
plt.title('Fig-13: Heart disease in relationship to Thalassemia\n',loc='left')
plt.show();
Notebook Image

The above count plot, bar plot and series show that:

  • 78.2% of people with thalassemia type 2 (fixed defect) are at risk of heart attack.
  • Out of 165 people with fixed defect thalassemia, 129 are at risk of heart attack.
  • Out of 79 females with fixed defect thalassemia, 69 have a higher chance of heart attack (87.3%).
  • Out of 86 males with fixed defect thalassemia, 60 show signs of heart attack (69.8%).
  • Females have a higher heart attack rate due to thalassemia.
  • Interestingly, there is only one female with the normal type of thalassemia.
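All six thal-by-gender percentages can be computed in one table instead of one cell per combination. The sketch below copies the counts from the genderwise_thaltypes and genderbased_thal outputs above, with sex 0/1 written as Female/Male; the frame `table` and its column names are introduced here for illustration, and the zero for females with normal thal is inferred from that combination being absent in the high-risk output.

```python
import pandas as pd

# Counts copied from the two series above: (thal, sex, all patients, high-risk patients).
# The high-risk count of 0 for (1, 'Female') is inferred from its absence in the output.
table = pd.DataFrame(
    [
        (1, 'Female', 1, 0),
        (1, 'Male', 17, 6),
        (2, 'Female', 79, 69),
        (2, 'Male', 86, 60),
        (3, 'Female', 15, 2),
        (3, 'Male', 102, 26),
    ],
    columns=['thal', 'sex', 'total', 'high_risk'],
)

# High-risk share within each (thal, sex) combination, in percent.
table['risk_pct'] = (table['high_risk'] / table['total'] * 100).round(2)

# One row per thal type, one column per gender.
pivoted = table.pivot(index='thal', columns='sex', values='risk_pct')
print(pivoted)
```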

Q6. How does cholesterol level affect cardiac health?


# Cholesterol correlation with age

sns.regplot(x='age', y='chol', data=heart_disease)
plt.title('Fig-14: Age vs Cholesterol level\n', loc='left')
plt.ylabel('cholesterol')
plt.show();
Notebook Image

From the above scatter plot:

  • Cholesterol level is very high in people around the age of 60.
  • This suggests that the chances of heart attack are also high around the age of 60.
# Gender wise cholesterol level

sns.FacetGrid(heart_disease, hue='sex', aspect=4).map(sns.kdeplot,'chol', fill=True) # fill replaces the deprecated shade argument
plt.legend(labels=['Male', 'Female'])
plt.title('Fig-16: Gender wise Serum cholesterol level\n', loc='left')
plt.xlabel('Cholesterol')
plt.show();
Notebook Image
  • This KDE plot shows that males have a higher cholesterol level than females.

  • 200-239 mg/dL is the borderline level of cholesterol, whereas 240 mg/dL and above is a high level of cholesterol, leading to serious cardiac problems.

  • Here, most patients have a high level of cholesterol, a severe sign of heart attack.

  • Patients with borderline and high cholesterol levels may need special care.

def chol_level(row):
    if row.chol >= 200 and row.chol<=239:
        return 'border line'
    elif row.chol >=240:
        return 'excess'
    elif 170 <= row.chol <=200:
        return 'normal'
    else:
        return 'less'
heart_disease['chol_level'] = heart_disease.apply(chol_level, axis=1)
heart_disease.head(5)
heart_disease.groupby(['chol_level', 'target']).chol.count()
chol_level   target
border line  0         38
             1         58
excess       0         79
             1         76
less         0          6
             1          6
normal       0         14
             1         23
Name: chol, dtype: int64
  • The above series shows the number of patients in each cholesterol level category and how many patients need extra care.
  • A low cholesterol level signifies malnutrition, low hemoglobin, a low level of fat absorption, and thyroid or liver issues.
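To compare the categories on an equal footing, the counts can be turned into row percentages with `pd.crosstab(..., normalize='index')`. The sketch below expands the counts from the groupby output above back into one row per patient; the frame `patients` is a reconstruction for illustration only.

```python
import pandas as pd

# (chol_level, target) -> patients, copied from the groupby output above.
counts = {
    ('border line', 0): 38, ('border line', 1): 58,
    ('excess', 0): 79, ('excess', 1): 76,
    ('less', 0): 6, ('less', 1): 6,
    ('normal', 0): 14, ('normal', 1): 23,
}

# Expand the counts back into one row per patient.
rows = [key for key, n in counts.items() for _ in range(n)]
patients = pd.DataFrame(rows, columns=['chol_level', 'target'])

# normalize='index' divides each row of the table by its row total.
shares = (pd.crosstab(patients['chol_level'], patients['target'],
                      normalize='index') * 100).round(1)
print(shares)
```

On these counts the high-risk share is roughly 49% in the 'excess' group and about 60% in the 'border line' and 'normal' groups, so cholesterol category alone does not separate the two target classes clearly in this dataset.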
# Save the processed data (if necessary)

heart_disease.to_csv('patients_processed_data.csv')

Inferences and Conclusion:


With the exploratory data analysis of the cardiac patients' medical reports, the following points are concluded:

  1. Males around the age of 40 and near 60 show higher chances of heart attack.
  2. Females around the age of 50 are at risk of heart attack.
  3. Stress, cholesterol level, blood sugar level (diabetes) and thalassemia can lead to heart attacks.
  4. People with atypical angina and non-anginal pain show higher signs of heart attack.
  5. A fasting blood glucose level of less than 120 mg/dL can lead to a higher risk of heart attack.
  6. Females show a slightly higher density of fasting blood sugar level than males.
  7. Fixed defect thalassemia poses a higher risk of heart attack. Females are more vulnerable than males.
  8. Patients with reversible defect thalassemia can be saved with proper treatment, as it shows comparatively lower signs of heart attack.
  9. Males have a high cholesterol level, which is a major risk factor for heart stroke.

  • The data used here is a little old; nevertheless, the same EDA procedure can be used to check the health condition of a society or a hospital, to track how many patients need emergency care, what precautions can be taken, etc.
  • This prepared dataset can also be used in machine learning models to predict new patients' vulnerability to heart attack or stroke from their blood sugar, cholesterol and heart rate measures.

Reference and Future Work


import jovian
jovian.commit()
[jovian] Updating notebook "sahooashru/zerotopandas-course-project-starter" on https://jovian.ai [jovian] Committed successfully! https://jovian.ai/sahooashru/zerotopandas-course-project-starter