
HEART DISEASE ANALYSIS (R)

Goal

The goal of this analysis is to detect the presence of heart disease in a patient. The target attribute is integer-valued, ranging from 0 (no presence) to 4.

Dataset Information

The cardiovascular disease dataset is an open-source dataset available on Kaggle. The data consists of 70,000 patient records (34,979 presenting with cardiovascular disease and 35,021 not presenting with cardiovascular disease) and contains 11 features (4 demographic, 4 examination, and 3 social history):

  • Age (demographic)
  • Height (demographic)
  • Weight (demographic)
  • Gender (demographic)
  • Systolic blood pressure (examination)
  • Diastolic blood pressure (examination)
  • Cholesterol (examination)
  • Glucose (examination)
  • Smoking (social history)
  • Alcohol intake (social history)
  • Physical activity (social history)

Some features are numerical, others are assigned categorical codes, and others are binary values. The classes are balanced, but there were more female patients observed than male patients. Further, the continuous-valued features are almost normally distributed; however, most categorical-valued features are skewed towards "normal," as opposed to "high" levels of potentially pathological features.
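The class balance described above can be checked with a few lines of base R. This is a minimal sketch: the counts are the ones quoted in the dataset description, while the helper name level_shares, the file name cardio_train.csv, the ";" separator and the column names cardio and gender are assumptions about how the Kaggle file is laid out, not something confirmed here.

```r
# Class counts as stated in the dataset description above.
class_counts <- c(no_disease = 35021, disease = 34979)

# Total number of records and the share of each class.
total_records <- sum(class_counts)
class_shares  <- class_counts / total_records
print(round(class_shares, 4))

# Hypothetical helper: share of each level in one column of a data frame.
level_shares <- function(df, column) {
  prop.table(table(df[[column]]))
}

# Possible usage once the data is loaded
# (file name, separator and column names are assumptions):
# cardio <- read.csv("cardio_train.csv", sep = ";")
# level_shares(cardio, "cardio")   # class balance
# level_shares(cardio, "gender")   # female vs. male share
```

The same helper can be applied to any categorical column, for example to see how strongly cholesterol or glucose levels lean towards "normal".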

Methodology of analysis
1. Install and load R libraries
2. Explore the data
3. Transform the data
4. Perform data visualisations
5. Analyse correlations between attributes

Tools: R

1. Install and load R libraries

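# Install the packages once if they are not already present
# install.packages("tidyverse")
# install.packages("corrplot")
# install.packages("ggplot2")

# Load libraries
library(tidyverse)  # data wrangling; also attaches ggplot2
library(corrplot)   # correlation matrix plots
library(ggplot2)    # plotting (already attached by tidyverse, kept for clarity)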
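Rather than commenting the install lines in and out by hand, one option is to check which packages are missing first. The sketch below uses only base R. The helper name missing_packages is hypothetical, and the actual install call is left commented out so nothing is installed without intent.

```r
# Packages this analysis loads.
required_packages <- c("tidyverse", "corrplot", "ggplot2")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
# requireNamespace() returns TRUE/FALSE and, as a side effect, loads the
# namespace of each package it finds (without attaching it).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_packages)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing packages
}
```

Once nothing is reported as missing, the library() calls above should load without errors.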

2. Data Exploration