IBM Employee Churn Prediction

The task at hand is to help IBM retain its valuable employees. I will do this by predicting employee attrition and exploring the key drivers of churn.

Four datasets need to be merged so that I can find out which department each employee works in and what their job title is, explore what is causing employee attrition, and gather all the features for prediction.
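
As a rough illustration of the merge described above, here is a minimal sketch on toy data. The key columns (emp_no, dept_no), the other column names, and all values are assumptions made for illustration; the real files may use different names.

```python
import pandas as pd

# Toy stand-ins for the four tables. Column names and values are hypothetical.
employees = pd.DataFrame({
    "emp_no": [1, 2, 3],
    "age": [34, 45, 29],
    "attrition": ["No", "Yes", "No"],
})
dept_emp = pd.DataFrame({"emp_no": [1, 2, 3], "dept_no": ["d001", "d002", "d001"]})
departments = pd.DataFrame({"dept_no": ["d001", "d002"], "dept_name": ["Sales", "Research"]})
titles = pd.DataFrame({"emp_no": [1, 2, 3], "title": ["Manager", "Engineer", "Analyst"]})

# Chain left joins so every employee row is kept even if a lookup is missing.
merged = (
    employees
    .merge(dept_emp, on="emp_no", how="left")      # employee -> department id
    .merge(departments, on="dept_no", how="left")  # department id -> department name
    .merge(titles, on="emp_no", how="left")        # employee -> job title
)
print(merged)
```

Left joins are used here so that no employee is silently dropped; whether that is the right choice depends on how complete the linking tables are.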

To predict attrition of IBM's valuable employees, I will build and compare three different classification models.
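
To give an idea of what comparing three classifiers can look like, here is a minimal sketch on synthetic data. The choice of logistic regression, a decision tree, and Gaussian naive Bayes is an assumption based on the imports in this notebook, and the data from make_classification stands in for the real merged features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data as a placeholder for the attrition features.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three candidate models (an assumed selection, not confirmed by the text).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "gaussian_nb": GaussianNB(),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from best to worst accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because attrition data is usually imbalanced, accuracy alone can be misleading, so metrics such as recall or log loss may be worth comparing as well.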


Reading in Data

# imports
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import svm
from sklearn import tree
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn import naive_bayes
from sklearn import preprocessing
from sklearn import model_selection
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
# dataset urls
employees_url = 'https://raw.githubusercontent.com/rhodyprog4ds/portfolio-stubbsdiondra-1/main/check_3/employees.csv?token=GHSAT0AAAAAAB3L5S7CQ74CP4JWTYGX6L6SY4OWKEA'
departments_url = 'https://raw.githubusercontent.com/rhodyprog4ds/portfolio-stubbsdiondra-1/main/check_3/departments.csv?token=GHSAT0AAAAAAB3L5S7D2QQ2PDPAWLSONZG6Y4OWKTA'
dept_emp_url = 'https://raw.githubusercontent.com/rhodyprog4ds/portfolio-stubbsdiondra-1/main/check_3/dept_emp.csv?token=GHSAT0AAAAAAB3L5S7DXELENZS344HHZTHQY4OWK2A'
job_role_url = 'https://raw.githubusercontent.com/rhodyprog4ds/portfolio-stubbsdiondra-1/main/check_3/titles.csv?token=GHSAT0AAAAAAB3L5S7CE7NVLVMJ65G37VJCY4OWLAA'
# read each CSV into its own DataFrame
employees = pd.read_csv(employees_url)
departments = pd.read_csv(departments_url)
dept_emp = pd.read_csv(dept_emp_url)
job_roles = pd.read_csv(job_role_url)

The employees dataframe gives us information about each employee, such as age, gender, income, education, marital status, satisfaction, attrition, and more.