Handling Missing Data

Missing values occur for various reasons, such as corrupt data, failure to load the information, or incomplete extraction. Handling missing values is one of the greatest challenges analysts face, because making the right decision on how to handle them produces robust data models. Missing values cause many problems when a dataset is used with a machine learning algorithm, and they hinder data analysis and data visualization.

In this kernel, let's look at the following methods to handle missing data:

1. Drop all missing values
2. Drop the values above a certain threshold
3. Imputation using mean, median and mode
4. Imputation using forward fill and backward fill
5. sklearn Imputer for numerical variables
6. sklearn Imputer for categorical variables

In [1]:
# importing libraries
import numpy as np 
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
# loading dataset 
df = pd.read_csv("../input/widsdatathon2020/training_v2.csv")

Drop all missing values

In this method, we delete a row if it has a null value in any feature. This approach should be used only when there are enough samples in the dataset, and it has to be ensured that no bias is introduced by the deletion. Removing data leads to loss of information, which may hurt the results when predicting the output. All observations that have null values can be deleted using the pandas dropna() method.

In [3]:
# Keep only the first 20 columns of the dataset for this demonstration
df = df[df.columns[0:20]]
# Number of missing values in each column
df.isna().sum()
Out[3]:
encounter_id                 0
patient_id                   0
hospital_id                  0
hospital_death               0
age                       4228
bmi                       3429
elective_surgery             0
ethnicity                 1395
gender                      25
height                    1334
hospital_admit_source    21409
icu_admit_source           112
icu_id                       0
icu_stay_type                0
icu_type                     0
pre_icu_los_days             0
readmission_status           0
weight                    2720
albumin_apache           54379
apache_2_diagnosis        1662
dtype: int64
In [4]:
print("Orginal shape before dropna()" ,df.shape)
drop = df.dropna()
print("Sshape after dropna()" ,drop.shape)
Orginal shape before dropna() (91713, 20) Sshape after dropna() (25349, 20)
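If only certain columns matter for the analysis, dropna() also accepts a subset argument so that a row is removed only when those specific columns are null, which discards fewer samples. A minimal sketch using the age and bmi columns of this dataset:

# Drop rows only when 'age' or 'bmi' is missing; nulls elsewhere are kept
drop_subset = df.dropna(subset=['age', 'bmi'])
print("Shape after dropna(subset=['age', 'bmi']):", drop_subset.shape)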

Drop the values above a certain threshold

If the information contained in a variable is not that valuable, you can drop the variable when it has a high proportion of missing values, for example more than 50%. In this method we drop columns whose null values exceed a certain threshold.

In [5]:
# Drop columns that have fewer than 60% non-null values
threshold = len(df) * 0.60
df_thresh = df.dropna(axis=1, thresh=threshold)
# Shape of the dataset after dropping columns
df_thresh.shape
Out[5]:
(91713, 19)
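Note that thresh in dropna() is the minimum number of non-null values a column must have to be kept, so len(df) * 0.60 keeps columns that are at least 60% populated. To apply a 50%-missing rule directly, the per-column missing fraction can be computed explicitly; a minimal sketch:

# Fraction of missing values in each column
missing_frac = df.isna().mean()
# Keep only columns with at most 50% missing values
df_half = df.loc[:, missing_frac <= 0.50]
print(df_half.shape)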

Imputation using mean, median and mode

This imputation method treats each variable individually, ignoring any interrelationships with other variables, and is beneficial for simple linear models and neural networks. It may not be suitable for tree-based algorithms, and since it is only an approximation it can add variance to the dataset. On the other hand, it avoids losing data, which generally yields better results than removing rows and columns.

In this method, null values are replaced with the mean, median or mode; it is the statistical approach to handling null values. The mean of a numerical column is used to replace nulls when the data is normally distributed. The median is used if the data contains outliers. The mode is used when one value occurs much more frequently than the others.

In [6]:
impute_df = df[['age','bmi','height']].copy()  # copy to avoid SettingWithCopyWarning
mean_age = impute_df['age'].mean()
median_bmi = impute_df['bmi'].median()
mean_height = impute_df['height'].mean()

In [7]:
impute_df.isna().sum()
Out[7]:
age       4228
bmi       3429
height    1334
dtype: int64

The null values are replaced with the mean/median/mode values using the fillna() method.

In [8]:
# Replace null values (np.nan) with the mean
impute_df['age'].fillna(mean_age, inplace=True)
impute_df['height'].fillna(mean_height, inplace=True)

# Replace null values in bmi with the median
impute_df['bmi'].fillna(median_bmi, inplace=True)

impute_df.isna().sum()
Out[8]:
age       0
bmi       0
height    0
dtype: int64
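The cells above demonstrate mean and median imputation; mode imputation works the same way via the pandas mode() method, which returns a Series of the most frequent values. A minimal sketch using the hospital_admit_source column from this dataset:

mode_df = df[['hospital_admit_source']].copy()
# mode() returns a Series; take its first entry as the most frequent value
mode_value = mode_df['hospital_admit_source'].mode()[0]
mode_df['hospital_admit_source'].fillna(mode_value, inplace=True)
print(mode_df.isna().sum())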

Imputation using forward fill and backward fill

Null values can also be replaced by the previous value in the column, which is called forward fill, or by the next occurring value in the column, which is called backward fill. A NaN value will remain even after forward filling or backward filling if a previous or next value isn't available, or if that value is itself NaN.

In [9]:
fill_df = df[['gender','ethnicity']].copy()  # copy to avoid SettingWithCopyWarning
fill_df.isna().sum()
Out[9]:
gender         25
ethnicity    1395
dtype: int64
In [10]:
# Backward fill
fill_df['gender'].fillna(method='bfill', inplace=True)
# Forward fill
fill_df['ethnicity'].fillna(method='ffill', inplace=True)
fill_df.isna().sum()
Out[10]:
gender       0
ethnicity    0
dtype: int64
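Because a forward fill cannot fill a NaN in the first row (there is no previous value) and a backward fill cannot fill one in the last row, the two methods are often chained so that every null gets covered. A minimal sketch on a fresh copy of the ethnicity column:

chain_df = df[['ethnicity']].copy()
# Forward fill first, then backward fill any leading NaNs that remain
chain_df['ethnicity'] = chain_df['ethnicity'].fillna(method='ffill').fillna(method='bfill')
print(chain_df.isna().sum())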

sklearn Imputer for Numerical Variables

In [11]:
# Select numerical columns to impute
imputer_skdf = df[['readmission_status','weight']]
In [12]:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# Fit the imputer and transform the data
imputer_skdf = pd.DataFrame(imputer.fit_transform(imputer_skdf))
imputer_skdf.columns = ['readmission_status','weight']
# Checking for any null values
imputer_skdf.isna().sum()
Out[12]:
readmission_status    0
weight                0
dtype: int64
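SimpleImputer is not limited to the mean; for skewed numerical columns, strategy='median' is often the safer choice. A minimal sketch on the weight column:

median_imputer = SimpleImputer(missing_values=np.nan, strategy='median')
weight_imputed = pd.DataFrame(median_imputer.fit_transform(df[['weight']]), columns=['weight'])
print(weight_imputed.isna().sum())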

sklearn Imputer for Categorical Variables

In [13]:
# Select categorical columns to impute
categ_df = df[['ethnicity','gender']]
print("ICU Stay Type - value Counts : \n " ,categ_df['ethnicity'].value_counts())
print("ICU Type - value Counts : \n " , categ_df['gender'].value_counts())

ICU Stay Type - value Counts : Caucasian 70684 African American 9547 Other/Unknown 4374 Hispanic 3796 Asian 1129 Native American 788 Name: ethnicity, dtype: int64 ICU Type - value Counts : M 49469 F 42219 Name: gender, dtype: int64
In [14]:
# Replacing null values in ethnicity and gender with the most frequent value
imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
categ_df = pd.DataFrame(imputer.fit_transform(categ_df))
categ_df.columns = ['ethnicity','gender']
# Value counts after imputation
print("Ethnicity - value counts : \n", categ_df['ethnicity'].value_counts())
print("Gender - value counts : \n", categ_df['gender'].value_counts())
Ethnicity - value counts :
Caucasian           72079
African American     9547
Other/Unknown        4374
Hispanic             3796
Asian                1129
Native American       788
Name: ethnicity, dtype: int64
Gender - value counts :
M    49494
F    42219
Name: gender, dtype: int64
In [15]:
categ_df.isna().sum()
Out[15]:
ethnicity    0
gender       0
dtype: int64
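As an alternative to most-frequent imputation, which inflates the majority category (note how the Caucasian and M counts grew above), strategy='constant' marks missing entries with an explicit placeholder instead. A minimal sketch, using 'Missing' as an assumed placeholder label:

const_imputer = SimpleImputer(strategy='constant', fill_value='Missing')  # 'Missing' is an arbitrary label
categ_const = pd.DataFrame(const_imputer.fit_transform(df[['ethnicity','gender']]), columns=['ethnicity','gender'])
print(categ_const['ethnicity'].value_counts())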