
Air Quality Analysis

Domain Name: Environment (Air Quality)

Abstract:

This project assesses air quality using readings from an ‘Air Quality Chemical Multisensor Device’. Several regression models are fitted, their R^2 scores and regression coefficients are compared, and the best-performing model is selected to evaluate air quality.

Dataset: Air quality of an Italian city

(https://archive.ics.uci.edu/ml/datasets/Air+quality)

The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of responses from air quality chemical sensor devices deployed in the field. Ground-truth hourly averaged concentrations for CO, Non-Methane Hydrocarbons, Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located reference certified analyzer.

Evidence of cross-sensitivities, as well as both concept and sensor drift, is present, as described in De Vito et al., Sens. and Act. B, Vol. 129, 2, 2008 (citation required), and eventually affects the sensors' concentration estimation capabilities. Missing values are tagged with the value -200.
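Because missing readings are stored as -200 rather than as empty cells, they have to be converted before any statistics are computed. The following is a minimal sketch on a hypothetical three-row frame; the column names `CO(GT)` and `T` and all values are assumptions for illustration, not data from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and values are assumptions for illustration.
sample = pd.DataFrame({
    "CO(GT)": [2.6, -200.0, 2.2],
    "T": [13.6, 13.3, -200.0],
})

# Treat the -200 sentinel as a true missing value (NaN).
cleaned = sample.replace(-200, np.nan)

# Count the missing readings in each column.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

Once the sentinel is converted to NaN, pandas skips those cells in `mean()`, `corr()` and similar methods instead of treating -200 as a real measurement.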

Project Design:

Pre-processing: The first step is to read the dataset and clean the data,
i.e. remove unwanted data and identify null values. If any null values exist, we replace them with constant values, and we remove duplicate rows.
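The cleaning step described above can be sketched as follows. This is only an illustration on a hypothetical frame: the column names `sensor_a` and `sensor_b` are invented, and using each column's median as the constant fill value is an assumption, since the text does not say which constant is used.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicate row and two missing cells.
raw = pd.DataFrame({
    "sensor_a": [1.0, 1.0, np.nan, 4.0],
    "sensor_b": [10.0, 10.0, 30.0, np.nan],
})

# Remove exact duplicate rows and renumber the remaining ones.
deduped = raw.drop_duplicates().reset_index(drop=True)

# Fill each missing cell with a per-column constant (here: the column median, an assumption).
filled = deduped.fillna(deduped.median())
print(filled)
```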

Exploration: Visualize the dataset, detect outliers, replace missing values, split the dataset into training and testing sets, and check for correlation among the features using a heatmap.

Prediction: Predict air quality and compute the R^2 score of each regression model.
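As a rough sketch of the exploration step, the snippet below computes a correlation matrix and splits the data into training and testing sets. The data is synthetic and the column names (`sensor`, `noise`, `target`), the 75/25 split and the random seeds are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data: "target" depends strongly on "sensor" and not on "noise".
rng = np.random.default_rng(0)
n = 200
sensor = rng.normal(size=n)
data = pd.DataFrame({
    "sensor": sensor,
    "noise": rng.normal(size=n),
    "target": 3.0 * sensor + rng.normal(scale=0.1, size=n),
})

# Pairwise Pearson correlations; seaborn.heatmap(corr, annot=True) would draw this matrix.
corr = data.corr()

# Hold out 25% of the rows for testing.
X = data[["sensor", "noise"]]
y = data["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(corr.round(2))
```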

Finally, the model with the highest R^2 score on both the training and testing datasets is selected as the best model for evaluating air quality.
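The selection rule can be sketched as below. This is not the notebook's actual model list: the two scikit-learn models, the synthetic data and the tie-breaking choice (ranking by the test-set R^2) are assumptions used only to show the comparison loop.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, nearly linear data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Candidate models; the choice of these two is hypothetical.
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=3, random_state=0),
}

# Store (train R^2, test R^2) for every model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (
        r2_score(y_train, model.predict(X_train)),
        r2_score(y_test, model.predict(X_test)),
    )

# Rank by test-set R^2; the training score is kept to spot overfitting.
best = max(scores, key=lambda name: scores[name][1])
print(scores, best)
```

Reporting both scores matters: a model whose training R^2 is far above its testing R^2 is likely overfitting, so it should not be chosen on the training score alone.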

Pre-processing

# Libraries to install before going on to the next steps
# (the PyPI package for sklearn is named scikit-learn)
!pip install numpy pandas matplotlib seaborn scikit-learn jovian
# Import the libraries required for the following steps
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "pavankumar42sn/airquality" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/pavankumar42sn/airquality