
This is a beginner-friendly notebook that performs exploratory data analysis using graph visualizations.

We use a linear regression model to predict car prices. We then calculate the error percentage using the mean absolute error (MAE) and try to improve it by changing which inputs we feed to the model (feature selection).
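The workflow described above can be sketched on a small scale as follows. This is only an illustration under stated assumptions: the data is synthetic (it is not the CarDekho dataset), the price formula and the `fit_and_score` helper are hypothetical, and the fit uses NumPy's least-squares solver as a stand-in for a library regression model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, NOT the CarDekho dataset)
n = 200
age = rng.integers(1, 17, size=n).astype(float)   # car age in years
km_driven = rng.uniform(5_000, 150_000, size=n)   # odometer reading
noise_feature = rng.normal(size=n)                # unrelated to price
price = 1_000_000 - 40_000 * age - 1.5 * km_driven + rng.normal(0, 20_000, size=n)

def fit_and_score(X, y, n_train=150):
    """Fit ordinary least squares on the first n_train rows and
    return the mean absolute error (MAE) on the remaining rows."""
    X = np.column_stack([np.ones(len(X)), X])      # add an intercept column
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    pred = X[n_train:] @ coef
    return np.mean(np.abs(y[n_train:] - pred))

# Compare different feature subsets (a very simple form of feature selection)
mae_all = fit_and_score(np.column_stack([age, km_driven, noise_feature]), price)
mae_selected = fit_and_score(np.column_stack([age, km_driven]), price)
mae_age_only = fit_and_score(age.reshape(-1, 1), price)

print(f"all features:     MAE = {mae_all:,.0f}")
print(f"age + km_driven:  MAE = {mae_selected:,.0f}")
print(f"age only:         MAE = {mae_age_only:,.0f}")
```

Dropping an informative feature (here `km_driven`) raises the MAE noticeably, while dropping an uninformative one changes it very little. Comparing the MAE across feature subsets in this way is the idea behind the feature selection step.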

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/vehicle-dataset-from-cardekho/Car details v3.csv
/kaggle/input/vehicle-dataset-from-cardekho/CAR DETAILS FROM CAR DEKHO.csv
/kaggle/input/vehicle-dataset-from-cardekho/car data.csv
df_cardekho = pd.read_csv("/kaggle/input/vehicle-dataset-from-cardekho/CAR DETAILS FROM CAR DEKHO.csv")
df_cardata = pd.read_csv("/kaggle/input/vehicle-dataset-from-cardekho/car data.csv")
df_cardetails = pd.read_csv("/kaggle/input/vehicle-dataset-from-cardekho/Car details v3.csv")
df_cardekho.info()
df_cardekho
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   name           4340 non-null   object
 1   year           4340 non-null   int64
 2   selling_price  4340 non-null   int64
 3   km_driven      4340 non-null   int64
 4   fuel           4340 non-null   object
 5   seller_type    4340 non-null   object
 6   transmission   4340 non-null   object
 7   owner          4340 non-null   object
dtypes: int64(3), object(5)
memory usage: 271.4+ KB

Every column reports 4340 non-null entries out of 4340 rows, so the data from CAR DETAILS FROM CAR DEKHO.csv contains no null values.
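Reading the non-null counts from `info()` works, but the same check can be made explicit by counting missing values per column. The sketch below uses a tiny hypothetical DataFrame (not the CarDekho data) so that it runs on its own; in the notebook the same two calls would be applied to `df_cardekho`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame, used only to demonstrate the check
df = pd.DataFrame({
    "name": ["car A", "car B", None],
    "year": [2007, 2012, 2017],
    "selling_price": [60000, 600000, np.nan],
})

# Number of missing values in each column
null_counts = df.isnull().sum()
print(null_counts)

# Single True/False answer: does the frame contain any missing value?
has_nulls = bool(df.isnull().values.any())
print("Any nulls:", has_nulls)
```

For a frame with no missing data, every entry of `null_counts` is zero and `has_nulls` is `False`.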