
DATA ANALYSIS : Automobile Dataset


Problem

Let's say we have a friend named Tom, and Tom wants to sell his car. But the problem is he doesn't know how much he should sell his car for. Tom wants to sell his car for as much as he can, but he also wants to set the price reasonably, so someone would want to purchase it. So the price he sets should represent the value of the car. How can we help Tom determine the best price for his car? Let's think like data scientists and clearly define some of his problems. For example, is there data on the prices of other cars and their characteristics? What features of cars affect their prices? Color? Brand? Does horsepower also affect the selling price, or perhaps something else? As a data analyst or data scientist, these are some of the questions we can start thinking about. To answer these questions, we're going to need some data.

The final model has an efficiency of 84%, and the graph below shows its performance.

DistributionPlot(y_test, yhat, "Actual Values (Test)", "Predicted Values (Test)", Title)
Notebook Image
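The DistributionPlot helper called in the cell above is defined later in the notebook (as are y_test, yhat and Title). As a rough idea of what such a helper can look like, here is a minimal sketch; the styling and the use of kdeplot instead of the deprecated distplot are assumptions, not the notebook's exact definition.

import matplotlib.pyplot as plt
import seaborn as sns

def DistributionPlot(actual, predicted, actual_label, predicted_label, title):
    # overlay the distributions of actual and predicted values
    plt.figure(figsize=(10, 6))
    ax = sns.kdeplot(actual, color="r", label=actual_label)
    sns.kdeplot(predicted, color="b", label=predicted_label, ax=ax)
    plt.title(title)
    plt.xlabel("Price (in dollars)")
    plt.ylabel("Proportion of Cars")
    plt.legend()
    plt.show()
    plt.close()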

TABLE OF CONTENTS


  1. Data Acquisition
  2. Identify and handle missing values
  3. Data Standardization
  4. Data Normalization
  5. Binning
  6. Analyzing Individual Feature Patterns using Visualization
  7. Model Development
    • Linear Regression and Multiple Linear Regression
    • Model Evaluation using Visualization
    • Polynomial Regression and Pipeline
    • Measures for Insample Evaluation
    • Prediction and Decision Making
  8. Model Evaluation and Refinement
  9. Conclusion
  10. Reference

1. Data Acquisition

There are various formats for a dataset: .csv, .json, .xlsx, etc. The dataset can be stored in different places, on your local machine or sometimes online. In our case, the Automobile Dataset is an online source, and it is in CSV (comma separated value) format. Let's use this dataset as an example to practice data reading.


The Pandas library is a useful tool that enables us to read various datasets into a data frame; all we need to do is import Pandas. If you run into an import error, install it first.

We use the pandas.read_csv() function to read the csv file. In the parentheses, we put the file path in quotation marks so that pandas will read the file into a data frame from that address. The file path can be either a URL or your local file address.
Because the data does not include headers, we add the argument header=None inside the read_csv() method so that pandas will not automatically set the first row as a header.
You can also assign the dataset to any variable you create.

! pip install ipywidgets --upgrade --quiet
Import helper libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import pyplot
%matplotlib inline
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from ipywidgets import interact, interactive, fixed, interact_manual
Read the online file from the URL provided above, and assign it to the variable "df"
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/auto.csv"
df = pd.read_csv(url, header=None)
After reading the dataset, we can use the dataframe.head(n) method to check the top n rows of the dataframe, where n is an integer.
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe") 
df.head(5)
The first 5 rows of the dataframe
Contrary to dataframe.head(n), dataframe.tail(n) will show you the bottom n rows of the dataframe.
print("The last 5 rows of the dataframe") 
df.tail(5)
The last 5 rows of the dataframe

2. IDENTIFY AND HANDLE MISSING VALUES

ADD HEADERS

Take a look at our dataset; pandas automatically set the headers to integers starting from 0.

To better describe our data, we can introduce a header. This information is available at: https://archive.ics.uci.edu/ml/datasets/Automobile

Thus, we have to add headers manually.

First, we create a list "headers" that includes all column names in order. Then, we use dataframe.columns = headers to replace the headers with the list we created.

# create headers list
headers = ["symboling","normalized-losses","make","fuel-type","aspiration", "num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base", "length","width","height","curb-weight","engine-type",
         "num-of-cylinders", "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
         "peak-rpm","city-mpg","highway-mpg","price"]

df.columns = headers
df.head(10)

View column names

df.columns
Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
       'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
       'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
       'highway-mpg', 'price'],
      dtype='object')

As we can see, several question marks appeared in the dataframe; those are missing values which may hinder our further analysis.

So, how do we identify all those missing values and deal with them?

How to work with missing data?

Steps for working with missing data:

  1. identify missing data
  2. deal with missing data
  3. correct data format
We need to replace the "?" symbol with NaN so that dropna() can later remove the missing values.
df=df.replace('?',np.NaN)
df
Identify missing values
Evaluating for Missing Data

The missing values are now converted to NaN, pandas' default missing-value marker. We use the following functions to identify these missing values. There are two methods to detect missing data:

  1. .isnull()
  2. .notnull()
The output is a boolean value indicating whether the value that is passed into the argument is in fact missing data.
missing_data = df.isnull()
missing_data.head(5)
True stands for a missing value, while False means the value is present.
Count missing values in each column

Using a for loop in Python, we can quickly figure out the number of missing values in each column. As mentioned above, "True" represents a missing value, "False" means the value is present in the dataset. In the body of the for loop the method ".value_counts()" counts the number of "True" values.

for column in missing_data.columns:
    print(column)
    print (missing_data[column].value_counts())
    print("")    
symboling: False 205
normalized-losses: False 164, True 41
make: False 205
fuel-type: False 205
aspiration: False 205
num-of-doors: False 203, True 2
body-style: False 205
drive-wheels: False 205
engine-location: False 205
wheel-base: False 205
length: False 205
width: False 205
height: False 205
curb-weight: False 205
engine-type: False 205
num-of-cylinders: False 205
engine-size: False 205
fuel-system: False 205
bore: False 201, True 4
stroke: False 201, True 4
compression-ratio: False 205
horsepower: False 203, True 2
peak-rpm: False 203, True 2
city-mpg: False 205
highway-mpg: False 205
price: False 201, True 4
Based on the summary above, each column has 205 rows of data, and seven columns contain missing values (a compact one-line alternative to the loop is sketched after this list):
  1. "normalized-losses": 41 missing data
  2. "num-of-doors": 2 missing data
  3. "bore": 4 missing data
  4. "stroke" : 4 missing data
  5. "horsepower": 2 missing data
  6. "peak-rpm": 2 missing data
  7. "price": 4 missing data

Deal with missing data

How to deal with missing data?
  1. drop data
    a. drop the whole row
    b. drop the whole column
  2. replace data
    a. replace it by mean
    b. replace it by frequency
    c. replace it based on other functions

  Whole columns should be dropped only if most entries in the column are empty. In our dataset, none of the columns are empty enough to drop entirely. We have some freedom in choosing which method to replace data; however, some methods may seem more reasonable than others. We will apply each method to many different columns:

    Replace by mean:

    • "normalized-losses": 41 missing data, replace them with mean
    • "stroke": 4 missing data, replace them with mean
    • "bore": 4 missing data, replace them with mean
    • "horsepower": 2 missing data, replace them with mean
    • "peak-rpm": 2 missing data, replace them with mean

    Replace by frequency:

    • "num-of-doors": 2 missing data, replace them with "four".
      • Reason: about 84% of the cars have four doors. Since four doors is the most frequent value, it is the most likely to occur

    Drop the whole row:

    • "price": 4 missing data, simply delete the whole row
      • Reason: price is what we want to predict. Any data entry without price data cannot be used for prediction; therefore any row without price data is not useful to us
Replace by mean:
"normalized-losses": 41 missing data, replace them with mean
# Calculate the average of the column
avg_norm_loss = df["normalized-losses"].astype("float").mean(axis=0)
print("Average of normalized-losses:", avg_norm_loss)

#Replace "NaN" by mean value in "normalized-losses" column
df["normalized-losses"].replace(np.nan, avg_norm_loss, inplace=True)
Average of normalized-losses: 122.0
"bore": 4 missing data, replace them with mean
#Calculate the mean value for 'bore' column
avg_bore=df['bore'].astype('float').mean(axis=0)
print("Average of bore:", avg_bore)

#Replace NaN by mean value
df["bore"].replace(np.nan, avg_bore, inplace=True)
Average of bore: 3.3297512437810957
"stroke": 4 missing data, replace them with mean
#Calculate the mean value for "stroke" column
avg_stroke = df["stroke"].astype("float").mean(axis = 0)
print("Average of stroke:", avg_stroke)

#Replace "stroke" by mean value
df["stroke"].replace(np.nan, avg_stroke, inplace = True)
Average of stroke: 3.2554228855721337
"peak-rpm": 2 missing data, replace them with mean
#Calculate the mean value for "peak-rpm" column
avg_peak_rpm = df["peak-rpm"].astype("float").mean(axis = 0)
print("Average of peak-rpm:", avg_peak_rpm)

#Replace "peak-rpm" by mean value
df["peak-rpm"].replace(np.nan, avg_peak_rpm, inplace = True)
Average of peak-rpm: 5125.369458128079
"horsepower": 2 missing data, replace them with mean
#Calculate the mean value for "horsepower" column
avg_horsepower = df['horsepower'].astype('float').mean(axis=0)
print("Average horsepower:", avg_horsepower)

#Replace "horsepower" by mean value
df['horsepower'].replace(np.nan, avg_horsepower, inplace=True)
Average horsepower: 104.25615763546799
Replace by frequency:
"num-of-doors": 2 missing data

To see which values are present in a particular column, we can use the .value_counts() method:

df['num-of-doors'].value_counts()
four    114
two      89
Name: num-of-doors, dtype: int64

We can see that four doors are the most common type. We can also use the .idxmax() method to calculate for us the most common type automatically:

df['num-of-doors'].value_counts().idxmax()
'four'

The replacement procedure is very similar to what we have seen previously.

#replace the missing 'num-of-doors' values by the most frequent 
df["num-of-doors"].replace(np.nan, "four", inplace=True)
Finally, let's drop all rows that do not have price data:
# simply drop whole row with NaN in "price" column
df.dropna(subset=["price"], axis=0, inplace=True)

# reset index, because we dropped four rows
df.reset_index(drop=True, inplace=True)
df.head()

Correct data format

We are almost there!

The last step in data cleaning is checking and making sure that all data is in the correct format (int, float, text or other).

Data Types

Data has a variety of types.
The main types stored in Pandas dataframes are object, float, int, bool and datetime64. In order to better learn about each attribute, it is always good for us to know the data type of each column.

In Pandas, we use

.dtypes to check the data type

.astype() to change the data type

Let's list the data types for each column:
df.dtypes
symboling              int64
normalized-losses     object
make                  object
fuel-type             object
aspiration            object
num-of-doors          object
body-style            object
drive-wheels          object
engine-location       object
wheel-base           float64
length               float64
width                float64
height               float64
curb-weight            int64
engine-type           object
num-of-cylinders      object
engine-size            int64
fuel-system           object
bore                  object
stroke                object
compression-ratio    float64
horsepower            object
peak-rpm              object
city-mpg               int64
highway-mpg            int64
price                 object
dtype: object

As we can see above, some columns are not of the correct data type. Numerical variables should have type 'float' or 'int', and variables with strings such as categories should have type 'object'. For example, 'bore' and 'stroke' variables are numerical values that describe the engines, so we should expect them to be of the type 'float' or 'int'; however, they are shown as type 'object'. We have to convert data types into a proper format for each column using the astype() method.

#Convert data types to proper format
df[["bore", "stroke","price","peak-rpm"]] = df[["bore", "stroke","price","peak-rpm"]].astype("float")
df[["normalized-losses"]] = df[["normalized-losses"]].astype("int")

Let us list the columns after the conversion

df.info(verbose = False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201 entries, 0 to 200
Columns: 26 entries, symboling to price
dtypes: float64(9), int64(6), object(11)
memory usage: 41.0+ KB
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201 entries, 0 to 200
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   symboling          201 non-null    int64
 1   normalized-losses  201 non-null    int64
 2   make               201 non-null    object
 3   fuel-type          201 non-null    object
 4   aspiration         201 non-null    object
 5   num-of-doors       201 non-null    object
 6   body-style         201 non-null    object
 7   drive-wheels       201 non-null    object
 8   engine-location    201 non-null    object
 9   wheel-base         201 non-null    float64
 10  length             201 non-null    float64
 11  width              201 non-null    float64
 12  height             201 non-null    float64
 13  curb-weight        201 non-null    int64
 14  engine-type        201 non-null    object
 15  num-of-cylinders   201 non-null    object
 16  engine-size        201 non-null    int64
 17  fuel-system        201 non-null    object
 18  bore               201 non-null    float64
 19  stroke             201 non-null    float64
 20  compression-ratio  201 non-null    float64
 21  horsepower         201 non-null    object
 22  peak-rpm           201 non-null    float64
 23  city-mpg           201 non-null    int64
 24  highway-mpg        201 non-null    int64
 25  price              201 non-null    float64
dtypes: float64(9), int64(6), object(11)
memory usage: 41.0+ KB
Wonderful!

Now, we finally obtain the cleaned dataset with no missing values and all data in its proper format.
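As a quick sanity check (a small sketch added here, not part of the original notebook), we can confirm that no missing values remain:

# the total count of remaining NaN values should be 0
print("Remaining missing values:", df.isnull().sum().sum())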

Data Standardization

Data is usually collected from different agencies with different formats. (Data Standardization is also a term for a particular type of data normalization, where we subtract the mean and divide by the standard deviation)

What is Standardization?

Standardization is the process of transforming data into a common format, allowing the researcher to make meaningful comparisons.

Example

Transform mpg to L/100km:

In our dataset, the fuel consumption columns "city-mpg" and "highway-mpg" are represented in mpg (miles per gallon). Assume we are developing an application for a country that uses the L/100km standard for fuel consumption.

We will need to apply a data transformation to convert mpg into L/100km.

The formula for unit conversion is

L/100km = 235 / mpg

We can do many mathematical operations directly in Pandas.

# Convert mpg to L/100km by mathematical operation (235 divided by mpg)
df['city-L/100km'] = 235/df["city-mpg"]
df["highway-L/100km"] = 235/df["highway-mpg"]

# check your transformed data 
df.head()

Data Normalization

Why normalization?

Normalization is the process of transforming values of several variables into a similar range. Typical normalizations include scaling the variable so its average is 0, scaling the variable so its variance is 1, or scaling the variable so its values range from 0 to 1.

Example

To demonstrate normalization, let's say we want to scale the columns "length", "width" and "height"

Target: we would like to normalize those variables so their values range from 0 to 1.

Approach: replace original value by (original value)/(maximum value)

df[["length","width","height"]].head() # these values vary highly w.r.t rest column values
# replace (original value) by (original value)/(maximum value) -> Simple Feature Scaling
df['length'] = df['length']/df['length'].max()
df['width'] = df['width']/df['width'].max()
df['height'] = df['height']/df['height'].max()

# show the scaled columns
df[["length","width","height"]].head()

Binning

Why binning?

Binning is a process of transforming continuous numerical variables into discrete categorical 'bins', for grouped analysis.

Example:

In our dataset, "horsepower" is a real valued variable ranging from 48 to 288, it has 57 unique values. What if we only care about the price difference between cars with high horsepower, medium horsepower, and little horsepower (3 types)? Can we rearrange them into three ‘bins' to simplify analysis?

We will use the Pandas method 'cut' to segment the 'horsepower' column into 3 bins

Example of Binning Data In Pandas

Convert data to correct format

Let's plot the histogram of horsepower to see what the distribution looks like.

df["horsepower"]=df["horsepower"].astype(int, copy=True)
plt.hist(df["horsepower"])

# set x/y labels and plot title
plt.xlabel("horsepower")
plt.ylabel("count")
plt.title("horsepower bins")
Text(0.5, 1.0, 'horsepower bins')
Notebook Image

We would like 3 bins of equal bandwidth, so we use numpy's linspace(start_value, end_value, numbers_generated) function.

Since we want to include the minimum value of horsepower we want to set start_value=min(df["horsepower"]).

Since we want to include the maximum value of horsepower we want to set end_value=max(df["horsepower"]).

Since we are building 3 bins of equal length, there should be 4 dividers, so numbers_generated=4.

We build a bin array, with a minimum value to a maximum value, with bandwidth calculated above. The bins will be values used to determine when one bin ends and another begins.

bins = np.linspace(min(df["horsepower"]), max(df["horsepower"]), 4)
bins
array([ 48.        , 119.33333333, 190.66666667, 262.        ])

We set group names:

group_names = ['Low', 'Medium', 'High']

We apply the function cut to determine which bin each value of df['horsepower'] belongs to.

df['horsepower-binned'] = pd.cut(df['horsepower'], bins, labels=group_names, include_lowest=True )

#Lets see the number of vehicles in each bin
df["horsepower-binned"].value_counts()
Low       153
Medium     43
High        5
Name: horsepower-binned, dtype: int64

Let's plot the distribution of each bin.

pyplot.bar(group_names, df["horsepower-binned"].value_counts())

# set x/y labels and plot title
plt.xlabel("horsepower")
plt.ylabel("count")
plt.title("horsepower bins")
Text(0.5, 1.0, 'horsepower bins')
Notebook Image

Check the dataframe above carefully: you will find that the last column provides the bins for "horsepower" with 3 categories ("Low", "Medium" and "High"), as the short look below shows.
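For a quick look (a small added sketch, not from the original notebook), we can display the original and binned columns side by side:

# compare raw horsepower values with their assigned bins
df[['horsepower', 'horsepower-binned']].head(10)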

Bins visualization

Normally, a histogram is used to visualize the distribution of bins we created above.
# draw histogram of attribute "horsepower" with bins = 3
plt.hist(df["horsepower"], bins = 3)

# set x/y labels and plot title
plt.xlabel("horsepower")
plt.ylabel("count")
plt.title("horsepower bins")
Text(0.5, 1.0, 'horsepower bins')
Notebook Image

Indicator variable (or dummy variable)

What is an indicator variable?

An indicator variable (or dummy variable) is a numerical variable used to label categories. They are called 'dummies' because the numbers themselves don't have inherent meaning.

Why do we use indicator variables?

So we can use categorical variables for regression analysis in the later modules.

Example

We see the column "fuel-type" has two unique values: "gas" or "diesel". Regression doesn't understand words, only numbers. To use this attribute in regression analysis, we convert "fuel-type" into indicator variables.

We will use the pandas method 'get_dummies' to assign numerical values to the different categories of fuel type.

Get indicator variables for fuel-type and assign them to the data frame dummy_variable_1

dummy_variable_1 = pd.get_dummies(df["fuel-type"])
dummy_variable_1.sample(5)

change column names for clarity

dummy_variable_1.rename(columns={'gas':'fuel-type-gas', 'diesel':'fuel-type-diesel'}, inplace=True)
dummy_variable_1.head(5)

In the dummy_variable_1 data frame, the fuel-type values 'gas' and 'diesel' are now represented as 0s and 1s.

# merge data frame "df" and "dummy_variable_1" 
df = pd.concat([df, dummy_variable_1], axis=1)

# drop original column "fuel-type" from "df"
df.drop("fuel-type", axis = 1, inplace=True)
df.head()

Repeat for aspiration column

# get indicator variables of aspiration and assign it to data frame "dummy_variable_2"
dummy_variable_2 = pd.get_dummies(df['aspiration'])

# change column names for clarity
dummy_variable_2.rename(columns={'std':'aspiration-std', 'turbo': 'aspiration-turbo'}, inplace=True)

# show first 5 instances of data frame "dummy_variable_2"
dummy_variable_2.head()
# merge data frame "df" and "dummy_variable_2"
df = pd.concat([df, dummy_variable_2], axis=1)

# drop original column "aspiration" from "df"
df.drop("aspiration", axis = 1, inplace=True)
df.head()


Analyzing Individual Feature Patterns using Visualization

How to choose the right visualization method?
When visualizing individual variables, it is important to first understand what type of variable you are dealing with. This will help us find the right visualization method for that variable.
# list the data types for each column
print(df.dtypes)
symboling              int64
normalized-losses      int64
make                  object
num-of-doors          object
body-style            object
drive-wheels          object
engine-location       object
wheel-base           float64
length               float64
width                float64
height               float64
curb-weight            int64
engine-type           object
num-of-cylinders      object
engine-size            int64
fuel-system           object
bore                 float64
stroke               float64
compression-ratio    float64
horsepower             int64
peak-rpm             float64
city-mpg               int64
highway-mpg            int64
price                float64
city-L/100km         float64
highway-L/100km      float64
horsepower-binned    category
fuel-type-diesel       uint8
fuel-type-gas          uint8
aspiration-std         uint8
aspiration-turbo       uint8
dtype: object

1. Continuous numerical variables:

Continuous numerical variables are variables that may contain any value within some range. Continuous numerical variables can have the type int64 or float64. A great way to visualize these variables is by using scatterplots with fitted lines.

In order to start understanding the (linear) relationship between an individual variable and the price, we can use "regplot", which plots the scatterplot plus the fitted regression line for the data.

Positive linear relationship
Let's find the scatterplot of "engine-size" and "price"
sns.regplot(x = 'engine-size', y = 'price', data = df)
plt.ylim(0,) # y axis starts from zero
plt.title("correlation between engine-size and price")
plt.show()
Notebook Image

As the engine-size goes up, the price goes up: this indicates a positive direct correlation between these two variables. Engine size seems like a pretty good predictor of price since the regression line is almost a perfect diagonal line. We can examine the correlation between engine-size and price and see that it is approximately 0.87.

We can calculate the correlation between variables of type int64 or float64 using the method corr; the diagonal elements are always one.
df[["engine-size", "price"]].corr()
Highway mpg is a potential predictor variable of price

Highway MPG: the average a car will get while driving on an open stretch of road without stopping or starting, typically at a higher speed. City MPG: the score a car will get on average in city conditions, with stopping and starting at lower speeds.

sns.regplot(x = 'highway-mpg', y = 'price',data = df)
print('correlation between highway-mpg and price ')
df[['highway-mpg', 'price']].corr()
correlation between highway-mpg and price
Notebook Image

As the highway-mpg goes up, the price goes down: this indicates an inverse/negative relationship between these two variables. Highway mpg could potentially be a predictor of price. We can examine the correlation between highway-mpg and price and see that it is approximately -0.704.

Weak Linear Relationship
Let's see if peak-rpm is a predictor variable of price
sns.regplot(x = 'peak-rpm', y = 'price', data = df)
print('correlation between peak-rpm and price ')
df[['peak-rpm','price']].corr()
correlation between peak-rpm and price
Notebook Image

Peak rpm does not seem like a good predictor of the price at all since the regression line is close to horizontal. Also, the data points are very scattered and far from the fitted line, showing lots of variability. Therefore it is not a reliable variable. We can examine the correlation between peak-rpm and price and see that it is approximately -0.102.

Let's see if stroke is a predictor variable of price
print('correlation between stroke and price :')

# correlation results between "price" and "stroke" do you expect a linear relationship?
sns.regplot(x = 'stroke', y = 'price', data = df)

# correlation between x="stroke", y="price".
df[['stroke','price']].corr()
correlation between stroke and price :
Notebook Image

Stroke does not seem like a good predictor of the price at all since the regression line is close to horizontal. Also, the data points are very scattered and far from the fitted line, showing lots of variability. Therefore it is not a reliable variable. We can examine the correlation between stroke and price and see that it is approximately 0.08.

2. Categorical variables

These are variables that describe a characteristic of a data unit, and are selected from a small group of categories. The categorical variables can have the type object or int64. A good way to visualize categorical variables is by using boxplots.

Let's look at the relationship between body-style and price.
sns.boxplot(x="body-style", y="price", data=df)
<AxesSubplot:xlabel='body-style', ylabel='price'>
Notebook Image

We see that the distributions of price between the different body-style categories have a significant overlap, and so body-style would not be a good predictor of price.

Let's examine whether engine-location is a predictor variable of price:
sns.boxplot(x = 'engine-location', y = 'price', data = df)
<AxesSubplot:xlabel='engine-location', ylabel='price'>
Notebook Image

Here we see that the distribution of price between the two engine-location categories, front and rear, is distinct enough to take engine-location as a potentially good predictor of price.

Let's examine whether drive-wheels is a predictor variable of price:

A drive wheel is a wheel of a motor vehicle that transmits force, transforming torque into tractive force from the tires to the road, causing the vehicle to move.

sns.boxplot(x = 'drive-wheels', y = 'price', data = df)
<AxesSubplot:xlabel='drive-wheels', ylabel='price'>
Notebook Image

Here we see that the distribution of price between the different drive-wheels categories differs; as such drive-wheels could potentially be a predictor of price.

3. Descriptive Statistical Analysis

Let's first take a look at the variables by utilizing a description method.

The describe function automatically computes basic statistics for all continuous variables. Any NaN values are automatically skipped in these statistics.

This will show:

  • the count of that variable
  • the mean
  • the standard deviation (std)
  • the minimum value
  • the quartiles (25%, 50% and 75%)
  • the maximum value
We can apply the method describe as follows:
df.describe()
The default setting of describe skips variables of type object. We can apply the method describe on the variables of type object as follows:
df.describe(include = ['object'])
We can apply the method describe on all data types as follows:
df.describe(include = 'all')

Value Counts

Value-counts is a good way of understanding how many units of each characteristic/variable we have. We can apply the value_counts method on the column 'drive-wheels'. Don’t forget the method value_counts only works on Pandas series, not Pandas Dataframes. As a result, we only include one bracket df['drive-wheels'] not two brackets df[['drive-wheels']].
df['drive-wheels'].value_counts()
fwd    118
rwd     75
4wd      8
Name: drive-wheels, dtype: int64

We can convert the series to a Dataframe as follows :

df['drive-wheels'].value_counts().to_frame()

Let's repeat the above steps but save the results to the dataframe drive_wheels_counts and rename the column drive-wheels to value_counts.

A vehicle's drive wheel is the wheel and tire assembly that actually pushes or pulls the vehicle down the road. The four different types of drivetrain are all-wheel drive (AWD), front-wheel drive (FWD), rear-wheel drive (RWD), and four-wheel drive (4WD).

drive_wheels_counts = df['drive-wheels'].value_counts().to_frame()
drive_wheels_counts.rename(columns={'drive-wheels': 'value_counts'}, inplace=True)
drive_wheels_counts.index.name = 'drive-wheels' # Now let's rename the index to 'drive-wheels':
drive_wheels_counts

We can repeat the above process for the variable engine-location.

engine_location_counts = df['engine-location'].value_counts().to_frame()
engine_location_counts.rename(columns = {'engine-location' : 'value_counts'}, inplace = True)
engine_location_counts.index.name = 'engine-location'
engine_location_counts

Examining the value counts, engine-location would not be a good predictor variable for the price: we only have three cars with a rear engine and 198 with an engine in the front, so this result is skewed. Thus, we are not able to draw any conclusions about the engine location.

4. Basics of Grouping

The groupby method groups data by different categories. The data is grouped based on one or several variables and analysis is performed on the individual groups.

For example, let's group by the variable drive-wheels. We see that there are 3 different categories of drive wheels.

df["drive-wheels"].unique()
array(['rwd', 'fwd', '4wd'], dtype=object)

If we want to know, on average, which type of drive wheel is most valuable, we can group drive-wheels and then average them.

We can select the columns drive-wheels, body-style and price, then assign it to the variable df_group_one.

We can then calculate the average price for each of the different categories of data.
df_group_one = df[['drive-wheels', 'body-style', 'price']]
df_group_one = df_group_one.groupby(['drive-wheels'], as_index=False).mean() # as_index=False keeps 'drive-wheels' as a column instead of the index
df_group_one

From our data, it seems rear-wheel drive vehicles are, on average, the most expensive, while 4-wheel and front-wheel are approximately the same in price.

You can also group with multiple variables. For example, let's group by both drive-wheels and body-style. This groups the dataframe by the unique combinations drive-wheels and body-style. We can store the results in the variable 'grouped_test1'.

# grouping results
df_gptest = df[['drive-wheels','body-style','price']]
grouped_test1 = df_gptest.groupby(['drive-wheels','body-style'],as_index=False).mean()
grouped_test1
This grouped data is much easier to visualize when it is made into a pivot table. A pivot table is like an Excel spreadsheet, with one variable along the columns and another along the rows. We can convert the dataframe to a pivot table using the method pivot.

In this case, we will leave the drive-wheel variable as the rows of the table, and pivot body-style to become the columns of the table:

grouped_pivot = grouped_test1.pivot(index='drive-wheels',columns='body-style')
grouped_pivot

Often, we won't have data for some of the pivot cells. We can fill these missing cells with the value 0, but any other value could potentially be used as well. It should be mentioned that missing data is quite a complex subject and is an entire course on its own. For simplicity, let's assign them 0

grouped_pivot = grouped_pivot.fillna(0) #fill missing values with 0
grouped_pivot
Use the groupby function to find the average price of cars for each body-style:
df_gptest2 = df[['body-style','price']]
grouped_test_bodystyle = df_gptest2.groupby(['body-style'],as_index= False).mean()
grouped_test_bodystyle
Variables: Drive Wheels and Body Style vs Price
Let's use a heat map to visualize the relationship between Body Style vs Price.
plt.pcolor(grouped_pivot, cmap = 'RdBu')
plt.colorbar() # show vertical range
plt.show()
Notebook Image
The heatmap plots the target variable (price) as colour with respect to the variables 'drive-wheels' and 'body-style' on the vertical and horizontal axes respectively. This allows us to visualize how the price is related to 'drive-wheels' and 'body-style'.

The default labels convey no useful information to us. Let's change that:

grouped_pivot.index
Index(['4wd', 'fwd', 'rwd'], dtype='object', name='drive-wheels')
fig, ax = plt.subplots()
im = ax.pcolor(grouped_pivot, cmap='RdBu')

#label names
row_labels = grouped_pivot.columns.levels[1] # accessing "body-style" from grouped_pivot
col_labels = grouped_pivot.index # accessing "drive-wheels" from grouped_pivot

#move ticks and labels to the center
ax.set_xticks(np.arange(grouped_pivot.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(grouped_pivot.shape[0]) + 0.5, minor=False)

#insert labels
ax.set_xticklabels(row_labels, minor=False)
ax.set_yticklabels(col_labels, minor=False)

#rotate label if too long
plt.xticks(rotation=90)

fig.colorbar(im)
plt.show()
Notebook Image

5. Correlation and Causation

Correlation: a measure of the extent of interdependence between variables.

Causation: the relationship between cause and effect between two variables.

It is important to know the difference between these two and that correlation does not imply causation. Determining correlation is much simpler than determining causation, as causation may require independent experimentation.

Pearson Correlation

The Pearson Correlation measures the linear dependence between two variables X and Y.

The resulting coefficient is a value between -1 and 1 inclusive, where:

  • 1: Total positive linear correlation.
  • 0: No linear correlation, the two variables most likely do not affect each other.
  • -1: Total negative linear correlation.
df.corr()
Sometimes we would like to know the significance of the correlation estimate.
P-value:

What is this P-value? The P-value is the probability value that the correlation between these two variables is statistically significant. Normally, we choose a significance level of 0.05, which means that we are 95% confident that the correlation between the variables is significant.

By convention, when the:

  • p-value is < 0.001: we say there is strong evidence that the correlation is significant.
  • p-value is < 0.05: there is moderate evidence that the correlation is significant.
  • p-value is < 0.1: there is weak evidence that the correlation is significant.
  • p-value is > 0.1: there is no evidence that the correlation is significant.

We can obtain this information using the stats module from the scipy library.
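Since we will repeat this calculation for several variables below, a small loop can save some typing. The sketch here is a convenience added to this write-up (the column list is just an example), not part of the original analysis:

# Pearson correlation and p-value for a few candidate predictors
for col in ['wheel-base', 'horsepower', 'length', 'width', 'curb-weight']:
    coef, p = stats.pearsonr(df[col], df['price'])
    print(f"{col}: coefficient = {coef:.3f}, p-value = {p:.3g}")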

Wheel-base vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of wheel-base and price.

pearson_coef, p_value = stats.pearsonr(df['wheel-base'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P =", p_value)  
The Pearson Correlation Coefficient is 0.5846418222655085 with a P-value of P = 8.076488270732243e-20

Conclusion:

Since the p-value is < 0.001, the correlation between wheel-base and price is statistically significant, although the linear relationship isn't extremely strong (~0.585)

Horsepower vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of horsepower and price.

pearson_coef, p_value = stats.pearsonr(df['horsepower'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P = ", p_value)  
The Pearson Correlation Coefficient is 0.8096068016571051 with a P-value of P = 6.273536270651218e-48

Conclusion:

Since the p-value is < 0.001, the correlation between horsepower and price is statistically significant, and the linear relationship is quite strong (~0.809, close to 1)

Length vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of length and price.

pearson_coef, p_value = stats.pearsonr(df['length'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P = ", p_value)  
The Pearson Correlation Coefficient is 0.6906283804483644 with a P-value of P = 8.016477466158188e-30

Conclusion:

Since the p-value is < 0.001, the correlation between length and price is statistically significant, and the linear relationship is moderately strong (~0.691).

Width vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of width and price.

pearson_coef, p_value = stats.pearsonr(df['width'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P =", p_value ) 
The Pearson Correlation Coefficient is 0.7512653440522674 with a P-value of P = 9.200335510481516e-38

Conclusion:

Since the p-value is < 0.001, the correlation between width and price is statistically significant, and the linear relationship is quite strong (~0.751).

Curb-weight vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of curb-weight and price.

pearson_coef, p_value = stats.pearsonr(df['curb-weight'], df['price'])
print( "The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P = ", p_value)  
The Pearson Correlation Coefficient is 0.8344145257702845 with a P-value of P = 2.189577238893816e-53

Conclusion:

Since the p-value is < 0.001, the correlation between curb-weight and price is statistically significant, and the linear relationship is quite strong (~0.834)

Engine-size vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of engine-size and price.

pearson_coef, p_value = stats.pearsonr(df['engine-size'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P =", p_value) 
The Pearson Correlation Coefficient is 0.8723351674455188 with a P-value of P = 9.265491622196808e-64

Conclusion:

Since the p-value is < 0.001, the correlation between engine-size and price is statistically significant, and the linear relationship is very strong (~0.872).

Bore vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of bore and price.

pearson_coef, p_value = stats.pearsonr(df['bore'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P =  ", p_value ) 
The Pearson Correlation Coefficient is 0.54315538326266 with a P-value of P = 8.049189483935489e-17

Conclusion:

Since the p-value is < 0.001, the correlation between bore and price is statistically significant, but the linear relationship is only moderate (~0.543).

City-mpg vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of city-mpg and price.

pearson_coef, p_value = stats.pearsonr(df['city-mpg'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P = ", p_value)  
The Pearson Correlation Coefficient is -0.6865710067844684 with a P-value of P = 2.3211320655672453e-29

Conclusion:

Since the p-value is < 0.001, the correlation between city-mpg and price is statistically significant, and the coefficient of ~ -0.687 shows that the relationship is negative and moderately strong.

Highway-mpg vs Price

Let's calculate the Pearson Correlation Coefficient and P-value of highway-mpg and price.

pearson_coef, p_value = stats.pearsonr(df['highway-mpg'], df['price'])
print( "The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P = ", p_value ) 
The Pearson Correlation Coefficient is -0.7046922650589534 with a P-value of P = 1.749547114447437e-31

Conclusion:

Since the p-value is < 0.001, the correlation between highway-mpg and price is statistically significant, and the coefficient of ~ -0.705 shows that the relationship is negative and moderately strong.

6. ANOVA

The Analysis of Variance (ANOVA) is a statistical method used to test whether there are significant differences between the means of two or more groups. ANOVA returns two parameters:

F-test score: ANOVA assumes the means of all groups are the same, calculates how much the actual means deviate from the assumption, and reports it as the F-test score. A larger score means there is a larger difference between the means.

P-value: P-value tells how statistically significant is our calculated score value.

If our price variable is strongly correlated with the variable we are analyzing, expect ANOVA to return a sizeable F-test score and a small p-value.

Drive Wheels

Since ANOVA analyzes the difference between different groups of the same variable, the groupby function will come in handy. Because the ANOVA algorithm averages the data automatically, we do not need to take the average beforehand.

To see if different types of drive wheels impact price, we group the data:

grouped_test2=df_gptest[['drive-wheels', 'price']].groupby(['drive-wheels'])
grouped_test2.head(2)

We can obtain the values of each group using the method get_group.

grouped_test2.get_group('4wd')['price']
4      17450.0
136     7603.0
140     9233.0
141    11259.0
144     8013.0
145    11694.0
150     7898.0
151     8778.0
Name: price, dtype: float64

We can use the function f_oneway in the stats module to obtain the F-test score and P-value.

# ANOVA
f_val, p_val = stats.f_oneway(grouped_test2.get_group('fwd')['price'], grouped_test2.get_group('rwd')['price'], grouped_test2.get_group('4wd')['price'])  
 
print( "ANOVA results: F=", f_val, ", P =", p_val)   
ANOVA results: F= 67.95406500780399 , P = 3.3945443577151245e-23

This is a great result, with a large F test score showing a strong correlation and a P value of almost 0 implying almost certain statistical significance. But does this mean all three tested groups are all this highly correlated?

Separately:

f_val, p_val = stats.f_oneway(grouped_test2.get_group('fwd')['price'], grouped_test2.get_group('rwd')['price'])  
print( "fwd and rwd -> ANOVA results: F=", f_val, ", P =", p_val )

f_val, p_val = stats.f_oneway(grouped_test2.get_group('4wd')['price'], grouped_test2.get_group('rwd')['price'])     
print( "4wd and rwd -> ANOVA results: F=", f_val, ", P =", p_val) 

f_val, p_val = stats.f_oneway(grouped_test2.get_group('4wd')['price'], grouped_test2.get_group('fwd')['price'])  
print("4wd and fwd -> ANOVA results: F=", f_val, ", P =", p_val)   
fwd and rwd -> ANOVA results: F= 130.5533160959111 , P = 2.2355306355677845e-23
4wd and rwd -> ANOVA results: F= 8.580681368924756 , P = 0.004411492211225333
4wd and fwd -> ANOVA results: F= 0.665465750252303 , P = 0.41620116697845666

Conclusion: Important Variables

We now have a better idea of what our data looks like and which variables are important to take into account when predicting the car price. We have narrowed it down to the following variables:

Continuous numerical variables:

  • Length
  • Width
  • Curb-weight
  • Engine-size
  • Horsepower
  • City-mpg
  • Highway-mpg
  • Wheel-base
  • Bore

Categorical variables:

  • Drive-wheels

As we now move into building machine learning models to automate our analysis, feeding the model with variables that meaningfully affect our target variable will improve our model's prediction performance.


7. Model Development

Objectives

Develop prediction models

In this section, we will develop several models that will predict the price of the car using the variables or features. This is just an estimate but should give us an objective idea of how much the car should cost.

Some questions we want to ask in this module:

  • Do I know if the dealer is offering fair value for my trade-in?
  • Do I know if I put a fair value on my car?

In Data Analytics, we often use Model Development to help us predict future observations from the data we have.

A Model will help us understand the exact relationship between different variables and how these variables are used to predict the result.

1: Linear Regression and Multiple Linear Regression

Linear Regression

One example of a Data Model that we will be using is

Simple Linear Regression.

Simple Linear Regression is a method to help us understand the relationship between two variables:

  • The predictor/independent variable (X)
  • The response/dependent variable (that we want to predict)(Y)

The result of Linear Regression is a linear function that predicts the response (dependent) variable as a function of the predictor (independent) variable.

\[ Y: Response \ Variable\\ X: Predictor \ Variables \]

Linear function: \[ Yhat = a + b X \]

  • a refers to the intercept of the regression line, in other words: the value of Y when X is 0
  • b refers to the slope of the regression line, in other words: the value with which Y changes when X increases by 1 unit

Create the linear regression object

lm = LinearRegression()

How could Highway-mpg help us predict car price?

We will create a linear function with "highway-mpg" as the predictor variable and "price" as the response variable.

X = df[['highway-mpg']]
Y = df['price']

Fit the linear model using highway-mpg.

lm.fit(X,Y)
LinearRegression()

We can output a prediction

Yhat=lm.predict(X)
Yhat[0:5]   
array([16236.50464347, 16236.50464347, 17058.23802179, 13771.3045085 ,
       20345.17153508])

What is the value of the intercept (a)?

lm.intercept_
38423.30585815743

What is the value of the Slope (b)?

lm.coef_
array([-821.73337832])

What is the final estimated linear model we get?

As we saw above, we should get a final linear model with the structure:

\[ Yhat = a + b X \]

With the actual values, we get: price = 38423.31 - 821.73 * highway-mpg
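As a quick check (a sketch added here, not in the original notebook), plugging a sample highway-mpg value into this formula should reproduce what lm.predict returns:

# manual prediction using Yhat = a + b*X for highway-mpg = 30
manual = lm.intercept_ + lm.coef_[0] * 30
print("Manual prediction:", manual)
print("Model prediction: ", lm.predict(pd.DataFrame({'highway-mpg': [30]}))[0])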

Train the model using 'engine-size' as the independent variable and 'price' as the dependent variable

# create a new linear regression object
lm1 = LinearRegression()

# fit in linear model
lm1.fit( df[['engine-size']], df['price'])

print("What is the value of the intercept (a)? \n {}".format(lm1.intercept_))
print("What is the value of the Slope (b)? \n {}".format(lm1.coef_))
print("\n Final estimated linear model")
print("Yhat=-7963.34 + 166.86*X")
print("Price=-7963.34 + 166.86*engine-size")
What is the value of the intercept (a)?
 -7963.338906281049
What is the value of the Slope (b)?
 [166.86001569]

 Final estimated linear model
Yhat=-7963.34 + 166.86*X
Price=-7963.34 + 166.86*engine-size

Multiple Linear Regression

What if we want to predict car price using more than one variable?

If we want to use more variables in our model to predict car price, we can use Multiple Linear Regression. Multiple Linear Regression is very similar to Simple Linear Regression, but this method is used to explain the relationship between one continuous response (dependent) variable and two or more predictor (independent) variables. Most real-world regression models involve multiple predictors. We will illustrate the structure by using four predictor variables, but these results can generalize to any number of predictors:

\[ Y: Response \ Variable\\ X_1 :Predictor \ Variable \ 1\\ X_2: Predictor\ Variable \ 2\\ X_3: Predictor\ Variable \ 3\\ X_4: Predictor\ Variable \ 4\\ \]

\[ a: intercept\\ b_1 :coefficients \ of\ Variable \ 1\\ b_2: coefficients \ of\ Variable \ 2\\ b_3: coefficients \ of\ Variable \ 3\\ b_4: coefficients \ of\ Variable \ 4\\ \] The equation is given by \[ Yhat = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 \]

From the previous section we know that other good predictors of price could be:

  • Horsepower
  • Curb-weight
  • Engine-size
  • Highway-mpg

Let's develop a model using these variables as the predictor variables

lm3 = LinearRegression() # creating regression variable
Z = df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']] # extracting multiple independent variables

#Fit the linear model using the four above-mentioned variables.
lm3.fit(Z, df['price'])
print("What is the value of the intercept (a)? \n {}".format(lm3.intercept_))
print("What are the values of the coefficients (b1, b2, b3, b4)? \n {}".format(lm3.coef_))
print("\n Final estimated linear model")
print(f"\n Price = {lm3.intercept_} + {lm3.coef_[0]}*horsepower + {lm3.coef_[1]}*curb-weight + {lm3.coef_[2]}*engine-size + {lm3.coef_[3]}*highway-mpg")
What is the value of the intercept (a)?
 -15811.863767729228
What are the values of the coefficients (b1, b2, b3, b4)?
 [53.53022809  4.70805253 81.51280006 36.1593925 ]

 Final estimated linear model

 Price = -15811.863767729228 + 53.530228086069684*horsepower + 4.7080525312995185*curb-weight + 81.51280005759958*engine-size + 36.15939250212062*highway-mpg

Create and train a Multiple Linear Regression model lm4 where the response variable is price, and the predictor variables are normalized-losses and highway-mpg.

lm4 = LinearRegression()
lm4.fit(df[['normalized-losses','highway-mpg']],df['price']) 
print("What is the value of the intercept (a)? \n {}".format(lm4.intercept_))
print("What are the values of the coefficients (b1, b2, b3, b4)? \n {}".format(lm4.coef_))
print("\n Estimated linear model")
print(f"\n Price = {lm4.intercept_} + {lm4.coef_[0]}*normalized-losses  {lm4.coef_[1]}*highway-mpg ")
What is the value of the intercept (a)?
 38201.31327245735
What are the values of the coefficients (b1, b2)?
 [   1.49789586 -820.45434016]

 Estimated linear model

 Price = 38201.31327245735 + 1.4978958634132253*normalized-losses  -820.4543401631886*highway-mpg

2: Model Evaluation using Visualization

Now that we've developed some models, how do we evaluate our models and how do we choose the best one? One way to do this is by using visualization.

Regression Plot

When it comes to simple linear regression, an excellent way to visualize the fit of our model is by using regression plots.

This plot will show a combination of a scattered data points (a scatter plot), as well as the fitted linear regression line going through the data. This will give us a reasonable estimate of the relationship between the two variables, the strength of the correlation, as well as the direction (positive or negative correlation).

Let's visualize highway-mpg as potential predictor variable of price:

sns.regplot(x = 'highway-mpg', y = 'price',data = df)
plt.ylim(0,)
(0.0, 48181.23707667657)
Notebook Image

We can see from this plot that price is negatively correlated to highway-mpg, since the regression slope is negative. One thing to keep in mind when looking at a regression plot is to pay attention to how scattered the data points are around the regression line. This will give you a good indication of the variance of the data, and whether a linear model would be the best fit or not. If the data is too far off from the line, this linear model might not be the best model for this data.

Residual Plot

A good way to visualize the variance of the data is to use a residual plot.

What is a residual?

The difference between the observed value (y) and the predicted value (Yhat) is called the residual (e). When we look at a regression plot, the residual is the distance from the data point to the fitted regression line.

So what is a residual plot?

A residual plot is a graph that shows the residuals on the vertical y-axis and the independent variable on the horizontal x-axis.

What do we pay attention to when looking at a residual plot?

We look at the spread of the residuals:

  • If the points in a residual plot are randomly spread out around the x-axis, then a linear model is appropriate for the data. Why is that? Randomly spread out residuals means that the variance is constant, and thus the linear model is a good fit for this data.
width = 12
height = 10
plt.figure(figsize=(width, height))
sns.residplot(x=df['highway-mpg'], y=df['price'])
plt.show()
Notebook Image

What is this plot telling us?

We can see from this residual plot that the residuals are not randomly spread around the x-axis, which leads us to believe that maybe a non-linear model is more appropriate for this data.

Multiple Linear Regression

How do we visualize a model for Multiple Linear Regression? This gets a bit more complicated because you can't visualize it with regression or residual plot.

One way to look at the fit of the model is by looking at the distribution plot: We can look at the distribution of the fitted values that result from the model and compare it to the distribution of the actual values.

First, let's make a prediction

Y_hat = lm.predict(df[['highway-mpg']])
print('Simple Linear Regression')

# plot
plt.figure(figsize=(width, height))


ax1 = sns.distplot(df['price'], hist=False, color="r")
sns.distplot(Y_hat, hist=False, color="b", ax=ax1)

plt.legend(["Actual Value","Fitted Values"])
plt.title('Actual vs Fitted Values for Price')
plt.xlabel('Price (in dollars)')
plt.ylabel('Proportion of Cars')

plt.show()
plt.close()
Simple Linear Regression
Notebook Image
Y_hat = lm3.predict(df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
print("Multiple Linear Regression")
# plot
plt.figure(figsize=(width, height))


ax1 = sns.distplot(df['price'], hist=False, color="r")
sns.distplot(Y_hat, hist=False, color="b", ax=ax1)

plt.legend(["Actual Value","Fitted Values"])
plt.title('Actual vs Fitted Values for Price')
plt.xlabel('Price (in dollars)')
plt.ylabel('Proportion of Cars')

plt.show()
plt.close()
Multiple Linear Regression
Notebook Image

We can see that the fitted values are reasonably close to the actual values, since the two distributions overlap. However, there is definitely some room for improvement.

3: Polynomial Regression and Pipeline

Polynomial regression is a particular case of the general linear regression model or multiple linear regression models.

We get non-linear relationships by squaring or setting higher-order terms of the predictor variables.

There are different orders of polynomial regression:

Quadratic - 2nd order
$$ Yhat = a + b_1 X + b_2 X^2 $$
Cubic - 3rd order
$$ Yhat = a + b_1 X + b_2 X^2 + b_3 X^3 $$
Higher order:
$$ Yhat = a + b_1 X + b_2 X^2 + b_3 X^3 + \ldots $$
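Even though these models contain squared and cubed terms, they are still linear in the coefficients a and b_i, which is why polynomial regression counts as a special case of linear regression. Below is a minimal sketch of this idea (assuming df and the earlier imports are available); the variable names x_hw, X_cubic and lm_cubic are illustrative only.

import numpy as np
from sklearn.linear_model import LinearRegression

# Build a design matrix with x, x^2 and x^3 as ordinary columns
x_hw = df['highway-mpg'].values
X_cubic = np.column_stack([x_hw, x_hw**2, x_hw**3])

# An ordinary linear regression on these columns is a cubic polynomial fit
lm_cubic = LinearRegression()
lm_cubic.fit(X_cubic, df['price'])

print('Intercept a:', lm_cubic.intercept_)
print('Coefficients b1, b2, b3:', lm_cubic.coef_)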

We saw earlier that a linear model did not provide the best fit while using highway-mpg as the predictor variable. Let's see if we can try fitting a polynomial model to the data instead.

We will use the following function to plot the data:

def Polly_Plot(model, independent_variable, dependent_variable, Name):
    x_new = np.linspace(15, 55, 100)
    y_new = model(x_new)

    plt.plot(independent_variable, dependent_variable, '.', x_new, y_new, '-')
    plt.title('Polynomial Fit with Matplotlib for Price ~ ' + Name)
    ax = plt.gca()
    ax.set_facecolor((0.898, 0.898, 0.898))
    plt.xlabel(Name)
    plt.ylabel('Price of Cars')

    plt.show()
    plt.close()
# Let's get the variables
x = df['highway-mpg']
y = df['price']

# Here we use a polynomial of the 3rd order (cubic) 
f = np.polyfit(x, y, 3)
p = np.poly1d(f)
print(p)
-1.557 x^3 + 204.8 x^2 - 8965 x + 1.379e+05

Let's plot the function

Polly_Plot(p, x, y, 'highway-mpg')
Notebook Image

We can already see from plotting that this polynomial model performs better than the linear model. This is because the generated polynomial function hits more of the data points.

Create an 11th-order polynomial model with the variables x and y from above

f1 = np.polyfit(x, y, 11)
p1 = np.poly1d(f1)
print(p1)
Polly_Plot(p1,x,y, 'Highway MPG')
-1.243e-08 x^11 + 4.722e-06 x^10 - 0.0008028 x^9 + 0.08056 x^8 - 5.297 x^7 + 239.5 x^6 - 7588 x^5 + 1.684e+05 x^4 - 2.565e+06 x^3 + 2.551e+07 x^2 - 1.491e+08 x + 3.879e+08
Notebook Image

The analytical expression for a multivariate polynomial function gets complicated. For example, the expression for a second-order (degree=2) polynomial with two variables is given by:

$$ Yhat = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + b_4 X_1^2 + b_5 X_2^2 $$

We can perform a polynomial transform on multiple features. First, we import the module:

from sklearn.preprocessing import PolynomialFeatures

We create a PolynomialFeatures object of degree 2:

pr=PolynomialFeatures(degree=2)
pr
PolynomialFeatures()
Z_pr=pr.fit_transform(Z)

The original data has 201 samples and 4 features

Z.shape
(201, 4)

After the transformation, there are 201 samples and 15 features

Z_pr.shape
(201, 15)
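The count of 15 makes sense: a degree-2 expansion of 4 features yields 1 bias column, the 4 original features, their 4 squares, and the 6 pairwise interaction terms. As a minimal sketch (assuming the pr and Z objects from above), we can list the generated terms; note that the method is called get_feature_names_out on recent scikit-learn versions and get_feature_names on older ones.

# 1 bias + 4 linear + 4 squared + 6 interaction terms = 15 columns
print(pr.get_feature_names_out())   # on older scikit-learn: pr.get_feature_names()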

Pipeline

Data pipelines simplify the steps of processing the data. We use the module Pipeline to create a pipeline. We also use StandardScaler to normalize the data as a step in our pipeline.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

We create the pipeline by building a list of tuples, each containing the name of a step (the estimator) and its corresponding constructor.

Input=[('scale',StandardScaler()), ('polynomial', PolynomialFeatures(include_bias=False)), ('model',LinearRegression())]

We input the list as an argument to the pipeline constructor:

pipe=Pipeline(Input)
pipe
Pipeline(steps=[('scale', StandardScaler()),
                ('polynomial', PolynomialFeatures(include_bias=False)),
                ('model', LinearRegression())])

We can normalize the data, perform a transform and fit the model simultaneously.

pipe.fit(Z,y)
Pipeline(steps=[('scale', StandardScaler()),
                ('polynomial', PolynomialFeatures(include_bias=False)),
                ('model', LinearRegression())])

Similarly, we can normalize the data, perform a transform and produce a prediction simultaneously

ypipe=pipe.predict(Z)
ypipe[0:4]
array([13102.93329646, 13102.93329646, 18226.43450275, 10391.09183955])
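Because the final step of the pipeline is a LinearRegression estimator, the pipeline also exposes score(), so we can get the in-sample R^2 of the whole scale + polynomial + fit chain in one call; a minimal sketch, assuming pipe, Z and y are defined as above:

# In-sample R^2 of the full pipeline (scaling, degree-2 polynomial features, linear model)
print('Pipeline R^2 on the training data:', pipe.score(Z, y))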

4: Measures for In-Sample Evaluation

When evaluating our models, not only do we want to visualize the results, but we also want a quantitative measure to determine how accurate the model is.

Two very important measures that are often used in Statistics to determine the accuracy of a model are:

  • R^2 / R-squared
  • Mean Squared Error (MSE)

R-squared (R^2), also known as the coefficient of determination, is a measure of how close the data is to the fitted regression line.

The value of the R-squared is the percentage of variation of the response variable (y) that is explained by a linear model.

Mean Squared Error (MSE)

The Mean Squared Error measures the average of the squares of errors, that is, the difference between actual value (y) and the estimated value (ŷ).
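Before using scikit-learn's built-in functions, a minimal hand-computed sketch can make both definitions concrete; the y_true and y_pred arrays below are purely illustrative values, not results from our models.

import numpy as np

# illustrative observed prices and model predictions (hypothetical numbers)
y_true = np.array([13495.0, 16500.0, 16500.0, 13950.0])
y_pred = np.array([13100.0, 15900.0, 17200.0, 14300.0])

mse = np.mean((y_true - y_pred) ** 2)               # average of the squared errors
ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
r_squared = 1 - ss_res / ss_tot                     # share of variation explained

print('MSE :', mse)
print('R^2 :', r_squared)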

Let's calculate the R^2

#highway_mpg_fit
X = df[['highway-mpg']]
Y = df['price']
lm.fit(X, Y)
# Find the R^2
print('The R-square is: ', lm.score(X, Y))
The R-square is: 0.4965911884339176

We can say that ~49.659% of the variation of the price is explained by this simple linear model highway_mpg_fit.

To calculate the MSE

let's import the function mean_squared_error from the module metrics

from sklearn.metrics import mean_squared_error

we compare the predicted results with the actual results

Yhat=lm.predict(X)
mse = mean_squared_error(df['price'], Yhat)
print('The mean square error of price and predicted value is: ', mse)
The mean square error of price and predicted value is: 31635042.944639888
Model 2: Multiple Linear Regression

Let's calculate the R^2

# fit the model 
lm.fit(Z, df['price'])
# Find the R^2
print('The R-square is: ', lm.score(Z, df['price']))
The R-square is: 0.8093732522175299

We can say that ~80.937% of the variation of price is explained by this multiple linear regression "multi_fit".

Let's calculate the MSE

we produce a prediction

Y_predict_multifit = lm.predict(Z)

we compare the predicted results with the actual results

print('The mean square error of price and predicted value using multifit is: ', \
      mean_squared_error(df['price'], Y_predict_multifit))
The mean square error of price and predicted value using multifit is: 11979300.34981888

Model 3: Polynomial Fit

To calculate the R^2 for the polynomial fit, let's import the function r2_score from the module metrics, since we are using a different function here

from sklearn.metrics import r2_score

We apply the function to get the value of R^2

r_squared = r2_score(y, p(x)) # degree 3 polynomial
print('The R-square value is: ', r_squared)
The R-square value is: 0.674194666390652

We can say that ~ 67.419 % of the variation of price is explained by this polynomial fit

We can also calculate the MSE:

mean_squared_error(df['price'], p(x)) # degree 3 polynomial
20474146.426361218
r_squared = r2_score(y, p1(x)) #  degree 11 polynomial
print('The R-square value is: ', r_squared)
The R-square value is: 0.702376909243598
mean_squared_error(df['price'], p1(x)) # degree 11 polynomial
18703127.63915394

5: Prediction and Decision Making

Prediction

In the previous section, we trained the model using the method fit. Now we will use the method predict to produce a prediction. We will use pyplot for plotting and some functions from numpy.

Create a new input

new_input=np.arange(1, 101, 1).reshape(-1, 1) # 100 sample inputs

Fit the model

lm.fit(X, Y)
lm
LinearRegression()

Produce a prediction

yhat=lm.predict(new_input)
yhat[0:5]
array([37601.57247984, 36779.83910151, 35958.10572319, 35136.37234487,
       34314.63896655])

we can plot the data

plt.plot(new_input, yhat)
plt.show()
Notebook Image

Decision Making: Determining a Good Model Fit

Now that we have visualized the different models, and generated the R-squared and MSE values for the fits, how do we determine a good model fit?

What is a good R-squared value? When comparing models, the model with the higher R-squared value is a better fit for the data.

What is a good MSE? When comparing models, the model with the smallest MSE value is a better fit for the data.

Let's take a look at the values for the different models.

Simple Linear Regression: Using Highway-mpg as a Predictor Variable of Price.

  • R-squared: 0.49659118843391759
  • MSE: 3.16 x10^7

Multiple Linear Regression: Using Horsepower, Curb-weight, Engine-size, and Highway-mpg as Predictor Variables of Price.

  • R-squared: 0.8093732522175299
  • MSE: 1.2 x10^7

Polynomial Fit: Using Highway-mpg as a Predictor Variable of Price.

  • R-squared: 0.6741946663906514
  • MSE: 2.05 x 10^7

Simple Linear Regression model (SLR) vs Multiple Linear Regression model (MLR)

Usually, the more variables you have, the better your model is at predicting, but this is not always true. Sometimes you may not have enough data, you may run into numerical problems, or many of the variables may not be useful or may even act as noise. As a result, you should always check the MSE and R^2.

So to be able to compare the results of the MLR vs SLR models, we look at a combination of both the R-squared and MSE to make the best conclusion about the fit of the model.

  • MSE: The MSE of SLR is 3.16x10^7 while MLR has an MSE of 1.2 x10^7. The MSE of MLR is much smaller.
  • R-squared: In this case, we can also see that there is a big difference between the R-squared of the SLR and the R-squared of the MLR. The R-squared for the SLR (0.497) is very small compared to the R-squared for the MLR (0.809).

The R-squared in combination with the MSE shows that MLR seems to be the better model fit in this case, compared to SLR.

Simple Linear Model (SLR) vs Polynomial Fit

  • MSE: We can see that Polynomial Fit brought down the MSE, since this MSE is smaller than the one from the SLR.
  • R-squared: The R-squared for the Polyfit is larger than the R-squared for the SLR, so the Polynomial Fit also brought up the R-squared quite a bit.

Since the Polynomial Fit resulted in a lower MSE and a higher R-squared, we can conclude that this was a better fit model than the simple linear regression for predicting Price with Highway-mpg as a predictor variable.

Multiple Linear Regression (MLR) vs Polynomial Fit

  • MSE: The MSE for the MLR is smaller than the MSE for the Polynomial Fit.
  • R-squared: The R-squared for the MLR is also much larger than for the Polynomial Fit.

The R-squared in combination with the MSE shows that MLR seems to be the better model fit in this case, compared to both SLR and the Polynomial Fit.

Model Evaluation and Refinement


In the previous section, we evaluated the model using in-sample evaluation. In-sample evaluation tells us how well our model fits the data it was trained on, but it does not give us an estimate of how well the trained model can predict new data. The solution is to split our data: the in-sample (training) data is used to train the model, while the rest, called the test data, is held out as out-of-sample data and used to approximate how the model performs in the real world. Separating data into training and testing sets is therefore an important part of model evaluation.

Functions for Plotting

def DistributionPlot(RedFunction, BlueFunction, RedName, BlueName, Title): # red function : actual, Blue Function : predicted
    width = 12
    height = 10
    plt.figure(figsize=(width, height))

    ax1 = sns.distplot(RedFunction, hist=False, color="r", label=RedName)
    ax2 = sns.distplot(BlueFunction, hist=False, color="b", label=BlueName, ax=ax1)

    plt.title(Title)
    plt.xlabel('Price (in dollars)')
    plt.ylabel('Proportion of Cars')
    plt.legend([RedName,BlueName])

    plt.show()
    plt.close()
def PollyPlot(xtrain, xtest, y_train, y_test, lr, poly_transform):
    # xtrain, xtest: training and testing data for the predictor variable
    # y_train, y_test: training and testing data for the target variable
    # lr: linear regression object
    # poly_transform: polynomial transformation object
    width = 12
    height = 10
    plt.figure(figsize=(width, height))

    xmax = max([xtrain.values.max(), xtest.values.max()])
    xmin = min([xtrain.values.min(), xtest.values.min()])

    x = np.arange(xmin, xmax, 0.1)

    plt.plot(xtrain, y_train, 'or', label='Training Data')
    plt.plot(xtest, y_test, 'og', label='Test Data')
    plt.plot(x, lr.predict(poly_transform.fit_transform(x.reshape(-1, 1))), label='Predicted Function')
    plt.ylim([-10000, 60000])
    plt.ylabel('Price')
    plt.legend()

Part 1: Training and Testing

An important step in testing your model is to split your data into training and testing data.

# We will place the target data, price, in a separate variable y_data:
y_data = df['price']

# drop the price column to form the x data
x_data = df.drop('price', axis=1)

Now we randomly split our data into training and testing data using the function train_test_split.

from sklearn.model_selection import train_test_split


x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.10, random_state=1)


print("number of test samples :", x_test.shape[0])
print("number of training samples:",x_train.shape[0])
number of test samples : 21
number of training samples: 180

The test_size parameter sets the proportion of data that is split into the testing set. In the above, the testing set is set to 10% of the total dataset.

Let's fit a simple linear model using horsepower as a feature, and calculate the R^2 on both the test and training data:

# let's import LinearRegression from the module linear_model.
from sklearn.linear_model import LinearRegression

#We create a Linear Regression object:
lre=LinearRegression()

#we fit the model using the feature horsepower
lre.fit(x_train[['horsepower']], y_train)

# Let's Calculate the R^2 on the test data:
test =  lre.score(x_test[['horsepower']], y_test)
print('the R^2 on the Test data:', test)

# Let's Calculate the R^2 on the Train data:
train = lre.score(x_train[['horsepower']], y_train)
print('the R^2 on the Train data:', train)
the R^2 on the Test data: 0.3635480624962414
the R^2 on the Train data: 0.662028747521533

We can see that the R^2 is much smaller on the test data. With 90% of the data used for training, the model can explain about 66% of the variation in the data it has seen, but only about 36% of the variation in the unseen 10% held out for testing. In other words, the model fits the training data reasonably well but generalizes poorly to new data.

Now, let's split up the dataset such that 40% of the data samples are used for testing, with the parameter random_state set to zero. The outputs of the function are x_train1, x_test1, y_train1 and y_test1.

x_train1,x_test1, y_train1,  y_test1 = train_test_split(x_data, y_data,test_size=0.4, random_state=0 )
print("number of test samples :", x_test1.shape[0])
print("number of training samples:",x_train1.shape[0])

# training model and calculating R^2
lre1=LinearRegression()
lre1.fit(x_train1[['horsepower']], y_train1)
# Let's Calculate the R^2 on the test data:
test1 =  lre1.score(x_test1[['horsepower']], y_test1)
print('the R^2 on the Test data:', test1)

# Let's Calculate the R^2 on the Train data:
train1 = lre1.score(x_train1[['horsepower']], y_train1)
print('the R^2 on the Train data:', train1)
number of test samples : 81
number of training samples: 120
the R^2 on the Test data: 0.7139737368233017
the R^2 on the Train data: 0.5754853866574969

We can see a big jump in the test R^2, from about 36% to 71%, along with a slight drop in the training R^2, from about 66% to 57%. This model should generalize better to real-world (unseen) data.

Cross-validation Score

Sometimes you do not have sufficient testing data; as a result, you may want to perform Cross-validation. Let's go over several methods that you can use for Cross-validation.

# Let's import cross_val_score from the module model_selection.
from sklearn.model_selection import cross_val_score

# We input the model object, the feature (in this case 'horsepower') and the target data (y_data).
# The parameter 'cv' determines the number of folds; in this case 4.
Rcross = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4)

# The default scoring is R^2; each element in the array is the R^2 value for one fold:
Rcross
array([0.77465419, 0.51718424, 0.74814454, 0.04825398])

We can calculate the average and standard deviation of our estimate:

print("The mean of the folds are", Rcross.mean(), "and the standard deviation is" , Rcross.std())
The mean of the folds are 0.5220592359225417 and the standard deviation is 0.2913048066611841
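The default score returned by cross_val_score is R^2, but the scoring parameter lets us request other metrics. For example, scoring='neg_mean_squared_error' gives the negative MSE of each fold (negative so that "larger is better" still holds); a minimal sketch, assuming lre, x_data and y_data from above:

# Cross-validated MSE: scikit-learn returns the *negative* MSE, so flip the sign
neg_mse = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4,
                          scoring='neg_mean_squared_error')
print('MSE of each fold:', -1 * neg_mse)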

You can also use the function cross_val_predict to predict the output. The function splits up the data into the specified number of folds, using one fold for testing and the other folds for training. First, import the function:

from sklearn.model_selection import cross_val_predict

We input the model object, the feature (in this case horsepower), and the target data (y_data). The parameter cv determines the number of folds, in this case 4. The result contains a prediction for each sample, obtained when that sample was in the test fold:

yhat = cross_val_predict(lre,x_data[['horsepower']], y_data,cv=4)
yhat[0:5] # yhat has 201 values in total
array([14142.23793549, 14142.23793549, 20815.3029844 , 12745.549902  ,
       14762.9881726 ])
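Because every one of the 201 predictions in yhat was produced while its row was held out of training, comparing them with the actual prices gives a cross-validated flavour of R^2; a minimal sketch using r2_score, which was imported earlier:

from sklearn.metrics import r2_score

# R^2 of the out-of-fold predictions against the actual prices
print('Cross-validated R^2:', r2_score(y_data, yhat))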

Part 2: Overfitting, Underfitting and Model Selection

It turns out that the test data, sometimes referred to as the out-of-sample data, gives a much better measure of how well your model performs in the real world. One reason for this is overfitting; let's go over some examples. These differences are more apparent in multiple linear regression and polynomial regression, so we will explore overfitting in that context.

Let's create a multiple linear regression object and train the model using 'horsepower', 'curb-weight', 'engine-size' and 'highway-mpg' as features.

lr = LinearRegression()
lr.fit(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_train) # 10% test data
LinearRegression()

Prediction using training data:

yhat_train = lr.predict(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']]) 
yhat_train[0:5]
array([ 7426.34910902, 28324.42490838, 14212.74872339,  4052.80810192,
       34499.8541269 ])

Prediction using test data:

yhat_test = lr.predict(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']]) 
yhat_test[0:5]
array([11349.68099115,  5884.25292475, 11208.31007475,  6641.03017109,
       15565.98722248])

Let's perform some model evaluation using our training and testing data separately.

Title = 'Distribution  Plot of  Predicted Value Using Training Data vs Training Data Distribution' 
DistributionPlot(y_train, yhat_train, "Actual Values (Train)", "Predicted Values (Train)", Title) 
Notebook Image

Figure 1: Plot of predicted values using the training data compared to the training data.

So far the model seems to be doing well in learning from the training dataset. But what happens when the model encounters new data from the testing dataset? When the model generates new values from the test data, we see the distribution of the predicted values is much different from the actual target values.

Title='Distribution  Plot of  Predicted Value Using Test Data vs Data Distribution of Test Data'
DistributionPlot(y_test,yhat_test,"Actual Values (Test)","Predicted Values (Test)",Title)