DATA ANALYSIS : Automobile Dataset
Problem
The final model has an efficiency of 84%.
Its performance graph is shown below.
DistributionPlot(y_test, yhat, "Actual Values (Test)", "Predicted Values (Test)", Title)
/opt/conda/lib/python3.8/site-packages/seaborn/distributions.py:2551: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).
warnings.warn(msg, FutureWarning)
TABLE OF CONTENTS
- Data Acquisition
- Identify and handle missing values
- Data Standardization
- Data Normalization
- Binning
- Analyzing Individual Feature Patterns using Visualization
- Model Development
- Linear Regression and Multiple Linear Regression
- Model Evaluation using Visualization
- Polynomial Regression and Pipeline
- Measures for In-Sample Evaluation
- Prediction and Decision Making
- Model Evaluation and Refinement
- Conclusion
- Reference
1. Data Acquisition
Datasets come in various formats, such as .csv, .json, and .xlsx. A dataset can be stored in different places: on your local machine or online. In our case, the Automobile Dataset comes from an online source and is in CSV (comma-separated values) format. Let's use this dataset as an example to practice reading data.
- data source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
- data type: .csv
The pandas library is a useful tool for reading various datasets into a data frame, so all we need to do is import pandas. If the import fails with an error, install pandas first.
We use the pandas.read_csv() function to read the CSV file. Inside the parentheses, we pass the file path as a quoted string, so that pandas reads the file at that address into a data frame. The file path can be either a URL or a local file path.
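As a minimal sketch of the call described above, the example below writes a tiny CSV file to a temporary folder and reads it back with `pd.read_csv()`. The file name `sample.csv`, the variable names, and the three sample rows are made up for illustration so that the example runs offline. Unlike the Automobile Dataset, this sample file does include a header row.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample content (not taken from the Automobile Dataset).
sample = "make,body_style,doors\naudi,sedan,4\nbmw,sedan,2\n"

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text(sample)
    # The location is passed as a quoted string; a URL string is passed the same way.
    sample_df = pd.read_csv(str(csv_path))

print(sample_df.shape)  # (2, 3)
```

Replacing the local path with a URL string leaves the call unchanged, although reading from a URL requires network access.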
Because the data does not include headers, we can pass the argument header=None to read_csv(), so that pandas does not automatically treat the first row as a header.
You can assign the resulting data frame to any variable name you choose.
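The effect of `header=None` can be illustrated with a short sketch. The URL is the data source listed above. The helper name `load_autos` and the two headerless rows in `raw` are assumptions introduced for illustration (the rows are made up, not real records), so the comparison runs without network access.

```python
import io

import pandas as pd

# URL taken from the data source listed above.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"


def load_autos(path=url):
    """Read the headerless file; pandas labels the columns 0, 1, 2, ..."""
    return pd.read_csv(path, header=None)


# Offline comparison on two made-up headerless rows.
raw = "3,gas,convertible\n1,gas,hatchback\n"

# Default behaviour: the first row is consumed as the header.
with_default = pd.read_csv(io.StringIO(raw))
# header=None: both rows are kept as data.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(len(with_default), len(without_header))  # 1 2
```

With network access, `df = load_autos()` would read the full dataset into a variable; the name `df` is only a common convention, and any variable name works.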