New York City Taxi Fare Prediction


Dataset Link: https://www.kaggle.com/c/new-york-city-taxi-fare-prediction

We'll train a machine learning model to predict the fare for a taxi ride in New York City, given information like the pickup date & time, pickup location, drop-off location and number of passengers.

This dataset is taken from a Kaggle competition organized by Google Cloud. It contains over 55 million rows of training data. We'll attempt to achieve a respectable score in the competition using just a fraction of the data. Along the way, we'll also look at some practical tips for machine learning. Most of the ideas & techniques covered in this notebook are derived from other public notebooks & blog posts.

To run this notebook, select "Run" > "Run on Colab" and connect your Google Drive account with Jovian. If you plan to train on a GPU, make sure to select a GPU runtime in Colab.

You can find the completed version of this notebook here: https://jovian.ai/aakashns/nyc-taxi-fare-prediction-filled

TIP #1: Create an outline for your notebook & for each section before you start coding

Here's an outline of the project:

Download the dataset
Explore & analyze the dataset
Prepare the dataset for ML training
Train hardcoded & baseline models
Make predictions & submit to Kaggle
Perform feature engineering
Train & evaluate different models
Tune hyperparameters for the best models
Train on a GPU with the entire dataset
Document & publish the project online
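For the "Train hardcoded & baseline models" step, the simplest hardcoded model ignores its inputs entirely and predicts the average training fare for every ride; any real model should beat it. Below is a minimal sketch using NumPy, scored with RMSE (root mean squared error, the competition's evaluation metric). `MeanRegressor` and `rmse` are illustrative names, not part of any library.

```python
import numpy as np


class MeanRegressor:
    """Hardcoded baseline: always predicts the mean fare seen during training."""

    def fit(self, X, y):
        # The inputs X are ignored; only the average target value is stored.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One identical prediction per input row.
        return np.full(len(X), self.mean_)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted fares."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Because `MeanRegressor` follows scikit-learn's `fit`/`predict` convention, it can later be swapped for a real estimator without changing the evaluation code.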

1. Download the Dataset

Steps:

Install required libraries
Download data from Kaggle
View dataset files
Load training set with Pandas
Load test set with Pandas

Install Required Libraries

!pip install jovian opendatasets pandas numpy scikit-learn xgboost --quiet