Project - Train a Deep Learning Model from Scratch
Deep Learning with PyTorch: Zero to GANs
For the course project, you will pick a dataset of your choice and apply the concepts learned in this course to train deep learning models end-to-end with PyTorch, experimenting with different hyperparameters & metrics.
- Find a dataset online (see the "Where to Find Datasets" section below)
- Understand and describe the modeling objective clearly
  - What type of data is it? (images, text, audio, etc.)
  - What type of problem is it? (regression, classification, generative modeling, etc.)
- Clean the data if required and perform exploratory analysis (plot graphs, ask questions)
- Modeling
  - Define a model (network architecture)
  - Pick some hyperparameters
  - Train the model
  - Make predictions on samples
  - Evaluate on the test dataset
  - Save the model weights
  - Record the metrics
  - Try different hyperparameters & regularization
- Conclusions - summarize your learning & identify opportunities for future work
- Publish and submit your Jupyter notebook
- (Optional) Write a blog post to describe your experiments and summarize your work. Use Medium or GitHub Pages.
Note: There is no starter notebook for the course project. Please use the "New Notebook" button on Jovian to create a new notebook, "Run on Colab" to execute it, and "jovian.commit" to record versions.
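The modeling steps listed above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not a starter notebook: the synthetic data, the `make_model`, `evaluate` and `fit` helpers, the hyperparameter values, and the file name `project-model.pth` are all assumptions made for the example, and you would swap in your own dataset and architecture.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Hypothetical stand-in data: 600 samples, 20 features, 3 classes.
# Replace these tensors with ones built from your own dataset.
features = torch.randn(600, 20)
labels = features[:, :3].argmax(dim=1)  # class depends on the first 3 features
dataset = TensorDataset(features, labels)
train_ds, val_ds, test_ds = random_split(dataset, [400, 100, 100])


def make_model(hidden_size):
    """Define a small feed-forward classifier (the network architecture)."""
    return nn.Sequential(
        nn.Linear(20, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, 3),
    )


@torch.no_grad()
def evaluate(model, loader):
    """Return (mean loss, accuracy) over all samples in the loader."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    for xb, yb in loader:
        out = model(xb)
        total_loss += nn.functional.cross_entropy(out, yb, reduction="sum").item()
        correct += (out.argmax(dim=1) == yb).sum().item()
        count += len(yb)
    return total_loss / count, correct / count


def fit(model, train_loader, val_loader, epochs, lr, weight_decay):
    """Train with Adam; weight_decay applies an L2 penalty as regularization."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    history = []
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_loader:
            loss = nn.functional.cross_entropy(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        val_loss, val_acc = evaluate(model, val_loader)
        history.append({"epoch": epoch, "val_loss": val_loss, "val_acc": val_acc})
    return history


train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=128)
test_loader = DataLoader(test_ds, batch_size=128)

# Try a few hyperparameter settings and record the metrics for each run.
trials = [
    {"hidden_size": 16, "lr": 1e-2, "weight_decay": 0.0},
    {"hidden_size": 64, "lr": 1e-2, "weight_decay": 1e-4},
]
results = []
for hp in trials:
    model = make_model(hp["hidden_size"])
    history = fit(model, train_loader, val_loader, epochs=10,
                  lr=hp["lr"], weight_decay=hp["weight_decay"])
    results.append({**hp, **history[-1], "model": model})

# Pick the best run by validation accuracy, then evaluate once on the test set.
best = max(results, key=lambda r: r["val_acc"])
test_loss, test_acc = evaluate(best["model"], test_loader)

# Make a prediction on a single sample from the test set.
sample_x, sample_y = test_ds[0]
with torch.no_grad():
    predicted_class = best["model"](sample_x.unsqueeze(0)).argmax(dim=1).item()

# Save the model weights (the file name is arbitrary).
torch.save(best["model"].state_dict(), "project-model.pth")
```

Keeping a separate validation split for comparing hyperparameters, and touching the test set only once at the end, is one common way to avoid an overly optimistic test metric. The `results` list doubles as a simple record of metrics per run that you can turn into a table in your notebook.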
Example notebooks for reference:
Use the following sources to find interesting datasets:
- https://www.kaggle.com/datasets (use the opendatasets library for downloading datasets)
- https://course.fast.ai/datasets
- https://github.com/ChristosChristofidis/awesome-deep-learning#datasets
- https://www.kaggle.com/competitions (check the "Completed" tab)
- https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/
- https://lionbridge.ai/datasets/top-10-image-classification-datasets-for-machine-learning/
- https://archive.ics.uci.edu/ml/index.php
- https://github.com/awesomedata/awesome-public-datasets
- https://datasetsearch.research.google.com/
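For Kaggle-hosted datasets, the download step with the opendatasets library mentioned above might look like the sketch below. The wrapper name `download_kaggle_dataset` and its `data_dir` default are hypothetical choices for this example; the only library call used is `opendatasets.download`, and the library is assumed to be installed with `pip install opendatasets`.

```python
def download_kaggle_dataset(url, data_dir="."):
    """Download a Kaggle dataset into data_dir using the opendatasets library."""
    # Imported inside the function so this cell can be defined even before
    # the library is installed (pip install opendatasets).
    import opendatasets as od

    od.download(url, data_dir=data_dir)


# Example call (needs network access and Kaggle credentials):
# download_kaggle_dataset("https://www.kaggle.com/rohanrao/nifty50-stock-market-data")
```

When run in a notebook, the download typically prompts for your Kaggle username and API key, so have those ready before calling the function.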
Indian stocks data
- https://nsepy.xyz/
- https://nsetools.readthedocs.io/en/latest/usage.html
- https://www.kaggle.com/rohanrao/nifty50-stock-market-data
Indian Air Quality Data
Indian Covid-19 Dataset
World Covid-19 Dataset
USA Covid-19 Dataset
Megapixels Dataset for Face Detection, GANs, Human Localization
- https://megapixels.cc/datasets/ (Contains 7 different datasets)
Agriculture-based datasets
- https://www.kaggle.com/srinivas1/agricuture-crops-production-in-india
- https://www.kaggle.com/unitednations/global-food-agriculture-statistics
- https://www.kaggle.com/kianwee/agricultural-raw-material-prices-19902020
- https://www.kaggle.com/jmullan/agricultural-land-values-19972017
India Digital Payments UPI
India Consumption of LPG
India Import/Export Crude Oil
US Unemployment Rate Data
India Road Accident Data
Data Science Jobs Data
- https://www.kaggle.com/sl6149/data-scientist-job-market-in-the-us
- https://www.kaggle.com/jonatancr/data-science-jobs-around-the-world
- https://www.kaggle.com/rkb0023/glassdoor-data-science-jobs
H-1B Visa Data
Donald Trump’s Tweets
Hillary Clinton and Trump’s Tweets
Asteroid Dataset
Solar flares Data
Human face generation GANs
F-1 Race Data
Automobile Insurance
PUBG
CS:GO
- https://www.kaggle.com/mateusdmachado/csgo-professional-matches
- https://www.kaggle.com/skihikingkevin/csgo-matchmaking-damage
Dota 2
Cricket
Basketball
Football