Assignment 3 - Feed Forward Neural Networks

Starter notebook :
Submit here :
Submission Deadline : June 13 (Sat), 8:30 AM PST/9:00 PM IST


The ability to try many different neural network architectures to address a problem is what makes deep learning really powerful, especially compared to shallow learning techniques such as linear regression and logistic regression. In this assignment, you will:

  1. Explore the CIFAR10 dataset:
  2. Set up a training pipeline to train a neural network on a GPU
  3. Experiment with different network architectures & hyperparameters

Steps to complete the assignment

  1. Fork & run this notebook:
  2. Fill out all the ??? in the notebook to complete the assignment, and commit the final version to Jovian
  3. Submit your assignment here:
  4. (Optional) Write a blog post on one of the topics suggested at the end of the notebook
  5. (Optional) Share your work with the community on the Share Your Work Here - Assignment 3 thread

CORRECTION: There was a small error in the starter notebook where the full dataset was used for creating the training, validation & test data loaders. This has been fixed now. Please “fork” the starter notebook again if you have already forked it.
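For anyone unsure what the fix changes, here is a minimal sketch of building the loaders from disjoint subsets instead of the full dataset. It assumes PyTorch (the thread does not name the framework), uses a fake stand-in dataset, and the split sizes and batch sizes are hypothetical, not taken from the starter notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for CIFAR10: 100 fake 3x32x32 images with labels 0..9 (illustrative only).
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Hypothetical sizes; the three parts must add up to len(dataset).
train_size, val_size, test_size = 70, 15, 15
generator = torch.Generator().manual_seed(42)  # fixed seed for a reproducible split
train_ds, val_ds, test_ds = random_split(
    dataset, [train_size, val_size, test_size], generator=generator
)

# Each loader wraps its own subset, never the full dataset.
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)
test_loader = DataLoader(test_ds, batch_size=32)

# Sanity check: the subsets share no sample indices.
train_idx, val_idx, test_idx = (set(s.indices) for s in (train_ds, val_ds, test_ds))
assert not (train_idx & val_idx or train_idx & test_idx or val_idx & test_idx)
```

If the loaders were all built from `dataset` directly, the validation and test accuracy would be measured on images the model had already trained on.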

Make sure to review the material from Lecture 3 before starting the assignment. Please reply here if you have any questions or face issues. The recommended platform for writing your blog post is .


A post was split to a new topic: "How to train CNN with images of different sizes? "

If the input images are of different sizes, we need to do some image processing first: all images must be converted to the same shape & size before being fed into the network.
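A minimal sketch of that preprocessing step, assuming PyTorch and images already loaded as C x H x W tensors. The helper name `resize_to` and the 32x32 target size are hypothetical choices for illustration.

```python
import torch
import torch.nn.functional as F

def resize_to(image, size=(32, 32)):
    """Resize a single C x H x W image tensor to a fixed height and width."""
    # interpolate expects a batch dimension, so add one and remove it afterwards.
    batched = image.unsqueeze(0)
    resized = F.interpolate(batched, size=size, mode="bilinear", align_corners=False)
    return resized.squeeze(0)

# Three fake RGB images of different sizes (illustrative only).
images = [torch.rand(3, 40, 60), torch.rand(3, 32, 32), torch.rand(3, 100, 75)]

# Once every image has the same shape, they can be stacked into one batch.
batch = torch.stack([resize_to(img) for img in images])
print(batch.shape)  # torch.Size([3, 3, 32, 32])
```

Note that plain resizing changes the aspect ratio of non-square images; cropping or padding first is an alternative if that matters for your data.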


This would require the use of convolutional layers and some sort of custom batch processing (since the images all have different sizes, they can’t be batched easily).

The convolutional layers process the image as usual, but the last step is so-called global max pooling: you take the maximum value of each feature map in the conv-layer output. There can be more feature maps than image channels, so with this kind of pooling you usually want the last conv layer to have many feature maps. The result has the same size no matter how large the input was (because the number of feature maps is hard-coded). Since it is a vector, you can use it as the input to linear layers.

Note: you have to make sure that none of the images becomes too small for your conv layers to process, otherwise the network will break. The input images also need to have the same number of channels, but that is usually not a problem.
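The idea above can be sketched roughly as follows, assuming PyTorch. The class name, layer sizes, and number of feature maps are hypothetical; `nn.AdaptiveMaxPool2d(1)` is used here as one way to get global max pooling.

```python
import torch
import torch.nn as nn

class VariableSizeClassifier(nn.Module):
    """Hypothetical model: conv layers, global max pooling, then a linear layer."""

    def __init__(self, in_channels=3, feature_maps=64, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, feature_maps, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Output size 1 means: keep one maximum per feature map, whatever H and W are.
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.classifier = nn.Linear(feature_maps, num_classes)

    def forward(self, x):
        x = self.features(x)         # N x feature_maps x H x W
        x = self.pool(x).flatten(1)  # N x feature_maps
        return self.classifier(x)    # N x num_classes

model = VariableSizeClassifier()

# Images of different sizes are passed one at a time (batch size 1),
# since they cannot be stacked into a single tensor.
for height, width in [(32, 32), (48, 64), (17, 90)]:
    image = torch.rand(1, 3, height, width)
    print(model(image).shape)  # torch.Size([1, 10]) each time
```

Because the linear layer only ever sees a vector of length `feature_maps`, its input size does not depend on the image size.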


I was just wondering: Is it normal that I have Memory: 4243 / 2048 MB even before I got to the Training Model part?

@aakashns Hello, I’m following the course from Venezuela, and I have a question: could I write the blog post in Spanish? I can write it in English, but I think Spanish would be more helpful for Spanish-speaking readers.


I don’t speak for Aakash, but helping your own people is a really good idea. :slight_smile:


Hello @aakashns, please fix the post as it links to Share Your Work Here - Assignment 2.

The correction:
Share Your Work Here - Assignment 3.


I was working on Assignment 3, and when I run the fit function with 100 epochs, it is way too slow. This is the draft session, and the CPU usage is completely red while only 2% of the GPU is being used. I followed the lecture thoroughly, so what is wrong? Please tell me, it would be much appreciated!!

Thank you!!


Thank you! I’ll try it next time!!


Well, there is a really easy way to do it. You can open the Settings tab from the right side in the notebook and check GPU as the accelerator.

I’ll delete the previous post now.


Hey everyone,

I’m facing the same issue as @TumAro.

The code correctly detected cuda, and the Accelerator is already set to GPU. Any idea what’s wrong here?



I followed what @aadhav-n2 told me to do, step by step, but still no luck for me…

@aniketj97 @TumAro I have the same problem. I tried enabling GPU acceleration, but my GPU usage was capped at ~10%. That was still better than 2% usage: I trained for 50 epochs and got results in ~3 minutes.

CPU usage still stays above ~150%. Never mind, I’ll look into it further, but it shouldn’t be a problem for you guys: 3-5 minutes for 50-70 epochs is acceptable.

50 epochs, LR 0.001
CPU time: 7:25 min
GPU time: 3:16 min
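If you want to make the same CPU vs GPU comparison yourself, here is a rough timing sketch. It assumes PyTorch and uses a tiny hypothetical model on fake data, so the numbers it prints will not match the ones above. The function name `time_training` is made up for this example.

```python
import time
import torch
import torch.nn as nn

def time_training(device, epochs=3, lr=0.001):
    """Train a tiny hypothetical model on fake data and return elapsed seconds."""
    torch.manual_seed(0)
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, 64),
        nn.ReLU(),
        nn.Linear(64, 10),
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    # Fake CIFAR10-shaped data created directly on the target device.
    images = torch.rand(256, 3, 32, 32, device=device)
    labels = torch.randint(0, 10, (256,), device=device)

    start = time.perf_counter()
    for _ in range(epochs):
        for i in range(0, len(images), 64):  # mini-batches of 64
            optimizer.zero_grad()
            loss = loss_fn(model(images[i:i + 64]), labels[i:i + 64])
            loss.backward()
            optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.perf_counter() - start

print(f"CPU: {time_training(torch.device('cpu')):.2f}s")
if torch.cuda.is_available():
    print(f"GPU: {time_training(torch.device('cuda')):.2f}s")
```

With a model this small the GPU may not be faster at all, since most of the time goes into overhead rather than computation.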

Hello Everyone!

I tried different learning rates and numbers of epochs, but the highest accuracy I get is only 45%. I use 100 epochs and a learning rate of 0.001. How can I improve this? Did you get a higher accuracy? How?

@tanmher09 Try 0.01 or 0.1 as the learning rate for 20-25 epochs; otherwise, it might never converge to the local minimum. Hopefully you will get around 50% accuracy, depending on your architecture.

This is not the best combination. You can try out different learning rates.
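To see why a very small learning rate can stall, here is a toy sketch in plain Python. It runs gradient descent on a simple quadratic, not on CIFAR10, so it only illustrates the effect; the function name and the values are made up for this example.

```python
def gradient_descent(lr, steps, start=0.0, target=3.0):
    """Minimise the toy loss (w - target)**2 and return the final distance to target."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad
    return abs(w - target)

# Same step budget for every learning rate, as with a fixed number of epochs.
for lr in [0.001, 0.01, 0.1]:
    distance = gradient_descent(lr, steps=25)
    print(f"lr={lr}: distance from minimum after 25 steps = {distance:.3f}")
```

With the same number of steps, the smallest rate has barely moved from its starting point while the largest is close to the minimum. On a real network a rate that is too large can also diverge, which is why trying several values is worthwhile.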

@TumAro @jhonathanortiz @tanmher09 Anyone who is facing a GPU/CPU issue on a Kaggle kernel can also try Google Colab.

Hello TumAro,
By default the calculations are done on the CPU.
GPU acceleration is probably disabled on the environment you are using.
If you are using the Kaggle environment to run your Python notebook, select “GPU” in the “accelerator tabs” at the right of the page.
Good work :slight_smile:
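Once the accelerator is enabled, the code still has to move the model and the data to the GPU. A minimal sketch, assuming PyTorch; the model and the fake batch below are placeholders, not the assignment's code.

```python
import torch
import torch.nn as nn

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical model sized for flattened 3x32x32 CIFAR10 images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)

# Each batch has to be moved as well; a fake batch stands in for real data here.
images = torch.rand(8, 3, 32, 32).to(device)
outputs = model(images)

print(device)
print(next(model.parameters()).device)  # should match the selected device type
print(outputs.device)
```

If the first line prints `cpu` even though the accelerator is set to GPU, the session is not seeing the GPU and may need to be restarted.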


Hello guys, downloading the CIFAR dataset on Kaggle fails with an error, and on Colab it says it can’t detect a Jupyter notebook or Python, so I can’t commit. I had to manually download the notebook and upload it.

Is anybody else having these problems, and are there any fixes for future reference?