Image Classification using CNNs and ResNets in PyTorch

Part 5 of "PyTorch: Zero to GANs"

This post is the fifth in a series of tutorials on building deep learning models with PyTorch, an open source neural networks library. Check out the full series:

  1. PyTorch Basics: Tensors & Gradients
  2. Linear Regression & Gradient Descent
  3. Image Classification using Logistic Regression
  4. Training Deep Neural Networks on a GPU
  5. Image Classification using CNNs (this notebook)
  6. Generative Adversarial Networks (GANs)

In the previous tutorial, we trained a feedforward neural network with a single hidden layer to classify handwritten digits from the MNIST dataset with over 97% accuracy. For this tutorial, we'll use the CIFAR10 dataset, which consists of 60000 32x32 px colour images in 10 classes. Here are some sample images from the dataset:

System Setup

This notebook is hosted on Jovian, a platform for sharing data science projects. If you want to follow along and run the code as you read, you can choose the "Run on Kaggle" option from the "Run" dropdown above. Otherwise, to run the code on your machine, you can clone the notebook, install the required dependencies using conda, and start Jupyter by running the following commands:

pip install jovian --upgrade   # Install the jovian library 
jovian clone 05-cifar10-cnn    # Download notebook & dependencies
cd 05-cifar10-cnn              # Enter the created directory 
conda env update               # Install the dependencies
conda activate 05-cifar10-cnn  # Activate virtual env
jupyter notebook               # Start Jupyter

On older versions of conda, you might need to run source activate 05-cifar10-cnn to activate the environment. For a more detailed explanation of the above steps, check out the System setup section in the first notebook.

Before you start executing the code below, you may want to clear the cell outputs by selecting "Kernel > Restart and Clear Output" from the Jupyter notebook menu bar, to avoid confusion.

Exploring the Data

We'll download the images in PNG format from this page, using some helper functions from the torchvision and tarfile packages.

In [1]:
import os
import torch
import torchvision
import tarfile
from torchvision.datasets.utils import download_url
In [2]:
# Download the dataset
dataset_url = ""
download_url(dataset_url, '.')
0%| | 0/168168549 [00:00<?, ?it/s]
Downloading to ./cifar10.tgz
168173568it [00:01, 84754556.64it/s]
In [3]:
# Extract from archive
with tarfile.open('./cifar10.tgz', 'r:gz') as tar:
    tar.extractall(path='./data')

The dataset is extracted to the directory data/cifar10. It contains 2 folders train and test, containing the training set (50000 images) and test set (10000 images) respectively. Each of them contains 10 folders, one for each class of images. Let's verify this using os.listdir.

In [4]:
data_dir = './data/cifar10'

print(os.listdir(data_dir))
classes = os.listdir(data_dir + "/train")
print(classes)
['labels.txt', 'train', 'test'] ['airplane', 'truck', 'horse', 'deer', 'dog', 'cat', 'ship', 'bird', 'automobile', 'frog']

Let's look inside a couple of folders, one from the training set and another from the test set. As an exercise, you can verify that there are an equal number of images for each class: 5000 in the training set and 1000 in the test set.

In [5]:
airplane_files = os.listdir(data_dir + "/train/airplane")
print('No. of training examples for airplanes:', len(airplane_files))
print(airplane_files[:5])
No. of training examples for airplanes: 5000 ['20902_airplane.png', '48119_airplane.png', '48580_airplane.png', '8258_airplane.png', '18047_airplane.png']
In [6]:
ship_test_files = os.listdir(data_dir + "/test/ship")
print("No. of test examples for ship:", len(ship_test_files))
print(ship_test_files[:5])
No. of test examples for ship: 1000 ['2265_ship.png', '8567_ship.png', '3303_ship.png', '3729_ship.png', '4881_ship.png']
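
As a starting point for the exercise above, here is a minimal sketch that counts the files in every class folder. The helper name count_images_per_class is hypothetical (it is not part of torchvision), and it assumes the one-folder-per-class layout described above.

```python
import os

def count_images_per_class(split_dir):
    """Return {class_name: number_of_files} for a folder with one sub-folder per class."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):  # skip stray files such as labels.txt
            counts[class_name] = len(os.listdir(class_dir))
    return counts

# Usage (assumes the dataset was extracted to ./data/cifar10 as above)
data_dir = './data/cifar10'
if os.path.isdir(data_dir):
    print(count_images_per_class(data_dir + '/train'))
    print(count_images_per_class(data_dir + '/test'))
```

If the dataset is balanced, every value in the first dictionary should be 5000 and every value in the second should be 1000.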

The above directory structure (one folder per class) is used by many computer vision datasets, and most deep learning libraries provide utilities for working with such datasets. We can use the ImageFolder class from torchvision to load the data as PyTorch tensors.

In [8]:
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
In [9]:
dataset = ImageFolder(data_dir+'/train', transform=ToTensor())

Let's look at a sample element from the training dataset. Each element is a tuple containing an image tensor and a label. Since the data consists of 32x32 px color images with 3 channels (RGB), each image tensor has the shape (3, 32, 32).

In [10]:
img, label = dataset[0]
print(img.shape, label)
img
torch.Size([3, 32, 32]) 0
tensor([[[0.8588, 0.8588, 0.8627,  ..., 0.8510, 0.8471, 0.8392],
         [0.8667, 0.8667, 0.8745,  ..., 0.8588, 0.8549, 0.8471],
         [0.8667, 0.8667, 0.8745,  ..., 0.8588, 0.8549, 0.8471],
         [0.8980, 0.9020, 0.9098,  ..., 0.8980, 0.8902, 0.8863],
         [0.8471, 0.8549, 0.8706,  ..., 0.8980, 0.8902, 0.8824],
         [0.7608, 0.7490, 0.7725,  ..., 0.8980, 0.8902, 0.8824]],

        [[0.9333, 0.9333, 0.9373,  ..., 0.9176, 0.9137, 0.9059],
         [0.9412, 0.9412, 0.9490,  ..., 0.9294, 0.9216, 0.9137],
         [0.9412, 0.9412, 0.9490,  ..., 0.9255, 0.9216, 0.9137],
         [0.9608, 0.9569, 0.9569,  ..., 0.9412, 0.9412, 0.9412],
         [0.9020, 0.9098, 0.9255,  ..., 0.9412, 0.9412, 0.9373],
         [0.8157, 0.8039, 0.8275,  ..., 0.9412, 0.9412, 0.9373]],

        [[0.9608, 0.9608, 0.9647,  ..., 0.9490, 0.9412, 0.9412],
         [0.9686, 0.9686, 0.9765,  ..., 0.9608, 0.9529, 0.9490],
         [0.9686, 0.9686, 0.9765,  ..., 0.9569, 0.9529, 0.9490],
         [0.9804, 0.9765, 0.9804,  ..., 0.9647, 0.9647, 0.9608],
         [0.9176, 0.9255, 0.9373,  ..., 0.9686, 0.9647, 0.9608],
         [0.8275, 0.8157, 0.8392,  ..., 0.9686, 0.9647, 0.9608]]])

The list of classes is stored in the .classes property of the dataset. The numeric label for each element corresponds to the index of the element's label in the list of classes.

In [11]:
print(dataset.classes)
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

We can view the image using matplotlib, but we need to change the tensor dimensions to (32,32,3). Let's create a helper function to display an image and its label.

In [12]:
import matplotlib.pyplot as plt

def show_example(img, label):
    print('Label: ', dataset.classes[label], "("+str(label)+")")
    plt.imshow(img.permute(1, 2, 0))

Let's look at a couple of images from the dataset. As you can tell, the 32x32px images are quite difficult to identify, even for the human eye. Try changing the indices below to view different images.

In [13]:
show_example(*dataset[0])
Label: airplane (0)
Notebook Image
In [14]:
Label: airplane (0)
Notebook Image

Before continuing, let's save our work using the jovian Python library.

In [15]:
!pip install jovian --upgrade -q
In [16]:
import jovian
In [17]:
jovian.commit()
[jovian] Saving notebook..
[jovian] Creating a new notebook on
[jovian] Please enter your API key ( from ):
API KEY: ········
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully!

jovian.commit uploads the notebook to your account, captures the Python environment and creates a sharable link for your notebook as shown above. You can use this link to share your work and let anyone run it online or reproduce it with the jovian clone command.

Training and Validation Datasets

While building real world machine learning models, it is quite common to split the dataset into 3 parts:

  1. Training set - used to train the model i.e. compute the loss and adjust the weights of the model using gradient descent.
  2. Validation set - used to evaluate the model while training, adjust hyperparameters (learning rate etc.) and pick the best version of the model.
  3. Test set - used to compare different models, or different types of modeling approaches, and report the final accuracy of the model.

Since there's no predefined validation set, we can set aside a small portion of the training set to be used as the validation set. Let's define a function that randomly picks a given fraction of the element indices for creating the validation set. We'll also pass a random seed into the function, so we can recreate the same training/validation split in future runs.

In [19]:
import numpy as np

def split_indices(n, val_pct=0.1, seed=99):
    # Determine size of validation set
    n_val = int(val_pct*n)
    # Set the random seed (for reproducibility)
    np.random.seed(seed)
    # Create random permutation of 0 to n-1
    idxs = np.random.permutation(n)
    # Pick first n_val indices for validation set
    return idxs[n_val:], idxs[:n_val]
In [20]:
val_pct = 0.2
rand_seed = 42

train_indices, val_indices = split_indices(len(dataset), val_pct, rand_seed)
print(len(train_indices), len(val_indices))
print('Sample validation indices: ', val_indices[:10])
40000 10000 Sample validation indices: [33553 9427 199 12447 39489 42724 10822 49498 4144 36958]
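
To convince ourselves that the split behaves as intended, here is a small sanity check. It redefines split_indices with the seed applied (so the block runs on its own) and uses the dataset size of 50000 from above. Treat it as a sketch rather than part of the training pipeline.

```python
import numpy as np

def split_indices(n, val_pct=0.1, seed=99):
    # Same logic as the function above, with the seed applied
    n_val = int(val_pct * n)
    np.random.seed(seed)
    idxs = np.random.permutation(n)
    return idxs[n_val:], idxs[:n_val]

train_a, val_a = split_indices(50000, 0.2, 42)
train_b, val_b = split_indices(50000, 0.2, 42)

# Same seed -> identical split on every run
print(np.array_equal(val_a, val_b))
# Number of indices that appear in both sets (should be 0)
print(len(np.intersect1d(train_a, val_a)))
# Together the two sets account for every index
print(len(train_a) + len(val_a))
```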

The jovian library also provides a simple API for recording important parameters related to the dataset, model training, results etc. for easy reference and comparison between multiple experiments. Let's record dataset_url, val_pct and rand_seed using jovian.log_dataset.

In [21]:
jovian.log_dataset({
    'dataset_url': dataset_url,
    'val_pct': val_pct,
    'rand_seed': rand_seed
})
[jovian] Dataset logged.

We have randomly shuffled the indices and selected a small portion (20%) to serve as the validation set. To process our data in small batches, we can now create PyTorch data loaders for each of these using a SubsetRandomSampler, which samples elements randomly from a given list of indices while creating batches of data.

In [23]:
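from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data.dataloader import DataLoader

# Batch size of 100, matching the batch shapes printed later in this notebook
batch_size = 100
In [24]:
# Training sampler and data loader
train_sampler = SubsetRandomSampler(train_indices)
train_dl = DataLoader(dataset,
                      batch_size,
                      sampler=train_sampler)

# Validation sampler and data loader
val_sampler = SubsetRandomSampler(val_indices)
val_dl = DataLoader(dataset,
                    batch_size,
                    sampler=val_sampler)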
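
To see what a SubsetRandomSampler does in isolation, here is a toy sketch. The 10-element dataset and the list of odd indices are made up for illustration; only the sampler and data loader classes come from PyTorch.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# A toy dataset of 10 items whose value equals its index
toy_ds = TensorDataset(torch.arange(10))
subset = [1, 3, 5, 7, 9]  # hypothetical indices to sample from

sampler = SubsetRandomSampler(subset)
toy_dl = DataLoader(toy_ds, batch_size=2, sampler=sampler)

seen = []
for (xb,) in toy_dl:
    seen.extend(xb.tolist())
print(seen)         # the chosen indices, in a random order
print(len(toy_dl))  # number of batches
```

Only the elements at the given indices are ever drawn, and their order changes from one pass to the next.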

We can look at batches of images from the dataset using the make_grid method from torchvision. Each time the following code is run, we get a different batch, since the sampler shuffles the indices before creating batches.

In [25]:
from torchvision.utils import make_grid

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(10, 10))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images, 10).permute(1, 2, 0))
        break  # Show only the first batch
In [26]:
show_batch(train_dl)
Notebook Image

Once again, let's save and commit our work using jovian before proceeding further.

In [27]:
jovian.commit()
[jovian] Saving notebook..
[jovian] Updating notebook "835f1d08ea4241c5b8c293e82028fa3f" on
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Recording metrics, hyperparameters, datasets & git information..
[jovian] Committed successfully!

After the first commit, all subsequent commits record a new version of the notebook within the same Jovian project. You can use jovian.commit to version Jupyter notebooks (instead of doing File > Save As), and keep your data science projects organized. Also check out the Records tab on the project page to see how the information logged using jovian.log_dataset appears on the UI.

Defining the Model (Convolutional Neural Network)

In our previous tutorial, we defined a deep neural network with fully-connected layers using nn.Linear. For this tutorial however, we will use a convolutional neural network, using the nn.Conv2d class from PyTorch.

The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel. - Source
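
To make the sliding-kernel description concrete, here is a minimal sketch of a single-channel 2D convolution written with explicit loops and compared against PyTorch's built-in F.conv2d. The helper conv2d_single, the 5x5 input and the 3x3 kernel are all made up for illustration, and the sketch assumes no padding and a stride of 1.

```python
import torch
import torch.nn.functional as F

def conv2d_single(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = torch.zeros(ih - kh + 1, iw - kw + 1)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise multiply the kernel with the patch it covers, then sum
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = torch.sum(patch * kernel)
    return out

image = torch.arange(25, dtype=torch.float32).reshape(5, 5)
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])

manual = conv2d_single(image, kernel)
# F.conv2d expects (batch, channels, height, width) tensors
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]
print(manual)
print(torch.allclose(manual, builtin))
```

A 5x5 input and a 3x3 kernel give a 3x3 output, because the kernel fits in only three positions along each axis.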

I highly recommend checking out the following articles if you want to gain a better understanding of convolutions:

  1. Intuitively understanding Convolutions for Deep Learning by Irhum Shafkat
  2. Convolutions in Depth by Sylvain Gugger (this article implements convolutions from scratch)

There are certain advantages offered by convolutional layers when working with image data:

  • Fewer parameters: A small set of parameters (the kernel) is used to calculate the outputs for the entire image, so the model has far fewer parameters than a fully connected layer.
  • Sparsity of connections: In each layer, each output element only depends on a small number of input elements, which makes the forward and backward passes more efficient.
  • Parameter sharing and spatial invariance: The features learned by a kernel in one part of the image can be used to detect a similar pattern in a different part of another image.
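
To put a number on the "fewer parameters" point, here is a small worked calculation. It compares a 3x3 convolution that maps a 3-channel 32x32 image to 16 channels with a fully connected layer that maps the same flattened input to an output of the same size. The two helper functions are hypothetical and simply count weights and biases.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights + biases of a square-kernel 2D convolution."""
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels

def linear_params(in_features, out_features):
    """Weights + biases of a fully connected layer."""
    return in_features * out_features + out_features

# 3-channel 32x32 input -> 16-channel 32x32 output
conv = conv2d_params(3, 16, 3)
dense = linear_params(3 * 32 * 32, 16 * 32 * 32)
print('conv: ', conv)   # 448
print('dense:', dense)  # 50348032
```

The convolution needs a few hundred parameters because the same kernel is reused at every position, while the fully connected layer needs about 50 million.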

We will also use max-pooling layers to progressively decrease the height & width of the output tensors from each convolutional layer.

Before we define the entire model, let's look at how a single convolutional layer followed by a max-pooling layer operates on the data.

In [28]:
import torch.nn as nn
import torch.nn.functional as F
In [29]:
In [30]:
simple_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(2, 2)
)

Refer to Sylvain's post for an explanation of kernel_size, stride and padding.

In [31]:
for images, labels in train_dl:
    print('images.shape:', images.shape)
    out = simple_model(images)
    print('out.shape:', out.shape)
    break
images.shape: torch.Size([100, 3, 32, 32]) out.shape: torch.Size([100, 8, 16, 16])

The Conv2d layer transforms a 3-channel image to an 8-channel feature map, and the MaxPool2d layer halves the height and width. The feature map gets smaller as we add more layers, until we are finally left with an n x 1 x 1 feature map (where n is the no. of channels), which can be flattened into a vector. We can then add a fully connected layer at the end to get a vector of size 10 for each image.
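
The shrinking feature map can be traced with the standard output-size formula for convolution and pooling layers, floor((n + 2*padding - kernel_size) / stride) + 1. The sketch below applies it to five conv + max-pool blocks with the settings used in this notebook (kernel_size=3, stride=1, padding=1 for the convolution, and a 2x2 pool with stride 2). The helper out_size is hypothetical and ignores dilation.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (dilation 1)."""
    return (n + 2 * padding - kernel_size) // stride + 1

size = 32
sizes = []
for block in range(5):
    size = out_size(size, kernel_size=3, stride=1, padding=1)  # conv keeps the size
    size = out_size(size, kernel_size=2, stride=2)             # max-pool halves it
    sizes.append(size)
print(sizes)  # [16, 8, 4, 2, 1]
```

After five blocks a 32x32 image is reduced to 1x1, which is why the model below can flatten its final feature map into a vector with one entry per channel.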

In [32]:
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 16 x 16
    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 8 x 8

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 4 x 4
    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 2 x 2

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 1 x 1
    nn.Flatten(), # output: bs x 16
    nn.Linear(16, 10) # output: bs x 10
)
In [38]:
torch.Size([16, 16, 3, 3])

Let's verify that the model produces the expected output on a batch of training data. The 10 outputs for each image can be interpreted as probabilities for the 10 target classes (after applying softmax), and the class with the highest probability is chosen as the label predicted by the model for the input image. Check out Part 3 (logistic regression) for a more detailed discussion on interpreting the outputs, applying softmax and identifying the predicted labels.

In [39]:
for images, labels in train_dl:
    print('images.shape:', images.shape)
    out = model(images)
    print('out.shape:', out.shape)
    print('out[0]:', out[0])
    break
images.shape: torch.Size([100, 3, 32, 32]) out.shape: torch.Size([100, 10]) out[0]: tensor([-0.1341, -0.1382, 0.0618, -0.0479, -0.0960, -0.0788, -0.1926, -0.0793, -0.1694, 0.1084], grad_fn=<SelectBackward>)
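
As a quick illustration of how the raw outputs become a prediction, the sketch below applies softmax to the ten values printed for out[0] above and picks the most likely class. The values are copied from that printout, so they describe an untrained model and the "prediction" is essentially a random guess.

```python
import torch
import torch.nn.functional as F

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# Raw outputs (logits) for one image, copied from the printout above
logits = torch.tensor([-0.1341, -0.1382, 0.0618, -0.0479, -0.0960,
                       -0.0788, -0.1926, -0.0793, -0.1694, 0.1084])

probs = F.softmax(logits, dim=0)   # non-negative values that sum to 1
pred = torch.argmax(probs).item()  # index of the largest probability
print(probs)
print(pred, classes[pred])
```

All ten probabilities are close to 0.1, which is what we expect before any training has happened.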

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required. These are described in more detail in the previous tutorial.

In [51]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

Based on where you're running this notebook, your default device could be a CPU (torch.device('cpu')) or a GPU (torch.device('cuda')).

In [52]:
device = get_default_device()

We can now wrap our training and validation data loaders using DeviceDataLoader for automatically transferring batches of data to the GPU (if available), and use to_device to move our model to the GPU (if available).

In [53]:
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(val_dl, device)
to_device(model, device)
  (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU()
  (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (9): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (10): ReLU()
  (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU()
  (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (15): Flatten()
  (16): Linear(in_features=16, out_features=10, bias=True)

Once again, let's save and commit the notebook before we proceed further.

In [54]:
jovian.commit()
[jovian] Saving notebook..
[jovian] Updating notebook "835f1d08ea4241c5b8c293e82028fa3f" on
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Recording metrics, hyperparameters, datasets & git information..
[jovian] Committed successfully!

Training the Model

As in the previous tutorials, we can use cross entropy as the loss function and accuracy as the evaluation metric for our model. The training loop is also identical, so we can reuse the loss_batch, evaluate and fit functions from the previous tutorial.

The loss_batch function calculates the loss and metric value for a batch of data, and optionally performs gradient descent if an optimizer is provided.

In [55]:
def loss_batch(model, loss_func, xb, yb, opt=None, metric=None):
    # Generate predictions
    preds = model(xb)
    # Calculate loss
    loss = loss_func(preds, yb)
    if opt is not None:
        # Compute gradients
        loss.backward()
        # Update parameters
        opt.step()
        # Reset gradients
        opt.zero_grad()
    metric_result = None
    if metric is not None:
        # Compute the metric
        metric_result = metric(preds, yb)
    return loss.item(), len(xb), metric_result

The evaluate function calculates the overall loss (and a metric, if provided) for the validation set.

In [56]:
def evaluate(model, loss_fn, valid_dl, metric=None):
    with torch.no_grad():
        # Pass each batch through the model
        results = [loss_batch(model, loss_fn, xb, yb, metric=metric)
                   for xb,yb in valid_dl]
        # Separate losses, counts and metrics
        losses, nums, metrics = zip(*results)
        # Total size of the dataset
        total = np.sum(nums)
        # Avg. loss across batches 
        avg_loss = np.sum(np.multiply(losses, nums)) / total
        avg_metric = None
        if metric is not None:
            # Avg. of metric across batches
            avg_metric = np.sum(np.multiply(metrics, nums)) / total
    return avg_loss, total, avg_metric
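
The reason evaluate weights each batch by its size is that the last batch is often smaller than the others, so a plain mean of the per-batch losses would give it too much influence. Here is a tiny worked example with made-up numbers:

```python
import numpy as np

losses = [1.0, 2.0]  # hypothetical per-batch average losses
nums = [100, 50]     # batch sizes (the last batch is often smaller)

naive = np.mean(losses)
weighted = np.sum(np.multiply(losses, nums)) / np.sum(nums)
print(naive)     # 1.5
print(weighted)  # about 1.3333
```

The weighted value is the true average loss per example, since it equals the total loss over all 150 examples divided by 150.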

The fit function (from the previous tutorial) contains the actual training loop: it sets up an optimizer, trains the model using the training set, then evaluates it on the validation set, then logs the losses, metrics etc. and repeats the process for the given number of epochs.

There's one important addition though: We invoke model.train() before training the model and model.eval() before evaluating it on the validation set. We'll discover what these methods do when we discuss regularization.

In [65]:
def fit(epochs, model, loss_fn, train_dl, valid_dl, 
        opt_fn=None, lr=None, metric=None):
    train_losses, val_losses, val_metrics = [], [], []
    # Instantiate the optimizer
    if opt_fn is None: opt_fn = torch.optim.SGD
    opt = opt_fn(model.parameters(), lr=lr)
    for epoch in range(epochs):
        # Training
        model.train()
        for xb,yb in train_dl:
            train_loss,_,_ = loss_batch(model, loss_fn, xb, yb, opt)

        # Evaluation
        model.eval()
        result = evaluate(model, loss_fn, valid_dl, metric)
        val_loss, total, val_metric = result
        # Record the loss & metric
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        val_metrics.append(val_metric)
        # Print progress
        if metric is None:
            print('Epoch [{}/{}], train_loss: {:.4f}, val_loss: {:.4f}'
                  .format(epoch+1, epochs, train_loss, val_loss))
        else:
            print('Epoch [{}/{}], train_loss: {:.4f}, val_loss: {:.4f}, val_{}: {:.4f}'
                  .format(epoch+1, epochs, train_loss, val_loss, 
                          metric.__name__, val_metric))
    return train_losses, val_losses, val_metrics

We also define an accuracy function which calculates the overall accuracy of the model on an entire batch of outputs, so that we can use it as a metric in fit.

In [58]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.sum(preds == labels).item() / len(preds)
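
Here is a quick check of the accuracy function on a hand-made batch. The outputs and labels are invented for illustration: four images, three classes, and one wrong prediction.

```python
import torch

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.sum(preds == labels).item() / len(preds)

# Hypothetical model outputs for 4 images and 3 classes
outputs = torch.tensor([[2.0, 0.1, 0.3],   # predicts class 0
                        [0.2, 1.5, 0.1],   # predicts class 1
                        [0.9, 0.2, 0.4],   # predicts class 0
                        [0.1, 0.3, 3.0]])  # predicts class 2
labels = torch.tensor([0, 1, 2, 2])

print(accuracy(outputs, labels))  # 0.75
```

Note that torch.max with dim=1 returns both the maximum values and their indices; only the indices (the predicted classes) are needed here.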

Before we train the model, let's see how it performs on the validation set with the initial set of parameters.

In [59]:
val_loss, _, val_acc = evaluate(model, F.cross_entropy, 
                                    valid_dl, metric=accuracy)
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(val_loss, val_acc))
Loss: 2.3069, Accuracy: 0.1001

The initial accuracy is around 10%, which is what one might expect from a randomly initialized model (since it has a 1 in 10 chance of getting a label right by guessing randomly).
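
The initial loss of about 2.3 also has a simple explanation. A model that assigns an equal probability of 1/10 to every class has a cross-entropy loss of -ln(1/10) = ln(10), which the short calculation below evaluates.

```python
import math

num_classes = 10
# A model that assigns equal probability to every class
uniform_prob = 1 / num_classes
expected_loss = -math.log(uniform_prob)
print(round(expected_loss, 4))  # 2.3026
```

The measured value of 2.3069 is very close to this, which suggests the untrained model is spreading its probability almost evenly over the classes.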

We'll use the following hyperparameters (learning rate, no. of epochs, batch_size etc.) to train our model. As an exercise, you can try changing these to see if you can achieve a higher accuracy in a shorter time.

In [60]:
num_epochs = 10
opt_fn = torch.optim.Adam
lr = 0.005

It's important to record the hyperparameters of every experiment you do, to replicate it later and compare it against other experiments. We can record them using jovian.log_hyperparams.

In [66]:
jovian.log_hyperparams({
    'num_epochs': num_epochs,
    'opt_fn': opt_fn.__name__,
    'batch_size': batch_size,
    'lr': lr,
})
[jovian] Hyperparameters logged.
In [62]:
history = fit(num_epochs, model, F.cross_entropy, 
              train_dl, valid_dl, opt_fn, lr, accuracy)
train_losses, val_losses, val_metrics = history
Epoch [1/10], train_loss: 1.5166, val_loss: 1.5883, val_accuracy: 0.3903
Epoch [2/10], train_loss: 1.6534, val_loss: 1.4581, val_accuracy: 0.4556
Epoch [3/10], train_loss: 1.2277, val_loss: 1.3660, val_accuracy: 0.4958
Epoch [4/10], train_loss: 1.4546, val_loss: 1.3043, val_accuracy: 0.5248
Epoch [5/10], train_loss: 1.1544, val_loss: 1.2523, val_accuracy: 0.5443
Epoch [6/10], train_loss: 1.2026, val_loss: 1.1971, val_accuracy: 0.5653
Epoch [7/10], train_loss: 1.1218, val_loss: 1.1739, val_accuracy: 0.5739
Epoch [8/10], train_loss: 1.1784, val_loss: 1.1713, val_accuracy: 0.5747
Epoch [9/10], train_loss: 1.3260, val_loss: 1.1630, val_accuracy: 0.5865
Epoch [10/10], train_loss: 0.9220, val_loss: 1.1345, val_accuracy: 0.5942

Just as we have recorded the hyperparameters, we can also record the final metrics achieved by the model using jovian.log_metrics for reference, analysis and comparison.

In [63]:
jovian.log_metrics({
    'train_loss': 1.1628,
    'val_loss': 1.2482,
    'val_accuracy': 0.5524
})
[jovian] Metrics logged.

We can also plot the validation set accuracies to study how the model improves over time.

In [67]:
def plot_metric(metric_values):
    """Plot metric values in a line graph"""
    plt.plot(metric_values, '-x')
    plt.title('Accuracy vs. No. of epochs');
In [68]:
plot_metric([val_acc] + val_metrics)
Notebook Image

Our model reaches an accuracy of around 55%, and by looking at the graph, it seems unlikely that the model will achieve an accuracy higher than 65% even after training for a long time. This suggests that we might need to use a more powerful model to capture the relationship between the images and the labels more accurately. This can be done by adding more convolutional layers to our model, or increasing the no. of channels in each convolutional layer.

We can also plot the training and validation losses to study the trend.

In [69]:
def plot_losses(train_losses, val_losses):
    plt.plot(train_losses, '-x')
    plt.plot(val_losses, '-o')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');
In [70]:
plot_losses([None]+train_losses, [val_loss]+val_losses)
Notebook Image

Both the training and validation losses seem to decrease over time. However, if you train the model for long enough, you will notice that the training loss continues to decrease, while the validation loss stops decreasing, and even starts to increase after a certain point!

This phenomenon is called overfitting, and it is the no. 1 reason why many machine learning models give rather terrible results on real-world data. It happens because the model, in an attempt to minimize the loss, starts to learn patterns that are unique to the training data, sometimes even memorizing specific training examples. Because of this, the model does not generalize well to previously unseen data.

Following are some common strategies for avoiding overfitting:

  • Gathering and generating more training data, or adding noise to it
  • Using regularization techniques like batch normalization & dropout
  • Early stopping of model's training, when validation loss starts to increase
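
Of the strategies above, early stopping is the easiest to sketch in a few lines. The function below is a hypothetical helper (not part of PyTorch or of the fit function in this notebook): given the validation loss recorded after each epoch, it reports the epoch at which training would stop once the loss has failed to improve for a chosen number of consecutive epochs. The parameter name patience and the sample losses are assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop,
    or None if the validation loss keeps improving often enough."""
    best = float('inf')
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: reset the counter
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

# Hypothetical validation losses that start rising after the 4th epoch
losses = [1.59, 1.46, 1.37, 1.30, 1.33, 1.36, 1.41]
print(early_stopping_epoch(losses))  # 5
```

In practice you would also keep a copy of the model weights from the best epoch, so that the version with the lowest validation loss can be restored after stopping.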

We will cover these topics in more detail in the next tutorial in this series, and learn how we can reach an accuracy of over 90% by making minor but important changes to our model.

Before continuing, let us save our work to the cloud using jovian.commit.

In [ ]:
jovian.commit()
[jovian] Saving notebook..

When you try different experiments (by changing the learning rate, batch size, optimizer etc.) and record hyperparameters and metrics with each version of your notebook, you can use the Compare view on the project page to analyze which approaches are working well and which ones aren't. You can sort/filter by accuracy, loss etc., add notes for each version and even invite collaborators to contribute to your project with their own experiments.

Testing with individual images

While we have been tracking the overall accuracy of the model so far, it's also a good idea to look at the model's results on some sample images. Let's test out our model with some images from the predefined test dataset of 10000 images. We begin by creating a test dataset using the ImageFolder class.

In [ ]:
test_dataset = ImageFolder(data_dir+'/test', transform=ToTensor())

Let's define a helper function predict_image, which returns the predicted label for a single image tensor.

In [ ]:
def predict_image(img, model):
    # Convert to a batch of 1
    xb = img.unsqueeze(0)
    # Get predictions from model
    yb = model(xb.to(device))
    # Pick index with highest probability
    _, preds  = torch.max(yb, dim=1)
    # Retrieve the class label
    return dataset.classes[preds[0].item()]
In [ ]:
img, label = test_dataset[0]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
img, label = test_dataset[1002]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
img, label = test_dataset[6153]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))

Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing/decreasing the complexity of the model, and changing the hyperparameters.

As a final step, let's also look at the overall loss and accuracy of the model on the test set, and record them using jovian. We expect these values to be similar to those for the validation set. If not, we might need a better validation set that has similar data and distribution as the test set (which often comes from real world data).

In [ ]:
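# Wrap the loader so that test batches are moved to the same device as the model
test_loader = DeviceDataLoader(DataLoader(test_dataset, batch_size), device)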

test_loss, _, test_acc = evaluate(model, F.cross_entropy, test_loader, metric=accuracy)
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(test_loss, test_acc))
In [ ]:
jovian.log_metrics({
    'test_loss': 1.2615,
    'test_acc': 0.5463
})

Saving and loading the model

Since we've trained our model for a long time and achieved a reasonable accuracy, it would be a good idea to save the weights of the model to disk, so that we can reuse the model later and avoid retraining from scratch. Here's how you can save the model.

In [ ]:
torch.save(model.state_dict(), 'cifar10-cnn.pth')

The .state_dict method returns an OrderedDict containing all the weights and bias matrices mapped to the right attributes of the model. To load the model weights, we can redefine the model with the same structure, and use the .load_state_dict method.

In [ ]:
model2 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 16 x 16
    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 8 x 8

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 4 x 4
    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 2 x 2

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 1 x 1
    nn.Flatten(), # output: bs x 16
    nn.Linear(16, 10) # output: bs x 10
)
In [ ]:
# Load the saved weights and move the new model to the same device as the data
model2.load_state_dict(torch.load('cifar10-cnn.pth'))
to_device(model2, device)

Just as a sanity check, let's verify that this model has the same loss and accuracy on the test set as before.

In [ ]:
test_loss, _, test_acc = evaluate(model2, F.cross_entropy, test_loader, metric=accuracy)
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(test_loss, test_acc))
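
The same round trip can be tried on a much smaller scale without touching the disk. The sketch below uses a tiny stand-in network (hypothetical, not the CIFAR10 model above) and an in-memory buffer in place of a file, then checks that the restored model produces exactly the same outputs as the original.

```python
import io
import torch
import torch.nn as nn

def make_model():
    # A tiny stand-in for the CNN above (hypothetical, for illustration only)
    return nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1),
                         nn.ReLU(),
                         nn.Flatten(),
                         nn.Linear(4 * 8 * 8, 10))

original = make_model()
restored = make_model()  # same structure, different random weights

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)
restored.load_state_dict(torch.load(buffer))

x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    print(torch.equal(original(x), restored(x)))
print(list(original.state_dict().keys()))
```

The keys of the state dict are derived from each layer's position in the nn.Sequential, and layers without parameters (ReLU, Flatten) contribute no entries. This is why the model being loaded into must have the same structure as the one that was saved.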

Let's make one final commit using jovian, but this time, we will also attach the weights file as an output of our experiment, for future reference and sharing.

In [ ]:

Check out the Files tab on the project page to view or download the trained model weights. You can also download all the files together using the Download Zip option in the Clone dropdown.

Data science work is often fragmented across many different platforms (Git for code, Dropbox/S3 for datasets & artifacts, spreadsheets for hyperparameters, metrics etc.), which can make it difficult to share and reproduce experiments. Jovian solves this by capturing everything related to a data science project on a single platform, while providing a seamless workflow for capturing, sharing and reproducing your work. To learn what you can do with Jovian, check out the docs.

Summary and Further Reading/Exercises

We've covered a lot of ground in this tutorial. Here's a quick recap of the topics:

  • Introduction to the CIFAR10 dataset for image classification
  • Downloading, extracting and loading an image dataset using torchvision
  • Showing random batches of images in a grid using torchvision.utils.make_grid
  • Creating a convolutional neural network using nn.Conv2d and nn.MaxPool2d layers
  • Capturing dataset information, metrics and hyperparameters using the jovian library
  • Training a convolutional neural network and visualizing the losses and errors
  • Understanding overfitting and the strategies for avoiding it (more on this later)
  • Generating predictions on single images from the test set
  • Saving and loading the model weights, and attaching them to the experiment snapshot using jovian

There's a lot of scope to experiment here, and I encourage you to use the interactive nature of Jupyter to play around with the various parameters. Here are a few ideas:

  • Try changing the hyperparameters to achieve a higher accuracy within fewer epochs. You can use the comparison table on the project page to compare your experiments.
  • Try adding more convolutional layers, or increasing the number of channels in each convolutional layer
  • Try using a feedforward neural network and see what's the maximum accuracy you can achieve
  • Read about some of the strategies mentioned above for reducing overfitting and achieving better results, and try to implement them by looking into the PyTorch docs.
  • Modify this notebook to train a model for a different dataset (e.g. CIFAR100 or ImageNet)

In the next tutorial, we will continue to improve our model's accuracy using techniques like data augmentation, batch normalization and dropout. We will also learn about residual networks (or ResNets), a small but critical change to the model architecture that will significantly boost the performance of our model. Stay tuned!