This post is the fifth in a series of tutorials on building deep learning models with PyTorch, an open source neural networks library. Check out the full series:
In the previous tutorial, we trained a feedforward neural network with a single hidden layer to classify handwritten digits from the MNIST dataset with over 97% accuracy. For this tutorial, we'll use the CIFAR10 dataset, which consists of 60000 32x32 px colour images in 10 classes. Here are some sample images from the dataset:
This notebook is hosted on Jovian.ml, a platform for sharing data science projects. If you want to follow along and run the code as you read, you can choose the "Run on Kaggle" option from the "Run" dropdown above. Otherwise, to run the code on your machine, you can clone the notebook, install the required dependencies using conda, and start Jupyter by running the following commands:
pip install jovian --upgrade # Install the jovian library
jovian clone 05-cifar10-cnn # Download notebook & dependencies
cd 05-cifar10-cnn # Enter the created directory
conda env update # Install the dependencies
conda activate 05-cifar10-cnn # Activate virtual env
jupyter notebook # Start Jupyter
On older versions of conda, you might need to run source activate 05-cifar10-cnn
to activate the environment. For a more detailed explanation of the above steps, check out the System setup section in the first notebook.
Before you start executing the code below, you may want to clear the cell outputs by selecting "Kernel > Restart and Clear Output" from the Jupyter notebook menu bar, to avoid confusion.
We'll download the images in PNG format from this page, using some helper functions from the torchvision
and tarfile
packages.
import os
import torch
import torchvision
import tarfile
from torchvision.datasets.utils import download_url
# Download the dataset
dataset_url = "http://files.fast.ai/data/cifar10.tgz"
download_url(dataset_url, '.')
Using downloaded and verified file: ./cifar10.tgz
# Extract from archive
with tarfile.open('./cifar10.tgz', 'r:gz') as tar:
    tar.extractall(path='./data')
The dataset is extracted to the directory data/cifar10
. It contains 2 folders train
and test
, containing the training set (50000 images) and test set (10000 images) respectively. Each of them contains 10 folders, one for each class of images. Let's verify this using os.listdir
.
data_dir = './data/cifar10'
print(os.listdir(data_dir))
classes = os.listdir(data_dir + "/train")
print(classes)
['test', 'labels.txt', 'train']
['automobile', 'airplane', 'ship', 'deer', 'frog', 'horse', 'cat', 'truck', 'dog', 'bird']
Let's look inside a couple of folders, one from the training set and another from the test set. As an exercise, you can verify that there are an equal number of images for each class, 5000 in the training set and 1000 in the test set.
airplane_files = os.listdir(data_dir + "/train/airplane")
print('No. of training examples for airplanes:', len(airplane_files))
print(airplane_files[:5])
No. of training examples for airplanes: 5000
['16182_airplane.png', '21012_airplane.png', '22622_airplane.png', '14479_airplane.png', '22258_airplane.png']
ship_test_files = os.listdir(data_dir + "/test/ship")
print("No. of test examples for ship:", len(ship_test_files))
print(ship_test_files[:5])
No. of test examples for ship: 1000
['5500_ship.png', '4348_ship.png', '8679_ship.png', '469_ship.png', '8530_ship.png']
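The exercise mentioned above can be approached with a small helper that counts the files in every class folder. The following is only a sketch: `count_images_per_class` is a hypothetical name, and the demo runs on a tiny throwaway directory rather than the real dataset so that it works anywhere.

```python
import os
import tempfile

def count_images_per_class(split_dir):
    """Return {class_name: number of files} for a one-folder-per-class directory."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):  # skip stray files such as labels.txt
            counts[class_name] = len(os.listdir(class_dir))
    return counts

# Demo on a temporary directory (a stand-in for ./data/cifar10/train)
with tempfile.TemporaryDirectory() as demo_dir:
    for class_name in ['airplane', 'ship']:
        os.makedirs(os.path.join(demo_dir, class_name))
        for i in range(3):
            open(os.path.join(demo_dir, class_name, f'{i}_{class_name}.png'), 'w').close()
    demo_counts = count_images_per_class(demo_dir)

print(demo_counts)
```

On the real data, calling `count_images_per_class(data_dir + '/train')` should report the same count for each of the 10 classes if the dataset is balanced as described.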
The above directory structure (one folder per class) is used by many computer vision datasets, and most deep learning libraries provide utilities for working with such datasets. We can use the ImageFolder
class from torchvision
to load the data as PyTorch tensors.
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
dataset = ImageFolder(data_dir+'/train', transform=ToTensor())
Let's look at a sample element from the training dataset. Each element is a tuple, containing an image tensor and a label. Since the data consists of 32x32 px color images with 3 channels (RGB), each image tensor has the shape (3, 32, 32)
.
img, label = dataset[0]
print(img.shape, label)
img
torch.Size([3, 32, 32]) 0
tensor([[[0.8588, 0.8588, 0.8627, ..., 0.8510, 0.8471, 0.8392],
[0.8667, 0.8667, 0.8745, ..., 0.8588, 0.8549, 0.8471],
[0.8667, 0.8667, 0.8745, ..., 0.8588, 0.8549, 0.8471],
...,
[0.8980, 0.9020, 0.9098, ..., 0.8980, 0.8902, 0.8863],
[0.8471, 0.8549, 0.8706, ..., 0.8980, 0.8902, 0.8824],
[0.7608, 0.7490, 0.7725, ..., 0.8980, 0.8902, 0.8824]],
[[0.9333, 0.9333, 0.9373, ..., 0.9176, 0.9137, 0.9059],
[0.9412, 0.9412, 0.9490, ..., 0.9294, 0.9216, 0.9137],
[0.9412, 0.9412, 0.9490, ..., 0.9255, 0.9216, 0.9137],
...,
[0.9608, 0.9569, 0.9569, ..., 0.9412, 0.9412, 0.9412],
[0.9020, 0.9098, 0.9255, ..., 0.9412, 0.9412, 0.9373],
[0.8157, 0.8039, 0.8275, ..., 0.9412, 0.9412, 0.9373]],
[[0.9608, 0.9608, 0.9647, ..., 0.9490, 0.9412, 0.9412],
[0.9686, 0.9686, 0.9765, ..., 0.9608, 0.9529, 0.9490],
[0.9686, 0.9686, 0.9765, ..., 0.9569, 0.9529, 0.9490],
...,
[0.9804, 0.9765, 0.9804, ..., 0.9647, 0.9647, 0.9608],
[0.9176, 0.9255, 0.9373, ..., 0.9686, 0.9647, 0.9608],
[0.8275, 0.8157, 0.8392, ..., 0.9686, 0.9647, 0.9608]]])
The list of classes is stored in the .classes
property of the dataset. The numeric label for each element corresponds to the index of the element's label in the list of classes.
print(dataset.classes)
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
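Notice that this order differs from the one os.listdir printed earlier. As far as I know, ImageFolder sorts the folder names alphabetically before assigning numeric labels. The sketch below reproduces that mapping with plain Python; `class_to_idx` here is a local dictionary built for illustration.

```python
# Folder names in the order os.listdir returned them earlier
listed = ['automobile', 'airplane', 'ship', 'deer', 'frog',
          'horse', 'cat', 'truck', 'dog', 'bird']

# Sorting alphabetically reproduces the order seen in dataset.classes
classes_sorted = sorted(listed)

# Map each class name to its numeric label
class_to_idx = {name: idx for idx, name in enumerate(classes_sorted)}

print(classes_sorted)
print(class_to_idx['airplane'], class_to_idx['ship'])
```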
We can view the image using matplotlib
, but we need to change the tensor dimensions to (32,32,3)
. Let's create a helper function to display an image and its label.
import matplotlib.pyplot as plt
def show_example(img, label):
    print('Label: ', dataset.classes[label], "("+str(label)+")")
    plt.imshow(img.permute(1, 2, 0))
Let's look at a couple of images from the dataset. As you can tell, the 32x32px images are quite difficult to identify, even for the human eye. Try changing the indices below to view different images.
show_example(*dataset[0])
Label: airplane (0)
show_example(*dataset[1099])
Label: airplane (0)
Before continuing, let's save our work using the jovian
python library.
!pip install jovian --upgrade -q
import jovian
jovian.commit()
[jovian] Saving notebook..
[jovian] Creating a new notebook on https://jovian.ml/
[jovian] Please enter your API key ( from https://jovian.ml/ ):
API KEY: ········
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/souravsharma227/notebook-source
jovian.commit
uploads the notebook to your Jovian.ml account, captures the Python environment and creates a sharable link for your notebook as shown above. You can use this link to share your work and let anyone run it online or reproduce it with the jovian clone
command.
While building real world machine learning models, it is quite common to split the dataset into 3 parts: a training set, a validation set and a test set.
Since there's no predefined validation set, we can set aside a small portion of the training set to be used as the validation set. Let's define a function that randomly picks a given fraction of the element indices for creating the validation set. We'll also pass a random seed into the function, so we can recreate the same training/validation split in future runs.
import numpy as np
def split_indices(n, val_pct=0.1, seed=99):
    # Determine size of validation set
    n_val = int(val_pct*n)
    # Set the random seed (for reproducibility)
    np.random.seed(seed)
    # Create random permutation of 0 to n-1
    idxs = np.random.permutation(n)
    # Pick first n_val indices for validation set
    return idxs[n_val:], idxs[:n_val]
val_pct = 0.2
rand_seed = 42
train_indices, val_indices = split_indices(len(dataset), val_pct, rand_seed)
print(len(train_indices), len(val_indices))
print('Sample validation indices: ', val_indices[:10])
40000 10000
Sample validation indices: [33553 9427 199 12447 39489 42724 10822 49498 4144 36958]
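Two properties of such a split are worth checking: the training and validation indices should never overlap, and the same seed should give the same split every time. The sketch below checks both. Note that it swaps in the standard library's random module instead of NumPy, so `split_indices_stdlib` is an illustrative variant and will not produce the same indices as the function above.

```python
import random

def split_indices_stdlib(n, val_pct=0.1, seed=99):
    """Standard-library variant of split_indices (illustrative only)."""
    n_val = int(val_pct * n)
    idxs = list(range(n))
    # Seeded shuffle that does not touch the global random state
    random.Random(seed).shuffle(idxs)
    return idxs[n_val:], idxs[:n_val]

train_a, val_a = split_indices_stdlib(50000, 0.2, 42)
train_b, val_b = split_indices_stdlib(50000, 0.2, 42)

print(len(train_a), len(val_a))
print('Disjoint:', set(train_a).isdisjoint(val_a))
print('Reproducible:', val_a == val_b)
```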
The jovian
library also provides a simple API for recording important parameters related to the dataset, model training, results etc. for easy reference and comparison between multiple experiments. Let's record dataset_url
, val_pct
and rand_seed
using jovian.log_dataset
.
jovian.log_dataset({
    'dataset_url': dataset_url,
    'val_pct': val_pct,
    'rand_seed': rand_seed
})
[jovian] Dataset logged.
We have randomly shuffled the indices, and selected a small portion (20%) to serve as the validation set. To process our data in small batches, we can now create PyTorch data loaders for each of these using a SubsetRandomSampler
, which samples elements randomly from a given list of indices, while creating batches of data.
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data.dataloader import DataLoader
batch_size=100
# Training sampler and data loader
train_sampler = SubsetRandomSampler(train_indices)
train_dl = DataLoader(dataset,
                      batch_size,
                      sampler=train_sampler)
# Validation sampler and data loader
val_sampler = SubsetRandomSampler(val_indices)
val_dl = DataLoader(dataset,
                    batch_size,
                    sampler=val_sampler)
We can look at batches of images from the dataset using the make_grid
method from torchvision
. Each time the following code is run, we get a different batch, since the sampler shuffles the indices before creating batches.
from torchvision.utils import make_grid
def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(10, 10))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images, 10).permute(1, 2, 0))
        break
show_batch(train_dl)
Once again, let's save and commit our work using jovian
before proceeding further.
jovian.commit()
[jovian] Saving notebook..
[jovian] Updating notebook "da8b4f74853f4f3ca7b6fb80c10c5ffa" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Recording metrics, hyperparameters, datasets & git information..
[jovian] Committed successfully! https://jovian.ml/souravsharma227/notebook-source
After the first commit, all subsequent commits record a new version of the notebook within the same Jovian project. You can use jovian.commit
to version Jupyter notebooks (instead of doing File > Save As
), and keep your data science projects organized. Also check out the Records tab on the project page to see how the information logged using jovian.log_dataset
appears on the UI.
In our previous tutorial, we defined a deep neural network with fully-connected layers using nn.Linear
. For this tutorial however, we will use a convolutional neural network, using the nn.Conv2d
class from PyTorch.
The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel "slides" over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel. - Source
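To make the sliding operation concrete, here is a minimal sketch in plain Python for a single-channel input with no padding and a stride of 1. The function name `conv2d_single` and the sample values are made up for illustration. A real nn.Conv2d layer additionally sums over all input channels, adds a bias, and learns the kernel weights during training.

```python
def conv2d_single(image, kernel):
    """Slide `kernel` over `image` (both lists of lists), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Elementwise multiply the kernel with the patch it covers, then sum
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_single(image, kernel)
print(feature_map)
```

A 4x4 input with a 2x2 kernel gives a 3x3 output, since the kernel fits in three positions along each axis.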
I highly recommend checking out the following articles if you want to gain a better understanding of convolutions:
There are certain advantages offered by convolutional layers when working with image data:
We will also use max-pooling layers to progressively decrease the height & width of the output tensors from each convolutional layer.
Before we define the entire model, let's look at how a single convolutional layer followed by a max-pooling layer operates on the data.
import torch.nn as nn
import torch.nn.functional as F
simple_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(2, 2)
)
Refer to Sylvian's post for an explanation of kernel_size
, stride
and padding
.
for images, labels in train_dl:
    print('images.shape:', images.shape)
    out = simple_model(images)
    print('out.shape:', out.shape)
    break
images.shape: torch.Size([100, 3, 32, 32])
out.shape: torch.Size([100, 8, 16, 16])
The Conv2d
layer transforms a 3-channel image to an 8-channel feature map, and the MaxPool2d
layer halves the height and width. The feature map gets smaller as we add more layers, until we are finally left with an n x 1 x 1
feature map (where n
is the no. of channels), which can be flattened into a vector. We can then add a fully connected layer at the end to get a vector of size 10 for each image.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 16 x 16

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 8 x 8

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 4 x 4

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 2 x 2

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 1 x 1
    nn.Flatten(),       # output: bs x 16
    nn.Linear(16, 10)   # output: bs x 10
)
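The output sizes noted in the comments above follow from the usual size formula for convolution and pooling layers: floor((size + 2*padding - kernel_size) / stride) + 1. The sketch below traces the height (or width) through the five blocks using that formula. `conv_output_size` is a helper written for this illustration, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32
sizes_after_pool = []
for _ in range(5):
    # 3x3 convolution with padding 1 and stride 1 keeps the size unchanged
    size = conv_output_size(size, kernel_size=3, stride=1, padding=1)
    # 2x2 max-pooling with stride 2 halves it
    size = conv_output_size(size, kernel_size=2, stride=2, padding=0)
    sizes_after_pool.append(size)

print(sizes_after_pool)
```

Starting from 32, five rounds of halving leave a 1 x 1 feature map, which is why the final linear layer takes 16 inputs (one per channel).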
Let's verify that the model produces the expected output on a batch of training data. The 10 outputs for each image can be interpreted as probabilities for the 10 target classes (after applying softmax), and the class with the highest probability is chosen as the label predicted by the model for the input image. Check out Part 3 (logistic regression) for a more detailed discussion on interpreting the outputs, applying softmax and identifying the predicted labels.
for images, labels in train_dl:
    print('images.shape:', images.shape)
    out = model(images)
    print('out.shape:', out.shape)
    print('out[0]:', out[0])
    break
images.shape: torch.Size([100, 3, 32, 32])
out.shape: torch.Size([100, 10])
out[0]: tensor([ 0.2117, 0.0233, -0.0122, 0.2238, -0.0231, 0.1669, -0.0979, 0.2306,
-0.0510, -0.1141], grad_fn=<SelectBackward>)
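To see how these raw scores turn into a prediction, the sketch below applies softmax to the values of out[0] printed above, using only the standard library. The scores are copied by hand from that output, and the model is still untrained at this point, so the resulting "prediction" is essentially arbitrary.

```python
import math

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# Raw scores copied from out[0] in the output above
scores = [0.2117, 0.0233, -0.0122, 0.2238, -0.0231,
          0.1669, -0.0979, 0.2306, -0.0510, -0.1141]

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(scores)

# The predicted label is the index with the highest probability
predicted = max(range(len(probs)), key=probs.__getitem__)
print(classes[predicted], round(probs[predicted], 4))
```

Because the scores are all close to zero, the probabilities are all close to 1/10, which is what we would expect before any training.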
To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device
& to_device
) and a helper class DeviceDataLoader
to move our model & data to the GPU as required. These are described in more detail in the previous tutorial.
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
Based on where you're running this notebook, your default device could be a CPU (torch.device('cpu')
) or a GPU (torch.device('cuda')
)
device = get_default_device()
device
device(type='cuda')
We can now wrap our training and validation data loaders using DeviceDataLoader
for automatically transferring batches of data to the GPU (if available), and use to_device
to move our model to the GPU (if available).
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(val_dl, device)
to_device(model, device)
Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): ReLU()
(5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(6): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
(8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(9): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(10): ReLU()
(11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU()
(14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(15): Flatten()
(16): Linear(in_features=16, out_features=10, bias=True)
)
Once again, let's save and commit the notebook before we proceed further.
jovian.commit()
[jovian] Saving notebook..
[jovian] Updating notebook "da8b4f74853f4f3ca7b6fb80c10c5ffa" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Recording metrics, hyperparameters, datasets & git information..
[jovian] Committed successfully! https://jovian.ml/souravsharma227/notebook-source
As in the previous tutorials, we can use cross entropy as the loss function and accuracy as the evaluation metric for our model. The training loop is also identical, so we can reuse the loss_batch
, evaluate
and fit
functions from the previous tutorial.
The loss_batch
function calculates the loss and metric value for a batch of data, and optionally performs gradient descent if an optimizer is provided.
def loss_batch(model, loss_func, xb, yb, opt=None, metric=None):
    # Generate predictions
    preds = model(xb)
    # Calculate loss
    loss = loss_func(preds, yb)

    if opt is not None:
        # Compute gradients
        loss.backward()
        # Update parameters
        opt.step()
        # Reset gradients
        opt.zero_grad()

    metric_result = None
    if metric is not None:
        # Compute the metric
        metric_result = metric(preds, yb)

    return loss.item(), len(xb), metric_result
The evaluate
function calculates the overall loss (and a metric, if provided) for the validation set.
def evaluate(model, loss_fn, valid_dl, metric=None):
    with torch.no_grad():
        # Pass each batch through the model
        results = [loss_batch(model, loss_fn, xb, yb, metric=metric)
                   for xb,yb in valid_dl]
        # Separate losses, counts and metrics
        losses, nums, metrics = zip(*results)
        # Total size of the dataset
        total = np.sum(nums)
        # Avg. loss across batches
        avg_loss = np.sum(np.multiply(losses, nums)) / total
        avg_metric = None
        if metric is not None:
            # Avg. of metric across batches
            avg_metric = np.sum(np.multiply(metrics, nums)) / total
    return avg_loss, total, avg_metric
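The averages in evaluate are weighted by batch size, because the last batch of a dataset can be smaller than the rest. The sketch below uses made-up numbers to show how a weighted average differs from a plain mean in that case. `weighted_average` is a hypothetical helper that mirrors the np.sum(np.multiply(...)) / total expression above.

```python
def weighted_average(values, counts):
    """Average of per-batch values, weighted by the number of samples in each batch."""
    total = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / total

# Hypothetical per-batch losses: a full batch of 100 and a final batch of 50
batch_losses = [1.0, 2.0]
batch_sizes = [100, 50]

weighted = weighted_average(batch_losses, batch_sizes)
simple = sum(batch_losses) / len(batch_losses)

print('Weighted average:', round(weighted, 4))
print('Plain mean:', simple)
```

When every batch has the same size, as with 10000 validation images in batches of 100, the two values coincide.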
The fit
function (from the previous tutorial) contains the actual training loop: it sets up an optimizer, trains the model using the training set, then evaluates it on the validation set, then logs the losses, metrics etc. and repeats the process for the given number of epochs.
There's one important addition though: We invoke model.train()
before training the model and model.eval()
before evaluating it on the validation set. We'll discover what these methods do when we discuss regularization.
def fit(epochs, model, loss_fn, train_dl, valid_dl,
        opt_fn=None, lr=None, metric=None):
    train_losses, val_losses, val_metrics = [], [], []

    # Instantiate the optimizer
    if opt_fn is None: opt_fn = torch.optim.SGD
    opt = opt_fn(model.parameters(), lr=lr)

    for epoch in range(epochs):
        # Training
        model.train()
        for xb,yb in train_dl:
            train_loss,_,_ = loss_batch(model, loss_fn, xb, yb, opt)

        # Evaluation
        model.eval()
        result = evaluate(model, loss_fn, valid_dl, metric)
        val_loss, total, val_metric = result

        # Record the loss & metric
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        val_metrics.append(val_metric)

        # Print progress
        if metric is None:
            print('Epoch [{}/{}], train_loss: {:.4f}, val_loss: {:.4f}'
                  .format(epoch+1, epochs, train_loss, val_loss))
        else:
            print('Epoch [{}/{}], train_loss: {:.4f}, val_loss: {:.4f}, val_{}: {:.4f}'
                  .format(epoch+1, epochs, train_loss, val_loss,
                          metric.__name__, val_metric))
    return train_losses, val_losses, val_metrics
We also define an accuracy
function which calculates the overall accuracy of the model on an entire batch of outputs, so that we can use it as a metric in fit
.
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.sum(preds == labels).item() / len(preds)
Before we train the model, let's see how it performs on the validation set with the initial set of parameters.
val_loss, _, val_acc = evaluate(model, F.cross_entropy,
                                valid_dl, metric=accuracy)
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(val_loss, val_acc))
Loss: 2.3113, Accuracy: 0.1006
The initial accuracy is around 10%, which is what one might expect from a randomly initialized model (since it has a 1 in 10 chance of getting a label right by guessing randomly).
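The initial loss of about 2.31 can be sanity-checked the same way. If a model assigns an equal probability of 1/10 to every class, the cross entropy for any image is -ln(1/10), which the short sketch below computes.

```python
import math

num_classes = 10

# Cross entropy when the correct class is given probability 1/num_classes
uniform_loss = -math.log(1 / num_classes)

print(round(uniform_loss, 4))
```

The result is close to the measured loss of 2.3113, which suggests the untrained model is producing nearly uniform predictions.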
We'll use the following hyperparameters (learning rate, no. of epochs, batch_size etc.) to train our model. As an exercise, you can try changing these to see if you can achieve a higher accuracy in a shorter time.
num_epochs = 10
opt_fn = torch.optim.Adam
lr = 0.005
It's important to record the hyperparameters of every experiment you do, to replicate it later and compare it against other experiments. We can record them using jovian.log_hyperparams
.
jovian.log_hyperparams({
    'num_epochs': num_epochs,
    'opt_fn': opt_fn.__name__,
    'batch_size': batch_size,
    'lr': lr,
})
[jovian] Hyperparameters logged.
history = fit(num_epochs, model, F.cross_entropy,
              train_dl, valid_dl, opt_fn, lr, accuracy)
train_losses, val_losses, val_metrics = history
Epoch [1/10], train_loss: 1.6818, val_loss: 1.7273, val_accuracy: 0.3505
Epoch [2/10], train_loss: 1.5016, val_loss: 1.5496, val_accuracy: 0.4368
Epoch [3/10], train_loss: 1.5561, val_loss: 1.3837, val_accuracy: 0.4864
Epoch [4/10], train_loss: 1.4047, val_loss: 1.3397, val_accuracy: 0.5129
Epoch [5/10], train_loss: 1.3220, val_loss: 1.2852, val_accuracy: 0.5303
Epoch [6/10], train_loss: 1.2815, val_loss: 1.2370, val_accuracy: 0.5529
Epoch [7/10], train_loss: 1.2674, val_loss: 1.2120, val_accuracy: 0.5655
Epoch [8/10], train_loss: 1.2578, val_loss: 1.1988, val_accuracy: 0.5720
Epoch [9/10], train_loss: 1.2659, val_loss: 1.1685, val_accuracy: 0.5822
Epoch [10/10], train_loss: 1.2738, val_loss: 1.1877, val_accuracy: 0.5707
Just as we have recorded the hyperparameters, we can also record the final metrics achieved by the model using jovian.log_metrics
for reference, analysis and comparison.
jovian.log_metrics({
    'train_loss': 1.2738,
    'val_loss': 1.1877,
    'val_accuracy': 0.5707
})
[jovian] Metrics logged.
We can also plot the validation set accuracies to study how the model improves over time.
def plot_metric(metric_values):
    """Plot metric values in a line graph"""
    plt.plot(metric_values, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');
plot_metric([val_acc] + val_metrics)
Our model reaches an accuracy of around 57%, and by looking at the graph, it seems unlikely that the model will achieve an accuracy higher than 65% even after training for a long time. This suggests that we might need to use a more powerful model to capture the relationship between the images and the labels more accurately. This can be done by adding more convolutional layers to our model, or increasing the no. of channels in each convolutional layer.
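One rough way to gauge how "powerful" a model is, is to count its learnable parameters. The sketch below does this by hand for the architecture used in this tutorial (five 3x3 convolutions followed by a linear layer) and for a hypothetical variant with 32 channels instead of 16. The helper names are made up for this illustration. In PyTorch itself, summing p.numel() over model.parameters() should give the same total.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (one kernel per input/output channel pair) plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output."""
    return in_features * out_features + out_features

def model_params(channels, num_classes=10, num_blocks=5):
    """Parameter count for the tutorial's architecture with a given channel width."""
    total = conv2d_params(3, channels, 3)                              # first conv layer
    total += (num_blocks - 1) * conv2d_params(channels, channels, 3)  # remaining conv layers
    total += linear_params(channels, num_classes)                     # final linear layer
    return total

params_16 = model_params(16)
params_32 = model_params(32)
print(params_16, params_32)
```

Doubling the channel width roughly quadruples the parameter count, since most parameters sit in the channel-to-channel convolutions.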
We can also plot the training and validation losses to study the trend.
def plot_losses(train_losses, val_losses):
    plt.plot(train_losses, '-x')
    plt.plot(val_losses, '-o')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');
plot_losses([None]+train_losses, [val_loss]+val_losses)
Both the training and validation losses seem to decrease over time. However, if you train the model for long enough, you will notice that the training loss continues to decrease, while the validation loss stops decreasing, and even starts to increase after a certain point!
This phenomenon is called overfitting, and it is the no. 1 reason why many machine learning models give rather terrible results on real-world data. It happens because the model, in an attempt to minimize the loss, starts to learn patterns that are unique to the training data, sometimes even memorizing specific training examples. Because of this, the model does not generalize well to previously unseen data.
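A simple way to watch for this is to track the epoch at which the validation loss was lowest. The sketch below does so for the validation losses printed during training above (copied by hand). `best_epoch` is a hypothetical helper, and a single uptick after the best epoch is not proof of overfitting, only a hint to keep watching.

```python
# Validation losses copied from the training log above
val_losses_logged = [1.7273, 1.5496, 1.3837, 1.3397, 1.2852,
                     1.2370, 1.2120, 1.1988, 1.1685, 1.1877]

def best_epoch(val_losses):
    """Return (1-based epoch, loss) where the validation loss was lowest."""
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_idx + 1, val_losses[best_idx]

epoch, loss = best_epoch(val_losses_logged)
epochs_since_best = len(val_losses_logged) - epoch

print('Best epoch:', epoch, 'with val_loss', loss)
print('Epochs since best:', epochs_since_best)
```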
Following are some common strategies for avoiding overfitting:
We will cover these topics in more detail in the next tutorial in this series, and learn how we can reach an accuracy of over 90% by making minor but important changes to our model.
Before continuing, let us save our work to the cloud using jovian.commit
.
jovian.commit()
[jovian] Saving notebook..
When you try different experiments (by changing the learning rate, batch size, optimizer etc.) and record hyperparameters and metrics with each version of your notebook, you can use the Compare view on the project page to analyze which approaches are working well and which ones aren't. You can sort/filter by accuracy, loss etc., add notes for each version and even invite collaborators to contribute to your project with their own experiments.
While we have been tracking the overall accuracy of a model so far, it's also a good idea to look at the model's results on some sample images. Let's test out our model with some images from the predefined test dataset of 10000 images. We begin by creating a test dataset using the ImageFolder
class.
test_dataset = ImageFolder(data_dir+'/test', transform=ToTensor())
Let's define a helper function predict_image
, which returns the predicted label for a single image tensor.
def predict_image(img, model):
    # Convert to a batch of 1 and move it to the same device as the model
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds = torch.max(yb, dim=1)
    # Retrieve the class label
    return dataset.classes[preds[0].item()]
img, label = test_dataset[0]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))
img, label = test_dataset[1002]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))
img, label = test_dataset[6153]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))
Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing/decreasing the complexity of the model, and changing the hyperparameters.
As a final step, let's also look at the overall loss and accuracy of the model on the test set, and record using jovian
. We expect these values to be similar to those for the validation set. If not, we might need a better validation set that has similar data and distribution as the test set (which often comes from real world data).
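test_loader = DataLoader(test_dataset, batch_size)
# Wrap the loader so that test batches are moved to the same device as the model
test_loss, _, test_acc = evaluate(model, F.cross_entropy,
                                  DeviceDataLoader(test_loader, device), metric=accuracy)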
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(test_loss, test_acc))
jovian.log_metrics({
    'test_loss': 1.2615,
    'test_acc': 0.5463
})
Since we've trained our model for a long time and achieved a reasonable accuracy, it would be a good idea to save the weights of the model to disk, so that we can reuse the model later and avoid retraining from scratch. Here's how you can save the model.
torch.save(model.state_dict(), 'cifar10-cnn.pth')
The .state_dict
method returns an OrderedDict
containing all the weights and bias matrices mapped to the right attributes of the model. To load the model weights, we can redefine the model with the same structure, and use the .load_state_dict
method.
model2 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 16 x 16

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 8 x 8

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 4 x 4

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 2 x 2

    nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2), # output: bs x 16 x 1 x 1
    nn.Flatten(),       # output: bs x 16
    nn.Linear(16, 10)   # output: bs x 10
)
model2.load_state_dict(torch.load('cifar10-cnn.pth'))
Just as a sanity check, let's verify that this model has the same loss and accuracy on the test set as before.
test_loss, _, test_acc = evaluate(model2, F.cross_entropy, test_loader, metric=accuracy)
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(test_loss, test_acc))
Let's make one final commit using jovian
, but this time, we will also attach the weights file as an output of our experiment, for future reference and sharing.
jovian.commit(artifacts=['cifar10-cnn.pth'])
Check out the Files tab on the project page to view or download the trained model weights. You can also download all the files together using the Download Zip option in the Clone dropdown.
Data science work is often fragmented across many different platforms (Git for code, Dropbox/S3 for datasets & artifacts, spreadsheets for hyperparameters, metrics etc.) which can make it difficult to share and reproduce experiments. Jovian.ml solves this by capturing everything related to a data science project on a single platform, while providing a seamless workflow for capturing, sharing and reproducing your work. To learn what you can do with Jovian.ml, check out the docs: https://docs.jovian.ml.
We've covered a lot of ground in this tutorial. Here's a quick recap of the topics:
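Downloading and loading an image dataset using torchvision
Showing batches of images in a grid using torchvision.utils.make_grid
Building a convolutional neural network with nn.Conv2d and nn.MaxPool2d layers
Recording datasets, hyperparameters and metrics with the jovian library
Saving the model weights and attaching them to a commit using jovian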
There's a lot of scope to experiment here, and I encourage you to use the interactive nature of Jupyter to play around with the various parameters. Here are a few ideas:
In the next tutorial, we will continue to improve our model's accuracy using techniques like data augmentation, batch normalization and dropout. We will also learn about residual networks (or ResNets), a small but critical change to the model architecture that will significantly boost the performance of our model. Stay tuned!