Define the Convolutional Neural Network

After you've looked at the data you're working with and, in this case, know the shapes of the images and of the keypoints, you are ready to define a convolutional neural network that can learn from this data.

In this notebook and in models.py, you will:

  1. Define a CNN with images as input and keypoints as output
  2. Construct the transformed FaceKeypointsDataset, just as before
  3. Train the CNN on the training data, tracking loss
  4. See how the trained model performs on test data
  5. If necessary, modify the CNN structure and model hyperparameters, so that it performs well *

* What does well mean?

"Well" means that the model's loss decreases during training and that, when applied to test image data, the model produces keypoints that closely match the true keypoints of each face. You'll see examples of this later in the notebook.
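
One way to put a number on "closely match" is the average distance between each predicted keypoint and its true position. The sketch below is only an illustration: the function name mean_keypoint_distance and all coordinate values are hypothetical, and the loss you actually train with may be a different measure.

```python
import math


def mean_keypoint_distance(predicted, target):
    """Average Euclidean distance between matching (x, y) keypoints."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of keypoints")
    distances = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(predicted, target)
    ]
    return sum(distances) / len(distances)


# Hypothetical values for illustration only (three keypoints, pixel coordinates).
true_pts = [(30.0, 40.0), (60.0, 40.0), (45.0, 70.0)]
good_pred = [(31.0, 40.0), (60.0, 41.0), (45.0, 70.0)]
poor_pred = [(33.0, 44.0), (66.0, 48.0), (45.0, 80.0)]

good_err = mean_keypoint_distance(good_pred, true_pts)
poor_err = mean_keypoint_distance(poor_pred, true_pts)
print(good_err < poor_err)  # the closer prediction has the smaller error
```

A smaller value means the predicted keypoints sit closer to the true ones; what counts as "small enough" depends on the image size and is a judgment call.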


CNN Architecture

Recall that CNNs are defined by a few types of layers:

  • Convolutional layers
  • Maxpooling layers
  • Fully-connected layers

You are required to use the above layers and encouraged to add multiple convolutional layers and other layers, such as dropout, that may help prevent overfitting. You are also encouraged to look at literature on keypoint detection, such as this paper, to help you determine the structure of your network.

TODO: Define your model in the provided models.py file

This file is mostly empty but contains the expected name and some TODOs for creating your model.


PyTorch Neural Nets

To define a neural network in PyTorch, you define the layers of a model in the __init__ function and define the feedforward behavior of the network, which uses those initialized layers, in the forward function. forward takes an input image tensor, x. The structure of this Net class is shown below and left for you to fill in.

Note: During training, PyTorch performs backpropagation by keeping track of the network's feedforward behavior and using autograd to calculate the gradients that determine the update to the weights in the network.
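
As a rough illustration of that bookkeeping, the sketch below runs a forward pass on a single weight and lets autograd compute its gradient. The numeric values and the learning rate are arbitrary assumptions, and in practice an optimizer would apply the update step for you.

```python
import torch

# A single trainable weight; requires_grad=True asks autograd to track it.
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)       # input (hypothetical value)
target = torch.tensor(5.0)  # desired output (hypothetical value)

# Forward pass: autograd records these operations as they run.
prediction = w * x
loss = (prediction - target) ** 2

# Backward pass: fills w.grad with d(loss)/dw = 2 * (w * x - target) * x.
loss.backward()

# One plain gradient-descent step (the learning rate is an arbitrary choice).
learning_rate = 0.1
with torch.no_grad():
    updated_w = w - learning_rate * w.grad

print(w.grad.item(), updated_w.item())
```

Here the gradient works out to 2 * (6 - 5) * 3 = 6, so the step moves the weight from 2.0 toward 1.4.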

Define the Layers in __init__

As a reminder, a conv/pool layer may be defined like this (in __init__):

# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)

# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)

Refer to Layers in forward

These layers are then referred to in the forward function like this, where a ReLU activation is applied to the output of conv1 before maxpooling:

x = self.pool(F.relu(self.conv1(x)))
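
It helps to know what this conv/pool pair does to the image size, because the first fully-connected layer needs the flattened feature count. The sketch below applies the standard output-size formula for unpadded, undilated windows. The helper name conv_output_size is hypothetical, and the 224-pixel input is an assumption, so substitute the size your transform actually produces.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over the input.

    Assumes dilation of 1 and floor rounding (the PyTorch defaults).
    """
    return (size + 2 * padding - kernel_size) // stride + 1


input_size = 224  # assumed image height/width; use your own transformed size

after_conv = conv_output_size(input_size, kernel_size=3)            # nn.Conv2d(1, 32, 3)
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)  # nn.MaxPool2d(2, 2)

# 32 feature maps, each after_pool x after_pool, flattened into one vector.
flattened = 32 * after_pool * after_pool
print(after_conv, after_pool, flattened)  # prints 222 111 394272
```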

Best practice is to place any layers whose weights will change during the training process in __init__ and refer to them in the forward function; any layers or functions that always behave in the same way, such as a pre-defined activation function, should appear only in the forward function.
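
Put together, that convention might look like the minimal sketch below. It is not the intended solution for models.py: the input_size and num_outputs parameters, the default of 136 outputs, the dropout probability, and the single conv layer are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, input_size=224, num_outputs=136):  # both defaults are assumptions
        super().__init__()
        # Layers with learnable weights are created once, here.
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(p=0.3)  # p chosen arbitrarily

        # A 3x3 conv without padding trims 2 pixels; 2x2 pooling halves the result.
        pooled = (input_size - 2) // 2
        self.fc1 = nn.Linear(32 * pooled * pooled, num_outputs)

    def forward(self, x):
        # F.relu has no weights, so it appears only in forward.
        x = self.pool(F.relu(self.conv1(x)))
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = self.dropout(x)
        return self.fc1(x)


# Small hypothetical batch: 4 grayscale images of 96x96 pixels.
net = Net(input_size=96)
batch = torch.randn(4, 1, 96, 96)
output = net(batch)
print(output.shape)  # torch.Size([4, 136])
```

A real keypoint model would likely stack several conv/pool stages before the fully-connected layers, which also keeps the first linear layer from becoming very large.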

Why models.py

You are tasked with defining the network in the models.py file so that any models you define can be saved and loaded by name in different notebooks in this project directory. For example, by defining a CNN class called Net in models.py, you can then create that same architecture in this and other notebooks by simply importing the class and instantiating a model:

    from models import Net
    net = Net()
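
Saving and loading then follows the usual state_dict pattern, sketched below under some assumptions: TinyNet is a hypothetical stand-in for the Net class you would import from models.py, and the file name and temporary directory are made up for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the Net class imported from models.py."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


net = TinyNet()

# Save only the learned parameters (the state_dict), not the class itself.
save_dir = tempfile.mkdtemp()
model_path = os.path.join(save_dir, "keypoints_model_1.pt")  # hypothetical file name
torch.save(net.state_dict(), model_path)

# Elsewhere: rebuild the same architecture from its class, then load the weights.
restored = TinyNet()
restored.load_state_dict(torch.load(model_path))
restored.eval()  # switch off training-only behavior such as dropout

same = all(
    torch.equal(a, b)
    for a, b in zip(net.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Because only the weights are stored, the notebook that loads them must be able to import the same class definition, which is why the class lives in a shared file.
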

# import the usual resources
import matplotlib.pyplot as plt
import numpy as np

# watch for changes in models.py and reload it automatically when it changes
%load_ext autoreload
%autoreload 2