
Training a Convolutional Neural Network to Recognize Facial Expressions - Zero to GANs Course Project

For my course project I decided to train a CNN to differentiate facial expressions, using a dataset containing six different classes of expressions.
I found this dataset on Kaggle, and it represents an image classification problem: the data consists of six classes, with each class containing between ~3,000 and ~7,000 images.

Importing the libraries and getting the dataset:

In [3]:
!pip install jovian --upgrade -q
In [4]:
!pip install torch===1.7.1+cu110 torchvision===0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch===1.7.1+cu110
  Using cached https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp39-cp39-win_amd64.whl (2050.2 MB)
Requirement already satisfied: numpy in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from torch===1.7.1+cu110) (1.19.5)
Collecting torchaudio===0.7.2
  Downloading https://download.pytorch.org/whl/torchaudio-0.7.2-cp39-none-win_amd64.whl (103 kB)
Collecting torchvision===0.8.2+cu110
  Downloading https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp39-cp39-win_amd64.whl (1.6 MB)
Requirement already satisfied: pillow>=4.1.1 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from torchvision===0.8.2+cu110) (8.1.0)
Collecting typing-extensions
  Downloading typing_extensions-3.7.4.3-py3-none-any.whl (22 kB)
Installing collected packages: typing-extensions, torch, torchvision, torchaudio
Successfully installed torch-1.7.1+cu110 torchaudio-0.7.2 torchvision-0.8.2+cu110 typing-extensions-3.7.4.3
In [7]:
!pip install seaborn
Collecting seaborn
  Downloading seaborn-0.11.1-py3-none-any.whl (285 kB)
Requirement already satisfied: numpy>=1.15 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.19.5)
Requirement already satisfied: matplotlib>=2.2 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (3.3.3)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (1.3.1)
Requirement already satisfied: cycler>=0.10 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (0.10.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\geralt\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (8.1.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\geralt\appdata\roaming\python\python39\site-packages (from matplotlib>=2.2->seaborn) (2.8.1)
Requirement already satisfied: six in c:\users\geralt\appdata\roaming\python\python39\site-packages (from cycler>=0.10->matplotlib>=2.2->seaborn) (1.15.0)
Collecting pandas>=0.23
  Downloading pandas-1.2.0-cp39-cp39-win_amd64.whl (9.3 MB)
Collecting pytz>=2017.3
  Downloading pytz-2020.5-py2.py3-none-any.whl (510 kB)
Collecting scipy>=1.0
  Downloading scipy-1.6.0-cp39-cp39-win_amd64.whl (32.7 MB)
Installing collected packages: pytz, scipy, pandas, seaborn
Successfully installed pandas-1.2.0 pytz-2020.5 scipy-1.6.0 seaborn-0.11.1
In [8]:
import jovian
import os
import numpy as np
import torch
import torchvision
import tarfile
import matplotlib
import matplotlib.pyplot as plt
import torchvision.transforms as tt
import torch.nn as nn
import torch.nn.functional as F
import seaborn as sns

from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torchvision.datasets.utils import download_url
from torch.utils.data import random_split
from torch.utils.data.dataloader import DataLoader
from torchvision.utils import make_grid

%matplotlib inline


matplotlib.rcParams['figure.facecolor'] = '#ffffff'
In [9]:
project_name='zero-to-gans-course-project2'
In [10]:
dataset = 'faces_dataset'      # local folder containing the Kaggle dataset
dataset_url = 'faces_dataset'  # placeholder; the dataset was downloaded manually rather than via download_url
path = 'C:\\Users\\Geralt\\Desktop\\Deeplearning\\zero-to-gans-course-project3\\faces_dataset'

Cleaning the data and preparing the model:

There wasn't much extra cleaning to do with this dataset, since all of the images were already sized and sorted correctly. Below I look through the data and perform some normalization:
In [11]:
# check whether a CUDA-capable GPU is available
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
    
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
In [12]:
device = get_default_device()
device
Out[12]:
device(type='cuda')
In [13]:
print(os.listdir(dataset))
classes = os.listdir(dataset + "/Training")
print(classes)
['Testing', 'Training']
['Angry', 'Fear', 'Happy', 'Neutral', 'Sad', 'Suprise']
In [14]:
data_dir = path # the data is divided into separate Training/Testing directories

train_dir = data_dir + '/Training'
test_dir = data_dir + '/Testing'
In [15]:
count = []
for folder in classes:
    num_images = len(os.listdir(train_dir+'/'+ folder))
    count.append(num_images)
    print(f'There are {num_images} images in the {folder} category.')
There are 3995 images in the Angry category.
There are 4097 images in the Fear category.
There are 7215 images in the Happy category.
There are 4965 images in the Neutral category.
There are 4830 images in the Sad category.
There are 3171 images in the Suprise category.
In [16]:

plt.figure(figsize=(12, 6))
sns.barplot(x=classes, y=count)
plt.title('Images per category', size=16)
plt.ylabel('Images', size=14)
plt.xlabel('Categories', size=14)
plt.show()
Notebook Image

In [49]:
def denormalize(images, means, stds):
    means = torch.tensor(means).reshape(1, 3, 1, 1)
    stds = torch.tensor(stds).reshape(1, 3, 1, 1)
    return images * stds + means
In [64]:
# Data transforms (normalization & data augmentation)
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
 
train_tfms = tt.Compose([
                         tt.Resize(64),
                         tt.RandomCrop(50, padding=4, padding_mode='reflect'), 
                         tt.RandomHorizontalFlip(), 
                         #tt.RandomRotate,
                         #tt.RandomResizedCrop(256, scale=(0.5,0.9), ratio=(1, 1)), 
                         #tt.ColorJitter(brightness=.1, contrast=.8, saturation=0.5, hue=0.12),
                         tt.ToTensor(), 
                         tt.Normalize(*stats,inplace=True)])
valid_tfms = tt.Compose([tt.ToTensor(), tt.Normalize(*stats)])
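As a side note, the stats above are the standard CIFAR-10 values rather than this dataset's own statistics (an earlier draft of this notebook used ((0.4300, 0.4571, 0.4533), (0.2581, 0.2563, 0.2886))). A minimal sketch, not part of the original run, for computing per-channel mean/std directly from the training folder:

raw_ds = ImageFolder(train_dir, tt.Compose([tt.Resize((64, 64)), tt.ToTensor()]))
raw_dl = DataLoader(raw_ds, batch_size=512, num_workers=2)

mean = torch.zeros(3)
sq = torch.zeros(3)
n = 0
for images, _ in raw_dl:
    b = images.size(0)
    mean += images.mean(dim=(0, 2, 3)) * b     # per-channel mean of this batch
    sq += (images ** 2).mean(dim=(0, 2, 3)) * b
    n += b
mean /= n
std = (sq / n - mean ** 2).sqrt()              # Var[x] = E[x^2] - E[x]^2
print(mean, std)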
In [65]:
train_ds = ImageFolder(train_dir, train_tfms)
test_ds = ImageFolder(test_dir, valid_tfms)
In [66]:
def show_example(img, label):
    print(f'Label: {train_ds.classes[label]} ({label})')
    print(f'image.shape: {img.shape}')
    plt.imshow(img.permute(1, 2, 0))
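The 'Clipping input data' warnings in the outputs below come from plotting normalized tensors directly. As a sketch (mine, using the denormalize helper and stats defined above, with a hypothetical function name), the image could be un-normalized before plotting to avoid them:

def show_example_denorm(img, label):
    print(f'Label: {train_ds.classes[label]} ({label})')
    img = denormalize(img.unsqueeze(0), *stats).squeeze(0)  # undo tt.Normalize
    plt.imshow(img.clamp(0, 1).permute(1, 2, 0))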
In [67]:
show_example(*train_ds[1])
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Angry (0)
image.shape: torch.Size([3, 50, 50])
Notebook Image
In [68]:
show_example(*train_ds[4000])
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Fear (1)
image.shape: torch.Size([3, 50, 50])
Notebook Image
In [69]:
show_example(*train_ds[8927])
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Happy (2)
image.shape: torch.Size([3, 50, 50])
Notebook Image
In [70]:
show_example(*train_ds[17000])
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Neutral (3)
image.shape: torch.Size([3, 50, 50])
Notebook Image

Here I define the batch size, create the data loaders, and set up the model architecture. I am using ResNet9 as the image classification base for this model:

In [71]:
batch_size = 350
In [73]:
# PyTorch data loaders
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=6, pin_memory=True)
test_dl = DataLoader(test_ds, batch_size*2, num_workers=6, pin_memory=True)

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(20, 12))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images, nrow=8).permute(1, 2, 0))
        break

show_batch(train_dl)

In [74]:
# Setting up the model architecture
def accuracy(outputs, labels): # fraction of predictions that match the labels
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch 
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss
    
    def validation_step(self, batch):
        images, labels = batch 
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
    
    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))
In [75]:
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), 
              nn.BatchNorm2d(out_channels), 
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase): # ResNet9 architecture
    def __init__(self, in_channels, num_classes):
        super().__init__()
        # input: 3 x 50 x 50 (after the 50x50 random crop above)
        
        self.conv1 = conv_block(in_channels, 64) # 64 x 50 x 50
        self.conv2 = conv_block(64, 128, pool=True) # 128 x 25 x 25
        self.res1 = nn.Sequential(conv_block(128, 128), # takes 128 input channels and returns a feature map with the same number
                                  conv_block(128, 128)) # these two blocks keep the same shape as the residual block's input, so the input can be added back in
        
        self.conv3 = conv_block(128, 256, pool=True) # 256 x 12 x 12
        self.conv4 = conv_block(256, 512, pool=True) # 512 x 6 x 6
        self.res2 = nn.Sequential(conv_block(512, 512),
                                  conv_block(512, 512)) # this is the second residual block
        self.classifier = nn.Sequential(nn.MaxPool2d(4), # max-pool the 6x6 feature maps down to 512 x 1 x 1
                                        nn.Flatten(), # flatten the 3d map into a 512-element vector
                                        nn.Dropout(0.35), # avoid overfitting by randomly zeroing 35% of the feature elements, keeping 65% of the learned activations
                                        nn.Linear(512, num_classes)) # map the 512 features to the class outputs (6) - lesson 2
        
    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out # skip connection: add the residual block's output to its input (the output of conv2)
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
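As a quick sanity check (something I'm adding here, not a cell from the original run), a dummy batch can be pushed through the network to confirm the shape comments above: 50 -> 25 (conv2 pool) -> 12 (conv3 pool) -> 6 (conv4 pool) -> 1 (MaxPool2d(4)).

dummy = torch.randn(2, 3, 50, 50)    # two fake 50x50 RGB images
print(ResNet9(3, 6)(dummy).shape)    # expected: torch.Size([2, 6])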

In [76]:
model = ResNet9(3, 6) # 3 input channels, 6 expression classes
model
Out[76]:
ResNet9(
  (conv1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (conv2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res1): Sequential(
    (0): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res2): Sequential(
    (0): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (classifier): Sequential(
    (0): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Dropout(p=0.35, inplace=False)
    (3): Linear(in_features=512, out_features=6, bias=True)
  )
)

I ran some tests on my CPU in VS Code, since I don't yet have a GPU installed; other tests I ran on Kaggle to use the virtual GPU. This gave me a great appreciation for the efficiency of CUDA cores vs. CPU cores:

Most of the training runs on CPU took between 30 minutes and over an hour!

GPU training with the same hyperparameters took between 5 and 16 minutes!!

In [77]:
device = get_default_device()
device
Out[77]:
device(type='cuda')
In [78]:
train_dl = DeviceDataLoader(train_dl, device)
test_dl = DeviceDataLoader(test_dl, device)
to_device(model, device);

Defining the functions for training the model:

Some info on the different hyperparameters at play:

The learning rate is very important: start small and increase the learning rate until the model gets past the ideal rate, then decrease the LR by small amounts to find the minimum loss.
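A minimal LR range-test sketch of that procedure (my own illustration, not a cell from the original notebook; best run on a throwaway model since it updates the weights): raise the learning rate exponentially each batch and record where the loss starts to diverge.

def lr_range_test(model, loader, start_lr=1e-5, end_lr=1.0, steps=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=start_lr)
    gamma = (end_lr / start_lr) ** (1 / steps)  # multiplicative LR step per batch
    sched = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma)
    lrs, losses = [], []
    for i, batch in enumerate(loader):
        if i >= steps:
            break
        loss = model.training_step(batch)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        sched.step()
        lrs.append(optimizer.param_groups[0]['lr'])
        losses.append(loss.item())
    return lrs, losses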

Weight decay (WD) prevents the weights from becoming too large by adding another term to the loss function: Loss = MSE(y_hat, y) + WD * sum(w^2). If the weights get too large the loss gets too high, so this keeps them in a manageable range.

Gradient clipping limits the values of gradients to a small range to prevent them from becoming too large. When you calculate the gradient of the loss with respect to the weights, any gradient element larger than the specified clip amount is replaced with the clip value.
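A tiny illustration of the clipping (again my own sketch, not from the original run): every gradient element is clamped into [-clip, clip] before the optimizer step.

w = torch.tensor([1.0, 2.0], requires_grad=True)
(5 * w).sum().backward()          # dL/dw = [5., 5.]
nn.utils.clip_grad_value_(w, 0.1) # clamp each element to [-0.1, 0.1]
print(w.grad)                     # tensor([0.1000, 0.1000])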

I tried modifying these parameters specifically, but was having trouble getting the model to train above ~60% accuracy (a scripted version of this sweep is sketched after the list):

            1. Batch size: varied from 30-500
                    Best result: 150
            2. Dropout rate: varied from 10%-50%
                    Best result: 30%
            3. Epochs: varied from 6-50
                    Best result: 20
            4. Learning rate: varied from .0001-.02
                    Best result: .02
            5. Weight decay: varied from 1e-5 to 1e-3
                    Best result: 1e-4
            6. Gradient clipping: varied from .1-.01
                    Best result: .086, but I ended up keeping it at .1 because the gradients rarely exceeded ~.086 except for a few instances, and it didn't seem to make much of an overall difference.
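The runs above were launched by hand; here is a hedged sketch of scripting the same kind of sweep with the fit_one_cycle function defined in the next cell (the grid values are illustrative, not the exact ones I tried):

for lr in [1e-4, 1e-3, 1e-2, 2e-2]:
    sweep_model = to_device(ResNet9(3, 6), device)  # fresh model per setting
    hist = fit_one_cycle(5, lr, sweep_model, train_dl, test_dl,
                         grad_clip=0.1, weight_decay=1e-4,
                         opt_func=torch.optim.Adam)
    print(lr, hist[-1]['val_acc'])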
In [79]:
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader, 
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    
    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs, 
                                                steps_per_epoch=len(train_loader))
    
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            
            # Gradient clipping
            if grad_clip: 
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            
            optimizer.step()
            optimizer.zero_grad()
            
            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()
        
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
In [82]:
history = [evaluate(model, test_dl)]
history
Out[82]:
[{'val_loss': 2.285583972930908, 'val_acc': 0.11909090727567673}]
In [114]:
epochs = 20
max_lr = .02
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

In [115]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, test_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
Epoch [0], last_lr: 0.00206, train_loss: 0.7268, val_loss: 1.2332, val_acc: 0.5694
Epoch [1], last_lr: 0.00556, train_loss: 0.7696, val_loss: 1.2766, val_acc: 0.5708
Epoch [2], last_lr: 0.01037, train_loss: 0.8917, val_loss: 1.4493, val_acc: 0.5281
Epoch [3], last_lr: 0.01518, train_loss: 1.0192, val_loss: 1.2018, val_acc: 0.5171
Epoch [4], last_lr: 0.01871, train_loss: 1.0596, val_loss: 1.4612, val_acc: 0.4886
Epoch [5], last_lr: 0.02000, train_loss: 1.0968, val_loss: 1.4121, val_acc: 0.4888
Epoch [6], last_lr: 0.01975, train_loss: 1.1041, val_loss: 1.2290, val_acc: 0.5328
Epoch [7], last_lr: 0.01901, train_loss: 1.0949, val_loss: 1.2339, val_acc: 0.5323
Epoch [8], last_lr: 0.01782, train_loss: 1.0933, val_loss: 1.3464, val_acc: 0.4847
Epoch [9], last_lr: 0.01623, train_loss: 1.0884, val_loss: 1.2891, val_acc: 0.5197
Epoch [10], last_lr: 0.01434, train_loss: 1.0727, val_loss: 1.2515, val_acc: 0.5190
Epoch [11], last_lr: 0.01223, train_loss: 1.0491, val_loss: 1.2012, val_acc: 0.5618
Epoch [12], last_lr: 0.01000, train_loss: 1.0282, val_loss: 1.1476, val_acc: 0.5693
Epoch [13], last_lr: 0.00777, train_loss: 1.0028, val_loss: 1.3373, val_acc: 0.5252
Epoch [14], last_lr: 0.00566, train_loss: 0.9711, val_loss: 1.2637, val_acc: 0.5438
Epoch [15], last_lr: 0.00377, train_loss: 0.9451, val_loss: 1.1333, val_acc: 0.5791
Epoch [16], last_lr: 0.00218, train_loss: 0.8983, val_loss: 1.1267, val_acc: 0.5844
Epoch [17], last_lr: 0.00099, train_loss: 0.8665, val_loss: 1.1015, val_acc: 0.5916
Epoch [18], last_lr: 0.00025, train_loss: 0.8405, val_loss: 1.1262, val_acc: 0.5874
Epoch [19], last_lr: 0.00000, train_loss: 0.8338, val_loss: 1.1322, val_acc: 0.5856
Wall time: 7min 16s
In [116]:
epochs = 10
max_lr = .008
grad_clip = 0.08
weight_decay = 1e-5
opt_func = torch.optim.Adam
In [117]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, test_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
Epoch [0], last_lr: 0.00221, train_loss: 0.8340, val_loss: 1.1760, val_acc: 0.5732
Epoch [1], last_lr: 0.00607, train_loss: 0.8792, val_loss: 1.1964, val_acc: 0.5577
Epoch [2], last_lr: 0.00800, train_loss: 0.9465, val_loss: 1.1389, val_acc: 0.5782
Epoch [3], last_lr: 0.00760, train_loss: 0.9555, val_loss: 1.4225, val_acc: 0.4883
Epoch [4], last_lr: 0.00649, train_loss: 0.9494, val_loss: 1.1173, val_acc: 0.5745
Epoch [5], last_lr: 0.00489, train_loss: 0.9278, val_loss: 1.1563, val_acc: 0.5858
Epoch [6], last_lr: 0.00311, train_loss: 0.8915, val_loss: 1.1590, val_acc: 0.5668
Epoch [7], last_lr: 0.00151, train_loss: 0.8520, val_loss: 1.1269, val_acc: 0.5937
Epoch [8], last_lr: 0.00040, train_loss: 0.8151, val_loss: 1.1135, val_acc: 0.6028
Epoch [9], last_lr: 0.00000, train_loss: 0.7900, val_loss: 1.1407, val_acc: 0.5930
Wall time: 3min 36s
In [118]:
epochs = 10
max_lr = .005
grad_clip = 0.07
weight_decay = 1e-7
opt_func = torch.optim.Adam
In [119]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, test_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
Epoch [0], last_lr: 0.00138, train_loss: 0.7928, val_loss: 1.1434, val_acc: 0.5993
Epoch [1], last_lr: 0.00379, train_loss: 0.8155, val_loss: 1.2139, val_acc: 0.5561
Epoch [2], last_lr: 0.00500, train_loss: 0.8600, val_loss: 1.3799, val_acc: 0.5541
Epoch [3], last_lr: 0.00475, train_loss: 0.8670, val_loss: 1.3482, val_acc: 0.5326
Epoch [4], last_lr: 0.00406, train_loss: 0.8480, val_loss: 1.2053, val_acc: 0.5602
Epoch [5], last_lr: 0.00306, train_loss: 0.8276, val_loss: 1.1077, val_acc: 0.5962
Epoch [6], last_lr: 0.00194, train_loss: 0.7908, val_loss: 1.2448, val_acc: 0.5780
Epoch [7], last_lr: 0.00094, train_loss: 0.7659, val_loss: 1.1873, val_acc: 0.5812
Epoch [8], last_lr: 0.00025, train_loss: 0.7333, val_loss: 1.2136, val_acc: 0.5853
Epoch [9], last_lr: 0.00000, train_loss: 0.7221, val_loss: 1.2304, val_acc: 0.5815
Wall time: 3min 35s
In [120]:
train_time = '43:35'
In [121]:
torch.save(model.state_dict(), 'faces.pth')
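For completeness, the standard PyTorch pattern for loading these weights back later (a sketch, not a cell from the original run):

loaded = to_device(ResNet9(3, 6), device)    # must match the saved architecture
loaded.load_state_dict(torch.load('faces.pth'))
loaded.eval()                                # switch to inference mode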
In [122]:
jovian.reset()
jovian.log_hyperparams(arch='test 5: resnet9', 
                       epochs=epochs, 
                       lr=max_lr, 
                       scheduler='one-cycle', 
                       weight_decay=weight_decay,
                       grad_clip=grad_clip, 
                       opt=opt_func.__name__)
[jovian] Hyperparams logged.
In [123]:
jovian.log_metrics(val_loss=history[-1]['val_loss'], 
                   val_acc=history[-1]['val_acc'],
                   train_loss=history[-1]['train_loss'],
                   time=train_time)
[jovian] Metrics logged.
In [124]:
project_name='zero-to-gans-course-project2'
In [97]:
jovian.commit(project=project_name, filename= "zero-to-gans-course-project3", files=['faces_dataset'])
[jovian] Attempting to save notebook..
[jovian] Updating notebook "jolleyrancher2/zero-to-gans-course-project2" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Error: Failed to read Anaconda environment using command: "conda env export -n base --no-builds"
[jovian] Uploading additional files...
[jovian] Attaching records (metrics, hyperparameters, dataset etc.)
[jovian] Committed successfully! https://jovian.ai/jolleyrancher2/zero-to-gans-course-project2

Testing the model against the test dataset to check the accuracy:

In [98]:
import random
In [99]:
def denormalize(images, mean, std):
    # undo tt.Normalize: first divide out the std, then add back the mean
    invTrans = tt.Compose([tt.Normalize(mean=[0., 0., 0.],
                                        std=[1/std[0], 1/std[1], 1/std[2]]),
                           tt.Normalize(mean=[-mean[0], -mean[1], -mean[2]],
                                        std=[1., 1., 1.])])
    return invTrans(images)
In [100]:
def predict_img(img, label):
    xb = to_device(img.unsqueeze(0), device)
    yb = model(xb)
    _, pred = torch.max(yb, dim=1)
    print(f'Label: {train_ds.classes[label]}, Predicted: {train_ds.classes[pred]}')
    img = denormalize(img, *stats)
    plt.imshow(img.permute(1, 2, 0))
In [101]:
predict_img(*test_ds[1])
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Angry, Predicted: Neutral
Notebook Image
In [102]:
predict_img(*test_ds[4000])
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Neutral, Predicted: Neutral
Notebook Image
In [103]:
predict_img(*test_ds[6000])
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Sad, Predicted: Sad
Notebook Image
In [104]:
predict_img(*test_ds[random.randint(6000,7000)])
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Sad, Predicted: Sad
Notebook Image
In [105]:
predict_img(*test_ds[random.randint(0,5000)])
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Neutral, Predicted: Sad
Notebook Image
In [106]:
predict_img(*test_ds[random.randint(5000,6000)])
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Label: Sad, Predicted: Fear
Notebook Image

jovian.commit(project=project_name, environment=None)

In [107]:
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');
In [108]:
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');
In [109]:
def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('Batch no.')
    plt.ylabel('Learning rate')
    plt.title('Learning Rate vs. Batch no.');

Graphing the training results:

In [110]:
plot_accuracies(history)
Notebook Image
In [111]:
plot_losses(history)
Notebook Image
In [112]:
plot_lrs(history)
Notebook Image
In [113]:
jovian.commit(project=project_name, files='facial-recognition-dataset', outputs=['faces.pth'])
[jovian] Attempting to save notebook..
[jovian] Error: Failed to detect notebook filename. Please provide the correct notebook filename as the "filename" argument to "jovian.commit".

Conclusion:

As with every assignment/project I've encountered in this course, I struggled for a long time to understand how all of the hyperparameters affected the model, and to understand how the architecture worked. It was a great challenge to create my first working model from scratch, and I feel like I could still spend another month with this and learn new things about it. I struggled to get the model to train to an accuracy above 63% for most of the time I spent working on it. It was especially frustrating given that I recently upgraded my desktop by building it from scratch and am still unable to purchase an RTX 30 series GPU for it.

Training for facial expression recognition was much more difficult than I thought it would be; some expressions are fairly similar, which seems to create more error when trying to recognize certain expressions.
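One way to check which expressions get confused with each other (a sketch I'm adding here, not something I ran in the notebook) is a confusion matrix over the test set:

@torch.no_grad()
def confusion(model, loader, num_classes=6):
    cm = torch.zeros(num_classes, num_classes, dtype=torch.int64)
    model.eval()
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        for t, p in zip(labels.cpu(), preds.cpu()):
            cm[t, p] += 1              # rows: true class, cols: predicted class
    return cm

sns.heatmap(confusion(model, test_dl).numpy(), annot=True, fmt='d',
            xticklabels=classes, yticklabels=classes)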

Although training the model each time took longer than I was hoping, I was able to gain a much deeper understanding of how to train a deep learning model, and of the various ways it can be done depending on the data. Here are some things I feel I now have a firm grasp of after this assignment:

               1. Neural network architecture
               2. Optimizers and avoiding overfitting
               3. CNNs
               4. Convolutional layers
               5. How hyperparameters are used to train
               6. The usefulness of coding via deep learning compared to classical techniques

Here are some aspects I don't yet fully understand, and intend to continue researching:

                1. Convolutional neural networks (I listed this above, but I feel I only have a basic understanding of the principles and would like a better understanding of how they work, not just that they work.)
                2. Data augmentation
                3. Training a program on more than one type of data to make it more versatile for real-world use.
                4. My goal is to understand the different aspects of deep learning well enough to create a model from scratch without needing to rely on my notes/previous examples as much.
                5. The math behind the ReLU function

I still have a lot to learn, and based on my experience so far I believe most of what I need to work on will come from continuing to practice and develop my skills and understanding.