Jovian
⭐️
Sign In

image.png

Classifying Pokemon images using PyTorch

In thie project we will apply machine learning on Pokemon images. The goal is to identify pokemon name from images.

Data source1: Kaggle Pokemon Image Dataset Source zip file contains image of 149 first generation pokemon images.

We will seperate dataset into train and val dataset.

For testing dataset, we will use random pic from google image. In this project we will use a GPU for faster training

In [1]:
project_name = 'course-project-pokemon-classification'
In [2]:
import warnings
warnings.filterwarnings('ignore')
import os
import torch
import torchvision
import tarfile
import math
import torch.nn as nn
import numpy as np
import pandas as pd
import torch.nn.functional as F
import PIL
from numpy import genfromtxt
from torchvision.datasets.utils import download_url
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
import torchvision.transforms as tt
from torch.utils.data import random_split
from torchvision.utils import make_grid
import matplotlib
import matplotlib.pyplot as plt
import shutil
%matplotlib inline

matplotlib.rcParams['figure.facecolor'] = '#ffffff'

Downloading and Exploring the Data

We can use the opendatasets library to download the dataset from Kaggle.

In [3]:
!pip install opendatasets --upgrade --quiet
In [ ]:
from zipfile import ZipFile
In [ ]:
# Preparing the Pokemon Dataset

#Since the dataset is pretty large (2.24Gb after unzip), we will run this project locally.

#For online use
#Load Kaggle dataset from local and create PyTorch dataset to load the data.
#import opendatasets as od
#dataset_url = 'https://www.kaggle.com/thedagger/pokemon-generation-one'
#od.download(dataset_url)
# with ZipFile('./pokemon-generation-one/pokemon-generation-one.zip', 'r') as zipObj:
#    # Extract all the contents of zip file in current directory
#    zipObj.extractall()
#os.getcwd()
#image_dir = os.getcwd()
In [4]:
#For local use
# Define image data path
image_dir = "C:/Users/chung/Desktop/Python/online_course_GANs/Week 6/Course Project/dataset"
In [5]:
#Check image folder structure
import os
print(os.listdir(image_dir)[:10])
['Abra', 'Aerodactyl', 'Alakazam', 'Arbok', 'Arcanine', 'Articuno', 'Beedrill', 'Bellsprout', 'Blastoise', 'Bulbasaur']

We will set image size as 64 * 64 to speed up training time. If we define image size as 128 * 128, training time will take up to 8 hours.

In [6]:
image_size = 64
batch_size = 20
stats = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)

split-folders will be used to split dataset into training and validation dataset.
Training dataset will be 80% of source dataset and validation will be 20%.

In [ ]:
!pip install split-folders tqdm
val_ratio = 0.2
In [ ]:
import splitfolders  # or import split_folders

# # Split with a ratio.
# # To only split into training and validation set, set a tuple to `ratio`, i.e, `(.8, .2)`.
splitfolders.ratio(image_dir, output="C:/Users/chung/Desktop/Python/online_course_GANs/Week 6/Course Project/dataset/train_valid/", seed=1337, ratio=(1-val_ratio, val_ratio), group_prefix=None) # default values

In [10]:
#Define training and validation dataset local path
image_train_dir = "C:/Users/chung/Desktop/Python/online_course_GANs/Week 6/Course Project/dataset/train_valid/train/"
image_val_dir = "C:/Users/chung/Desktop/Python/online_course_GANs/Week 6/Course Project/dataset/train_valid/val/"

Data normalization and augmentation

Since we have unbalanced dataset, we will apply random data augmentation. At first we set the image size to be 8080, after some random augmentation, we do center cropping to get final image with size 6464. Augmentation include flipping, translating, scaling and shearing.

Augmentation will only apply to training set, for validation set we will only apply resize (64*64) and normalization.

In [11]:
# Data transforms (normalization & data augmentation)

train_tfms = tt.Compose([  #Image transformation will be applied with probability 20%
    tt.Resize((80,80)),    #Resize image to 80*80
    tt.RandomHorizontalFlip(),     #Apply random horizontal flip
    tt.RandomApply([tt.RandomAffine(degrees=20),    #Apply random rotation with degree from -20 to +20
    tt.RandomAffine(degrees=0, translate=(0.2, 0.2), fillcolor=(255, 255, 255)),  #Apply random horizontal shift
    tt.RandomAffine(degrees=0, scale=(0.7, 0.7)),  #Apply random scale
    tt.RandomAffine(degrees=0, shear=(0, 0, 0, 10))], p=0.2),    #Apply random image shear
    
    tt.CenterCrop((image_size, image_size)),  #Corp image size to 64*64
    tt.ToTensor(),  #Transform image to tensor data
    tt.Normalize(*stats)])  #Normalize tensor data

#For validation ddataset, only normalization and resize will be applied
valid_tfms = tt.Compose([tt.ToTensor(),tt.Resize((image_size,image_size)), tt.Normalize(*stats)])

Create dataset and dataloader

In [12]:
# PyTorch datasets
train_ds = ImageFolder(image_train_dir, train_tfms)
valid_ds = ImageFolder(image_val_dir, valid_tfms)
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=8, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size, num_workers=8, pin_memory=True)

Define denormalization function to denormalize image for display.

In [13]:
def denorm(img_tensors):
    return img_tensors * stats[1][0] + stats[0][0]

Define show_images function to display denormalized images and show_batch function to display all images in one single batch.

In [14]:
def show_images(images, nmax=64):
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(make_grid(denorm(images.detach()[:nmax]), nrow=8).permute(1, 2, 0))

def show_batch(dl, nmax=64):
    for images, _ in dl:
        show_images(images, nmax)
        break
In [15]:
show_batch(train_dl)
In [16]:
#Let's check the dataset for each classes

import seaborn as sns
classes = os.listdir(image_train_dir) # List of all classes
print(f'Total number of categories: {len(classes)}')


counts = {}
for c in classes:
    counts[c] = len(os.listdir(os.path.join(image_train_dir, c)))
print(f'Total number of images in training dataset: {sum(list(counts.values()))}')
# Number of images in each clsss plot
fig = plt.figure(figsize = (25, 5))
sns.lineplot(x = list(counts.keys()), y = list(counts.values())).set_title('Number of images in each class')
plt.xticks(rotation = 90)
plt.margins(x=0)
plt.show()
Total number of categories: 149 Total number of images in training dataset: 8509

We have an unbalanced dataset, only few pokemons have more than 200 images, most of them contain around 50 images. Limited dataset will affect our model accuracy, especially pokemon images are not standardized, some of them are from anime, pokedex, or even hand drawn pictures. We need to apply augmentation to minimize such limiation, or only predicting pokemon with more than 200 images in dataset.

Using GPU processing

In [17]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
    
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

Check the current processor:

In [18]:
device = get_default_device()
device
Out[18]:
device(type='cuda')

Pass data to GPU for further processing

In [19]:
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(valid_dl, device)

Model with Residual Blocks and Batch Normalization

One of the key changes to our CNN model this time is the addition of the resudial block, which adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.

Here is a very simple Residual block:

In [20]:
class SimpleResidualBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()
        
    def forward(self, x):
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.conv2(out)
        return self.relu2(out) + x 
In [21]:
simple_resnet = to_device(SimpleResidualBlock(), device)

for images, labels in train_dl:
    out = simple_resnet(images)
    print(out.shape)
    break
    
del simple_resnet, images, labels
torch.cuda.empty_cache()
torch.Size([20, 3, 64, 64])
In [22]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch 
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss
    
    def validation_step(self, batch):
        images, labels = batch 
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
    
    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))

We will use the ResNet9 architecture plus extra layers to reduce size from 512x2x2 to 512x1x1, which can finally feed into last layer with input size 512 x 1 x 1

In [23]:
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), 
              nn.BatchNorm2d(out_channels), 
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        #Start with 3 x 64 x 64
        self.conv1 = conv_block(in_channels, 64) #64 x 64 x 64
        self.conv2 = conv_block(64, 128, pool=True) #128 x 32 x 32
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128)) #128 x 32 x 32
        
        self.conv3 = conv_block(128, 256, pool=True) #256 x 16 x 16
        self.conv4 = conv_block(256, 512, pool=True) #512 x 8 x 8
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512)) #512 x 16x 16
        self.conv5 = conv_block(512, 512, pool=True) #512 x 4 x 4
        self.conv6 = conv_block(512, 512, pool=True) #512 x 2 x 2
        self.classifier = nn.Sequential(nn.MaxPool2d(2), #512 x 1 x 1
                                        nn.Flatten(), 
                                        nn.Dropout(0.2),
                                    nn.Linear(512, num_classes))


    def forward(self, xb):
        #print('1', xb.shape)
        out = self.conv1(xb) 
        #print('2',out.shape)
        out = self.conv2(out)
        #print('3',out.shape)
        out = self.res1(out) + out 
        #print('4',out.shape)
        out = self.conv3(out) 
        #print('5',out.shape)
        out = self.conv4(out) 
        #print('6',out.shape)
        out = self.res2(out) + out 
        #print('7',out.shape)
        #out = self.maxpool2d(out)
        #print('8',out.shape)
        out = self.conv5(out)
        out = self.conv6(out)
        out = self.classifier(out) 
        #print('final',out.shape)
        return out

In [24]:
model = to_device(ResNet9(3, 149), device)
model
Out[24]:
ResNet9(
  (conv1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (conv2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res1): Sequential(
    (0): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res2): Sequential(
    (0): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv5): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv6): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=512, out_features=149, bias=True)
  )
)

Training the model

We will apply some technique while training the model, which will improve our performance in different ways:

  • Learning rate scheduling: Instead of using a fixed learning rate, we will use a learning rate scheduler, which will change the learning rate after every batch of training. There are many strategies for varying the learning rate during training, and the one we'll use is called the "One Cycle Learning Rate Policy", which involves starting with a low learning rate, gradually increasing it batch-by-batch to a high learning rate for about 30% of epochs, then gradually decreasing it to a very low value for the remaining epochs. Reference: https://sgugger.github.io/the-1cycle-policy.html

  • Weight decay: We also use weight decay, which is yet another regularization technique which prevents the weights from becoming too large by adding an additional term to the loss function. Reference: https://towardsdatascience.com/this-thing-called-weight-decay-a7cd4bcfccab

  • Gradient clipping: Apart from the layer weights and outputs, it also helpful to limit the values of gradients to a small range to prevent undesirable changes in parameters due to large gradient values. This simple yet effective technique is called gradient clipping. Reference: https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48

Let's define a fit_one_cycle function to incorporate these changes. We'll also record the learning rate used for each batch.

In [25]:
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader, 
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    
    # Set up cutom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs, 
                                                steps_per_epoch=len(train_loader))
    
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            
            # Gradient clipping
            if grad_clip: 
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            
            optimizer.step()
            optimizer.zero_grad()
            
            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()
        
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history

First validation set prediction with our untrained model:

In [26]:
history = [evaluate(model, valid_dl)]
history
Out[26]:
[{'val_loss': 5.003937244415283, 'val_acc': 0.007727272808551788}]

This result should be close to 1/149 ~ 0.0067 since this is a blind guess.

In [27]:
epochs = 50
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam
In [28]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
Epoch [0], last_lr: 0.00050, train_loss: 4.6454, val_loss: 4.4162, val_acc: 0.1200 Epoch [1], last_lr: 0.00081, train_loss: 4.1097, val_loss: 3.9498, val_acc: 0.1382 Epoch [2], last_lr: 0.00132, train_loss: 3.8486, val_loss: 3.6594, val_acc: 0.1859 Epoch [3], last_lr: 0.00199, train_loss: 3.6423, val_loss: 3.5208, val_acc: 0.2073 Epoch [4], last_lr: 0.00280, train_loss: 3.4738, val_loss: 3.3324, val_acc: 0.2300 Epoch [5], last_lr: 0.00372, train_loss: 3.2608, val_loss: 3.5463, val_acc: 0.2150 Epoch [6], last_lr: 0.00470, train_loss: 3.0953, val_loss: 3.3136, val_acc: 0.2624 Epoch [7], last_lr: 0.00570, train_loss: 3.0169, val_loss: 3.2859, val_acc: 0.2472 Epoch [8], last_lr: 0.00668, train_loss: 3.0006, val_loss: 3.3475, val_acc: 0.2561 Epoch [9], last_lr: 0.00760, train_loss: 3.0545, val_loss: 3.2989, val_acc: 0.2532 Epoch [10], last_lr: 0.00841, train_loss: 3.0284, val_loss: 3.6048, val_acc: 0.2164 Epoch [11], last_lr: 0.00908, train_loss: 2.9907, val_loss: 3.0178, val_acc: 0.2968 Epoch [12], last_lr: 0.00958, train_loss: 2.9426, val_loss: 2.9497, val_acc: 0.3118 Epoch [13], last_lr: 0.00990, train_loss: 2.8562, val_loss: 3.1543, val_acc: 0.2767 Epoch [14], last_lr: 0.01000, train_loss: 2.7747, val_loss: 2.9062, val_acc: 0.3195 Epoch [15], last_lr: 0.00998, train_loss: 2.6699, val_loss: 3.0589, val_acc: 0.3052 Epoch [16], last_lr: 0.00992, train_loss: 2.5964, val_loss: 2.8883, val_acc: 0.3385 Epoch [17], last_lr: 0.00982, train_loss: 2.4981, val_loss: 2.7404, val_acc: 0.3631 Epoch [18], last_lr: 0.00968, train_loss: 2.4570, val_loss: 3.1855, val_acc: 0.3199 Epoch [19], last_lr: 0.00950, train_loss: 2.3965, val_loss: 2.8563, val_acc: 0.3681 Epoch [20], last_lr: 0.00929, train_loss: 2.3425, val_loss: 2.7746, val_acc: 0.3644 Epoch [21], last_lr: 0.00905, train_loss: 2.2683, val_loss: 2.5605, val_acc: 0.4171 Epoch [22], last_lr: 0.00877, train_loss: 2.1778, val_loss: 2.7757, val_acc: 0.3869 Epoch [23], last_lr: 0.00846, train_loss: 2.1624, val_loss: 2.5622, val_acc: 0.4047 Epoch [24], last_lr: 0.00812, train_loss: 2.0806, val_loss: 2.6575, val_acc: 0.4189 Epoch [25], last_lr: 0.00775, train_loss: 2.0270, val_loss: 2.4752, val_acc: 0.4301 Epoch [26], last_lr: 0.00737, train_loss: 1.9572, val_loss: 2.4186, val_acc: 0.4385 Epoch [27], last_lr: 0.00697, train_loss: 1.8925, val_loss: 2.5046, val_acc: 0.4149 Epoch [28], last_lr: 0.00655, train_loss: 1.7993, val_loss: 2.3468, val_acc: 0.4590 Epoch [29], last_lr: 0.00611, train_loss: 1.7254, val_loss: 2.3305, val_acc: 0.4600 Epoch [30], last_lr: 0.00567, train_loss: 1.6568, val_loss: 2.4110, val_acc: 0.4560 Epoch [31], last_lr: 0.00522, train_loss: 1.5753, val_loss: 2.3461, val_acc: 0.4671 Epoch [32], last_lr: 0.00478, train_loss: 1.5043, val_loss: 2.2680, val_acc: 0.4778 Epoch [33], last_lr: 0.00433, train_loss: 1.4303, val_loss: 2.2129, val_acc: 0.4766 Epoch [34], last_lr: 0.00389, train_loss: 1.3581, val_loss: 2.2384, val_acc: 0.4947 Epoch [35], last_lr: 0.00345, train_loss: 1.2877, val_loss: 2.1686, val_acc: 0.5064 Epoch [36], last_lr: 0.00303, train_loss: 1.2002, val_loss: 2.2315, val_acc: 0.5148 Epoch [37], last_lr: 0.00263, train_loss: 1.1500, val_loss: 2.1495, val_acc: 0.5155 Epoch [38], last_lr: 0.00225, train_loss: 1.0393, val_loss: 2.1593, val_acc: 0.5166 Epoch [39], last_lr: 0.00188, train_loss: 0.9982, val_loss: 2.1274, val_acc: 0.5209 Epoch [40], last_lr: 0.00154, train_loss: 0.9571, val_loss: 2.1003, val_acc: 0.5291 Epoch [41], last_lr: 0.00123, train_loss: 0.8673, val_loss: 2.0685, val_acc: 0.5411 Epoch [42], last_lr: 0.00095, train_loss: 0.8226, val_loss: 2.0582, val_acc: 0.5373 Epoch [43], last_lr: 0.00071, train_loss: 0.7756, val_loss: 2.0239, val_acc: 0.5543 Epoch [44], last_lr: 0.00050, train_loss: 0.7480, val_loss: 2.0302, val_acc: 0.5466 Epoch [45], last_lr: 0.00032, train_loss: 0.7132, val_loss: 2.0764, val_acc: 0.5459 Epoch [46], last_lr: 0.00018, train_loss: 0.6888, val_loss: 2.0546, val_acc: 0.5482 Epoch [47], last_lr: 0.00008, train_loss: 0.6996, val_loss: 2.0412, val_acc: 0.5506 Epoch [48], last_lr: 0.00002, train_loss: 0.6863, val_loss: 2.0327, val_acc: 0.5468 Epoch [49], last_lr: 0.00000, train_loss: 0.6644, val_loss: 2.0215, val_acc: 0.5468 Wall time: 50min 50s
In [29]:
train_time = '50min 50s'
In [30]:
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');
In [31]:
plot_accuracies(history)
In [32]:
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');
In [33]:
plot_losses(history)
In [34]:
def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('Batch no.')
    plt.ylabel('Learning rate')
    plt.title('Learning Rate vs. Batch no.');
In [35]:
plot_lrs(history)

Let's try to predict some online pictures.

In [ ]:
def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds  = torch.max(yb, dim=1)
    # Retrieve the class label
    return train_ds.classes[preds[0].item()]

We will use simple_image_download library to scrape google image online.

In [ ]:
!pip install simple_image_download
In [ ]:
#model.load_state_dict(torch.load("C:/Users/chung/Desktop/Python/online_course_GANs/Week 6/Course Project/pokemon-resnet9+2.pth"))

Let's try with 5 pokemons, each of them download 20 images from google.

In [ ]:
root_dir = "C:/Users/chung/Desktop/Python/online_course_GANs/Week 6/Course Project/"
In [ ]:
pokemon_test = ['Pikachu', 'Snorlax', 'Eevee', 'Kakuna', 'Squirtle']
In [ ]:
from simple_image_download import simple_image_download as simp

shutil.rmtree(root_dir + "test/", ignore_errors=True) #Clear all previous testing images
for pokemon in pokemon_test:
    response = simp.simple_image_download
    response().download(pokemon, 20)  #download 20 images of pokemon
    
    original = root_dir + "simple_images/" + pokemon
    target = root_dir + "test/" + pokemon
    
    shutil.rmtree(root_dir + "test/", ignore_errors=True) #Clear all previous testing images
    
    if not os.path.exists(root_dir + "test/" + pokemon):
        os.makedirs(root_dir + "test/" + pokemon)   
    file_names = os.listdir(original)
    
    for file_name in file_names:
        shutil.move(os.path.join(original, file_name), target)
In [ ]:
image_test_dir = root_dir + "test"
test_ds = ImageFolder(image_test_dir, valid_tfms)
test_dl = DataLoader(test_ds, batch_size, shuffle=True, num_workers=8, pin_memory=True)
In [ ]:
show_batch(test_dl)
In [ ]:
test_dl = DeviceDataLoader(test_dl, device)
In [ ]:
evaluate(model, test_dl)

Accuracy on testing dataset is very low, let's try to change the testing pokemon to all 149 pokemon. Also we are adding 'pokemon' in search term to improve quality of download images.

In [ ]:
pokemon_test = []
for pokemon in range(len(train_ds.classes)):
    pokemon_test.append(train_ds.classes[pokemon] + '_pokemon')
print(pokemon_test)
In [ ]:
shutil.rmtree(root_dir + "test/", ignore_errors=True) #Clear all previous testing images
shutil.rmtree(root_dir + "simple_images/", ignore_errors=True)

for pokemon in pokemon_test:
    response = simp.simple_image_download
    response().download(pokemon, 20)  #download 20 images of pokemon
    
    original = root_dir + "simple_images/" + pokemon
    target = root_dir + "test/" + pokemon[:-8]
    

    
    if not os.path.exists(root_dir + "test/" + pokemon):
        os.makedirs(root_dir + "test/" + pokemon[:-8])   
    file_names = os.listdir(original)
    
    for file_name in file_names:
        shutil.move(os.path.join(original, file_name), target)

In [ ]:
image_test_dir = root_dir + "test/"
test_ds = ImageFolder(image_test_dir, valid_tfms)
test_dl = DataLoader(test_ds, batch_size*2, shuffle=True, num_workers=8, pin_memory=True)
In [ ]:
show_batch(test_dl)
In [ ]:
test_dl = DeviceDataLoader(test_dl, device)
In [ ]:
evaluate(model, test_dl)

Our model now have over 70% accuracy on testing data. However, the actual accuracy is expected to smaller than 70%, since our traning data set may contain images in testing dataset.

Let's predict and show some individual image.

In [ ]:
def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds  = torch.max(yb, dim=1)
    # Retrieve the class label
    return train_ds.classes[preds[0].item()]
In [ ]:
img, label = test_ds[0]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', test_ds.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
img, label = test_ds[100]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', test_ds.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
img, label = test_ds[200]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', test_ds.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
img, label = test_ds[300]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', test_ds.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
img, label = test_ds[400]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', test_ds.classes[label], ', Predicted:', predict_image(img, model))
In [ ]:
!pip install jovian --upgrade -q
In [ ]:
#import jovian
In [ ]:
#jovian.commit(project=project_name)
In [ ]:
#torch.save(model.state_dict(), 'pokemon-resnet9+2.pth')
In [ ]:
# jovian.reset()
# jovian.log_hyperparams(arch='resnet9+2', 
#                        epochs=epochs, 
#                        lr=max_lr, 
#                        scheduler='one-cycle', 
#                        weight_decay=weight_decay, 
#                        grad_clip=grad_clip,
#                        opt=opt_func.__name__)
In [ ]:
# jovian.log_metrics(val_loss=history[-1]['val_loss'], 
#                    val_acc=history[-1]['val_acc'],
#                    train_loss=history[-1]['train_loss'],
#                    time=train_time)

Summary

Although our data set is not perfect, we can still achieve ~67% accuracy for validation set data. One of the limitation is that most pokemons has only around 50 images, with different style and poses. We do not have enough data to learn and draw relationship by machine learning. Another constrain is the complexity. Images we have are in different size, from 1200630 to 206206.

In order to capture as much detail as we can, we need deeper layer. If we increase input image size and add more layers, we should see improvement in accuracy, however it will take much more running time. Also we can apply transfer learning using pretrained model to improve training performance.

Improvement (on-going)

Trail 1: Image size: 128x128, Batch size: 50, epoch: 100, training time: 1.5 hours, validation accuracy: 67%
Trail 2: Image size: 128x128, Batch size: 100, epoch: 200, training time: 8 hours, validation accuracy: 69%
Trail 3: Image size: 64x64, Batch size: 50, epoch: 50, training time: 50min 50s, validation accuracy: 55%
Trail 4: Image size: 64x64, Batch size: 50, epoch: 50, training time: 50min 59s, validation accuracy: 57%, with over-sampling

  1. (Done) Increase batch size from 50 to 100 and epoch from 100 to 200. Accuracy is improved to almost 70%, however, it took 8 hours to train.
  2. (Done, +2% accuraccy) Oversampling for classes with inadequate images for training.
  3. Increase training dataset so it is not skewed to 5 pokemons with largest dataset. We can download image to expand our training set. However, we need to identify and remove duplicate images within training+validation dataset and ensure all testing images are unique.
  4. Transfer training

** Thanks Jovian for providing such a great course! **

IMPROVEMENT SECTION

Over-sampling

We will use imbalanced dataset sampler to oversample our dataloader
source: https://github.com/ufoym/imbalanced-dataset-sampler

We will add more examples from minority classes to balance our dataset. image-2.png

In [37]:
from torchsampler import ImbalancedDatasetSampler
In [ ]:
# PyTorch datasets
train_dl_oversampled = DataLoader(train_ds, batch_size, shuffle=False, num_workers=8, pin_memory=True, sampler=ImbalancedDatasetSampler(train_ds))

Let's display 1000 images in once and see and there are any over-sampling (duplicate images in single batch) happens.

In [ ]:
def show_images(images, nmax=1000):
    fig, ax = plt.subplots(figsize=(20, 20))
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(make_grid(denorm(images.detach()[:nmax]), nrow=30).permute(1, 2, 0))

def show_batch(dl, nmax=1000):
    for images, _ in dl:
        show_images(images, nmax)
        break
In [ ]:
batch_size = 1000
In [ ]:
train_dl_oversampled = DataLoader(train_ds, batch_size, shuffle=False, num_workers=8, pin_memory=True, sampler=ImbalancedDatasetSampler(train_ds))
In [ ]:
#show_batch(train_dl_oversampled)

image.png

This Clefairy we only have one image in our dataset, and it is re-sampled into our training set after using ImbalancedDatasetSampler. a09c0f5bc77c483aa362fed832c60848.jpg

Try our new dataset with our model:

In [ ]:
def show_images(images, nmax=64):
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(make_grid(denorm(images.detach()[:nmax]), nrow=30).permute(1, 2, 0))

def show_batch(dl, nmax=64):
    for images, _ in dl:
        show_images(images, nmax)
        break
In [38]:
train_dl_oversampled = DataLoader(train_ds, batch_size, shuffle=False, num_workers=8, pin_memory=True, sampler=ImbalancedDatasetSampler(train_ds))
valid_dl_oversampled = DataLoader(valid_ds, batch_size, num_workers=8, pin_memory=True, sampler=ImbalancedDatasetSampler(valid_ds))
train_dl_oversampled = DeviceDataLoader(train_dl_oversampled, device)
valid_dl_oversampled = DeviceDataLoader(train_dl_oversampled, device)
In [39]:
model = to_device(ResNet9(3, 149), device)
model

Out[39]:
ResNet9(
  (conv1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (conv2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res1): Sequential(
    (0): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res2): Sequential(
    (0): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv5): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv6): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=512, out_features=149, bias=True)
  )
)
In [40]:
history = [evaluate(model, valid_dl)]
history
Out[40]:
[{'val_loss': 5.004033088684082, 'val_acc': 0.0059090908616781235}]
In [41]:
epochs = 50
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam
In [42]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
Epoch [0], last_lr: 0.00050, train_loss: 4.6322, val_loss: 4.2092, val_acc: 0.1127 Epoch [1], last_lr: 0.00081, train_loss: 4.0271, val_loss: 3.9550, val_acc: 0.1514 Epoch [2], last_lr: 0.00132, train_loss: 3.7686, val_loss: 3.8311, val_acc: 0.1791 Epoch [3], last_lr: 0.00199, train_loss: 3.5924, val_loss: 3.5445, val_acc: 0.1886 Epoch [4], last_lr: 0.00280, train_loss: 3.4033, val_loss: 3.5578, val_acc: 0.2064 Epoch [5], last_lr: 0.00372, train_loss: 3.1657, val_loss: 3.2461, val_acc: 0.2555 Epoch [6], last_lr: 0.00470, train_loss: 3.0780, val_loss: 3.4868, val_acc: 0.2191 Epoch [7], last_lr: 0.00570, train_loss: 2.9870, val_loss: 3.1586, val_acc: 0.2691 Epoch [8], last_lr: 0.00668, train_loss: 2.9757, val_loss: 3.4881, val_acc: 0.2150 Epoch [9], last_lr: 0.00760, train_loss: 2.9928, val_loss: 3.1472, val_acc: 0.2472 Epoch [10], last_lr: 0.00841, train_loss: 2.9762, val_loss: 3.3644, val_acc: 0.2527 Epoch [11], last_lr: 0.00908, train_loss: 2.9433, val_loss: 3.3566, val_acc: 0.2614 Epoch [12], last_lr: 0.00958, train_loss: 2.8766, val_loss: 3.2949, val_acc: 0.2450 Epoch [13], last_lr: 0.00990, train_loss: 2.8142, val_loss: 3.0019, val_acc: 0.3027 Epoch [14], last_lr: 0.01000, train_loss: 2.7324, val_loss: 3.2043, val_acc: 0.3000 Epoch [15], last_lr: 0.00998, train_loss: 2.6588, val_loss: 2.6588, val_acc: 0.3647 Epoch [16], last_lr: 0.00992, train_loss: 2.5787, val_loss: 2.8336, val_acc: 0.3520 Epoch [17], last_lr: 0.00982, train_loss: 2.4843, val_loss: 2.8632, val_acc: 0.3508 Epoch [18], last_lr: 0.00968, train_loss: 2.4528, val_loss: 2.7994, val_acc: 0.3531 Epoch [19], last_lr: 0.00950, train_loss: 2.4114, val_loss: 2.7515, val_acc: 0.3780 Epoch [20], last_lr: 0.00929, train_loss: 2.3242, val_loss: 2.5955, val_acc: 0.4101 Epoch [21], last_lr: 0.00905, train_loss: 2.2661, val_loss: 2.7001, val_acc: 0.3803 Epoch [22], last_lr: 0.00877, train_loss: 2.1891, val_loss: 2.5485, val_acc: 0.4144 Epoch [23], last_lr: 0.00846, train_loss: 2.1388, val_loss: 2.5032, val_acc: 0.4237 Epoch [24], last_lr: 0.00812, train_loss: 2.0551, val_loss: 2.3839, val_acc: 0.4456 Epoch [25], last_lr: 0.00775, train_loss: 1.9850, val_loss: 2.6571, val_acc: 0.4237 Epoch [26], last_lr: 0.00737, train_loss: 1.9260, val_loss: 2.4246, val_acc: 0.4475 Epoch [27], last_lr: 0.00697, train_loss: 1.8476, val_loss: 2.3917, val_acc: 0.4512 Epoch [28], last_lr: 0.00655, train_loss: 1.7959, val_loss: 2.3282, val_acc: 0.4596 Epoch [29], last_lr: 0.00611, train_loss: 1.7045, val_loss: 2.3087, val_acc: 0.4625 Epoch [30], last_lr: 0.00567, train_loss: 1.6380, val_loss: 3.0409, val_acc: 0.3862 Epoch [31], last_lr: 0.00522, train_loss: 1.5591, val_loss: 2.2317, val_acc: 0.4970 Epoch [32], last_lr: 0.00478, train_loss: 1.4892, val_loss: 2.2943, val_acc: 0.4693 Epoch [33], last_lr: 0.00433, train_loss: 1.4169, val_loss: 2.2054, val_acc: 0.4925 Epoch [34], last_lr: 0.00389, train_loss: 1.3512, val_loss: 2.1611, val_acc: 0.5058 Epoch [35], last_lr: 0.00345, train_loss: 1.2796, val_loss: 2.0885, val_acc: 0.5208 Epoch [36], last_lr: 0.00303, train_loss: 1.1591, val_loss: 2.0698, val_acc: 0.5382 Epoch [37], last_lr: 0.00263, train_loss: 1.0917, val_loss: 2.0857, val_acc: 0.5255 Epoch [38], last_lr: 0.00225, train_loss: 1.0060, val_loss: 2.1528, val_acc: 0.5213 Epoch [39], last_lr: 0.00188, train_loss: 0.9618, val_loss: 2.0320, val_acc: 0.5471 Epoch [40], last_lr: 0.00154, train_loss: 0.8988, val_loss: 1.9761, val_acc: 0.5507 Epoch [41], last_lr: 0.00123, train_loss: 0.8123, val_loss: 2.0242, val_acc: 0.5512 Epoch [42], last_lr: 0.00095, train_loss: 0.7818, val_loss: 2.0221, val_acc: 0.5572 Epoch [43], last_lr: 0.00071, train_loss: 0.7413, val_loss: 1.9841, val_acc: 0.5586 Epoch [44], last_lr: 0.00050, train_loss: 0.6953, val_loss: 1.9591, val_acc: 0.5647 Epoch [45], last_lr: 0.00032, train_loss: 0.6674, val_loss: 1.9831, val_acc: 0.5632 Epoch [46], last_lr: 0.00018, train_loss: 0.6619, val_loss: 1.9560, val_acc: 0.5736 Epoch [47], last_lr: 0.00008, train_loss: 0.6240, val_loss: 1.9314, val_acc: 0.5704 Epoch [48], last_lr: 0.00002, train_loss: 0.6399, val_loss: 1.9296, val_acc: 0.5718 Epoch [49], last_lr: 0.00000, train_loss: 0.6240, val_loss: 1.9486, val_acc: 0.5688 Wall time: 50min 59s

There is 2% improvement if we apply over-sampling to training and validation dataset.

In [ ]:
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..
In [ ]: