This tutorial series is a hands-on beginner-friendly introduction to deep learning using PyTorch, an open-source neural networks library. These tutorials take a practical and coding-focused approach. The best way to learn the material is to execute the code and experiment with it yourself. Check out the full series here:
This tutorial is an executable Jupyter notebook hosted on Jovian. You can run this tutorial and experiment with the code examples in a couple of ways: using free online resources (recommended) or on your computer.
The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Colab. Google Colab is a free online platform for running Jupyter notebooks using Google's cloud infrastructure. You can also select "Run on Binder" or "Run on Kaggle" if you face issues running the notebook on Google Colab.
To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.
You can use a Graphics Processing Unit (GPU) to train your models faster if your execution platform is connected to a GPU manufactured by NVIDIA. Follow these instructions to use a GPU on the platform of your choice:
If you do not have access to a GPU or aren't sure what it is, don't worry, you can execute all the code in this tutorial just fine without a GPU.
Deep neural networks are used mainly for supervised learning: classification or regression. Generative Adversarial Networks (GANs), however, use neural networks for a very different purpose: generative modeling.
Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset. - Source
To get a sense of the power of generative models, just visit thispersondoesnotexist.com. Every time you reload the page, a new image of a person's face is generated on the fly. The results are pretty fascinating:
While there are many approaches used for generative modeling, a Generative Adversarial Network takes the following approach:
There are two neural networks: a Generator and a Discriminator. The generator generates a "fake" sample given a random vector/matrix, and the discriminator attempts to detect whether a given sample is "real" (picked from the training data) or "fake" (generated by the generator). Training happens in tandem: we train the discriminator for a few epochs, then train the generator for a few epochs, and repeat. This way both the generator and the discriminator get better at doing their jobs.
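The alternating scheme described above can be sketched on toy data. This is a minimal illustration, not the training loop used later in this tutorial: the tiny fully connected networks, the sizes latent_dim and data_dim, the learning rate, and the helper names train_discriminator and train_generator are all assumptions made for the example. For brevity it alternates after every batch rather than after a few epochs.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

latent_dim, data_dim = 4, 2  # hypothetical toy sizes, not the tutorial's

# Tiny stand-in networks (assumed for illustration only)
generator = nn.Sequential(nn.Linear(latent_dim, 8), nn.ReLU(), nn.Linear(8, data_dim))
discriminator = nn.Sequential(
    nn.Linear(data_dim, 8), nn.LeakyReLU(0.2), nn.Linear(8, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def train_discriminator(real):
    """One discriminator update: real samples -> 1, generated samples -> 0."""
    opt_d.zero_grad()
    real_preds = discriminator(real)
    real_loss = F.binary_cross_entropy(real_preds, torch.ones_like(real_preds))
    # detach() keeps this update from changing the generator
    fake = generator(torch.randn(real.size(0), latent_dim)).detach()
    fake_preds = discriminator(fake)
    fake_loss = F.binary_cross_entropy(fake_preds, torch.zeros_like(fake_preds))
    loss = real_loss + fake_loss
    loss.backward()
    opt_d.step()
    return loss.item()

def train_generator(batch_size):
    """One generator update: try to make the discriminator output 1 for fakes."""
    opt_g.zero_grad()
    fake = generator(torch.randn(batch_size, latent_dim))
    preds = discriminator(fake)
    loss = F.binary_cross_entropy(preds, torch.ones_like(preds))
    loss.backward()
    opt_g.step()  # only the generator's weights are updated here
    return loss.item()

real_batch = torch.randn(16, data_dim) + 3.0  # made-up "real" data
for step in range(3):
    d_loss = train_discriminator(real_batch)
    g_loss = train_generator(16)
    print(f"step {step}: d_loss={d_loss:.4f}, g_loss={g_loss:.4f}")
```

Each network gets its own optimizer so that a step for one leaves the other's weights untouched.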
GANs, however, can be notoriously difficult to train, and are extremely sensitive to hyperparameters, activation functions and regularization. In this tutorial, we'll train a GAN to generate images of anime characters' faces.
We'll use the Anime Face Dataset, which consists of over 63,000 cropped anime faces. Note that generative modeling is an unsupervised learning task, so the images do not have any labels. Most of the code in this tutorial is based on this notebook.
project_name = '06b-anime-dcgan'
# Uncomment and run the appropriate command for your operating system, if required
# No installation is required on Google Colab / Kaggle notebooks

# Linux / Binder / Windows (No GPU)
# !pip install numpy matplotlib torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# Linux / Windows (GPU)
# !pip install numpy matplotlib torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

# MacOS (NO GPU)
# !pip install numpy matplotlib torch torchvision torchaudio
We can use the opendatasets library to download the dataset from Kaggle.
opendatasets uses the Kaggle Official API for downloading datasets from Kaggle. Follow these steps to find your API credentials:
1. Sign in to https://kaggle.com/, then click on your profile picture on the top right and select "My Account" from the menu.
2. Scroll down to the "API" section and click "Create New API Token". This will download a file kaggle.json containing your Kaggle username and API key.
3. When you run opendatasets.download, you will be asked to enter your username and Kaggle API key, which you can get from the file downloaded in step 2.
Note that you need to download the kaggle.json file only once. On Google Colab, you can also upload the kaggle.json file using the files tab, and the credentials will be read automatically.
!pip install opendatasets --upgrade --quiet
import opendatasets as od

dataset_url = 'https://www.kaggle.com/splcher/animefacedataset'
od.download(dataset_url)
Downloading animefacedataset.zip to ./animefacedataset
100%|██████████| 395M/395M [00:01<00:00, 280MB/s]
The dataset has a single folder called images which contains all 63,000+ images in JPG format.
import os

DATA_DIR = './animefacedataset'
# List the first 10 files inside the images folder
print(os.listdir(DATA_DIR + '/images')[:10])
['14791_2006.jpg', '15606_2006.jpg', '37864_2012.jpg', '61376_2018.jpg', '51585_2015.jpg', '1505_2001.jpg', '42640_2013.jpg', '54069_2016.jpg', '48489_2014.jpg', '388_2000.jpg']
Let's load this dataset using the ImageFolder class from torchvision. We will also resize and crop the images to 64x64 px, and normalize the pixel values with a mean & standard deviation of 0.5 for each channel. This will ensure that pixel values are in the range [-1, 1], which is more convenient for training the discriminator. We will also create a data loader to load the data in batches.
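To see why a mean and standard deviation of 0.5 give the range [-1, 1], here is the arithmetic on a few sample pixel values. It assumes the pixels are already scaled to [0, 1], as they are after ToTensor. The helper names normalize_value and denormalize_value are made up for this sketch and are not part of torchvision.

```python
mean, std = 0.5, 0.5

def normalize_value(p):
    """Per-pixel arithmetic of normalization: subtract the mean, divide by the std."""
    return (p - mean) / std

def denormalize_value(v):
    """Inverse operation: multiply by the std, add the mean."""
    return v * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]  # sample values in [0, 1]
normalized = [normalize_value(p) for p in pixels]
restored = [denormalize_value(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The inverse mapping is what lets us convert normalized tensors back to displayable images.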
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
import torchvision.transforms as T

image_size = 64
batch_size = 128
stats = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)  # (per-channel means, per-channel stds)

train_ds = ImageFolder(DATA_DIR, transform=T.Compose([
    T.Resize(image_size),
    T.CenterCrop(image_size),
    T.ToTensor(),
    T.Normalize(*stats)]))

train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=3, pin_memory=True)
Let's create helper functions to denormalize the image tensors and display some sample images from a training batch.
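import torch
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
%matplotlib inline

def denorm(img_tensors):
    # Undo T.Normalize: multiply by the std, then add the mean (both 0.5 for every channel)
    return img_tensors * stats[1][0] + stats[0][0]

def show_images(images, nmax=64):
    fig, ax = plt.subplots(figsize=(8, 8))
    # Hide the axis ticks
    ax.set_xticks([]); ax.set_yticks([])
    # make_grid returns C x H x W; imshow expects H x W x C
    ax.imshow(make_grid(denorm(images.detach()[:nmax]), nrow=8).permute(1, 2, 0))

def show_batch(dl, nmax=64):
    # Display only the first batch from the data loader
    for images, _ in dl:
        show_images(images, nmax)
        break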
!pip install jovian --upgrade --quiet
To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU.
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
Based on where you're running this notebook, your default device could be a CPU (torch.device('cpu')) or a GPU (torch.device('cuda')).
device = get_default_device()
device
We can now wrap our training data loader in DeviceDataLoader to automatically transfer batches of data to the GPU (if available).
train_dl = DeviceDataLoader(train_dl, device)
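As a quick check of how to_device handles a batch, the sketch below moves a stand-in (images, labels) pair to the selected device. The to_device function repeats the helper defined earlier so the snippet runs on its own. The batch contents and shapes are placeholders, not data from the Anime Face Dataset.

```python
import torch

def to_device(data, device):
    """Same logic as the helper defined earlier, repeated so this runs alone."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Placeholder batch: 4 blank "images" and 4 dummy labels (illustrative only)
batch = (torch.zeros(4, 3, 64, 64), torch.zeros(4, dtype=torch.long))
moved = to_device(batch, device)

print(type(moved).__name__)            # tuples and lists both come back as a list
print([t.device.type for t in moved])  # every tensor is on the chosen device
```

Because the function recurses into lists and tuples, the same call works for a single tensor, a batch pair, or a model.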
The discriminator takes an image as input and tries to classify it as "real" or "generated". In this sense, it's like any other neural network. We'll use a convolutional neural network (CNN) that outputs a single number for every image. We'll use a stride of 2 to progressively reduce the size of the output feature map.
import torch.nn as nn
discriminator = nn.Sequential(
    # in: 3 x 64 x 64

    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 64 x 32 x 32

    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 128 x 16 x 16

    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 256 x 8 x 8

    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(512),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 512 x 4 x 4

    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0, bias=False),
    # out: 1 x 1 x 1

    nn.Flatten(),
    nn.Sigmoid())
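The feature map sizes in the comments above follow from the standard output-size formula for a convolution with dilation 1: out = floor((in + 2 * padding - kernel_size) / stride) + 1. The helper conv_out_size below is a name made up for this sketch. It applies the formula to the five Conv2d layers of the discriminator.

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Output height/width of a convolution (dilation 1) on a square input."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 64
sizes = [size]

# Four stride-2 layers, each halving the feature map
for _ in range(4):
    size = conv_out_size(size, kernel_size=4, stride=2, padding=1)
    sizes.append(size)

# Final layer collapses the 4 x 4 map to a single value
size = conv_out_size(size, kernel_size=4, stride=1, padding=0)
sizes.append(size)

print(sizes)  # [64, 32, 16, 8, 4, 1]
```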
Note that we're using the Leaky ReLU activation for the discriminator.
Different from the regular ReLU function, Leaky ReLU allows the pass of a small gradient signal for negative values. As a result, it makes the gradients from the discriminator flows stronger into the generator. Instead of passing a gradient (slope) of 0 in the back-prop pass, it passes a small negative gradient. - Source
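To make the difference concrete, the sketch below compares the two activations and their slopes on a negative input, using the negative slope of 0.2 from the discriminator above. The function names are illustrative and written in plain Python rather than taken from PyTorch.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Slope of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, negative_slope=0.2):
    return x if x > 0 else negative_slope * x

def leaky_relu_grad(x, negative_slope=0.2):
    # Slope of Leaky ReLU: 1 for positive inputs, negative_slope otherwise
    return 1.0 if x > 0 else negative_slope

x = -2.0
print(relu(x), relu_grad(x))              # output and slope are both zero
print(leaky_relu(x), leaky_relu_grad(x))  # output is scaled by 0.2, slope is 0.2
```

With ReLU, a unit that receives a negative input passes no gradient back. With Leaky ReLU, it still passes a fraction of it.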
Just like any other binary classification model, the output of the discriminator is a single number between 0 and 1, which can be interpreted as the probability of the input image being real, i.e., picked from the original dataset.
Let's move the discriminator model to the chosen device.
discriminator = to_device(discriminator, device)
The input to the generator is typically a vector or a matrix of random numbers (referred to as a latent tensor), which is used as a seed for generating an image. The generator will convert a latent tensor of shape (128, 1, 1) into an image tensor of shape 3 x 64 x 64. To achieve this, we'll use the ConvTranspose2d layer from PyTorch, which performs a transposed convolution (also referred to as a deconvolution).
latent_size = 128
generator = nn.Sequential(
    # in: latent_size x 1 x 1

    nn.ConvTranspose2d(latent_size, 512, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(512),
    nn.ReLU(True),
    # out: 512 x 4 x 4

    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(True),
    # out: 256 x 8 x 8

    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(True),
    # out: 128 x 16 x 16

    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(True),
    # out: 64 x 32 x 32

    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),
    nn.Tanh()
    # out: 3 x 64 x 64
)
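The sizes in the comments above can be checked with the output-size formula for a transposed convolution with dilation 1 and no output padding: out = (in - 1) * stride - 2 * padding + kernel_size. The helper conv_transpose_out_size is a name made up for this sketch. It walks through the five ConvTranspose2d layers of the generator.

```python
def conv_transpose_out_size(size, kernel_size, stride, padding):
    """Output height/width of a transposed convolution
    (dilation 1, output_padding 0) on a square input."""
    return (size - 1) * stride - 2 * padding + kernel_size

size = 1
sizes = [size]

# First layer expands the 1 x 1 latent tensor to a 4 x 4 feature map
size = conv_transpose_out_size(size, kernel_size=4, stride=1, padding=0)
sizes.append(size)

# Four stride-2 layers, each doubling the feature map
for _ in range(4):
    size = conv_transpose_out_size(size, kernel_size=4, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

The generator mirrors the discriminator: the same kernel size, stride and padding that halve a feature map in Conv2d double it in ConvTranspose2d.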
We use the Tanh activation function for the output layer of the generator.
"The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling." - Source
Note that since the outputs of the Tanh activation lie in the range [-1, 1], we applied the same normalization to the images in the training dataset. Let's generate some outputs using the generator and view them as images by transforming and denormalizing the output.
xb = torch.randn(batch_size, latent_size, 1, 1)  # random latent tensors
fake_images = generator(xb)
print(fake_images.shape)
show_images(fake_images)
torch.Size([128, 3, 64, 64])