
DCGAN for Fake Face Generation

Introduction

Generative Adversarial Networks

What is a GAN?


GANs are a framework for teaching a DL model to capture the training
data’s distribution so we can generate new data from that same
distribution. GANs were invented by Ian Goodfellow in 2014 and first
described in the paper `Generative Adversarial
Nets <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__.
They are made of two distinct models, a *generator* and a
*discriminator*. The job of the generator is to spawn ‘fake’ images that
look like the training images. The job of the discriminator is to look
at an image and output whether or not it is a real training image or a
fake image from the generator. During training, the generator is
constantly trying to outsmart the discriminator by generating better and
better fakes, while the discriminator is working to become a better
detective and correctly classify the real and fake images. The
equilibrium of this game is when the generator is generating perfect
fakes that look as if they came directly from the training data, and the
discriminator is left to always guess at 50% confidence that the
generator output is real or fake.

Now, let's define some notation to be used throughout the tutorial,
starting with the discriminator. Let x be data representing an image.
D(x) is the discriminator network which outputs the (scalar)
probability that x came from the training data rather than the
generator. Here, since we are dealing with images, the input to
D(x) is an image of CHW size 3x64x64. Intuitively, D(x)
should be HIGH when x comes from the training data and LOW when
x comes from the generator. D(x) can also be thought of
as a traditional binary classifier.

For the generator’s notation, let z be a latent space vector
sampled from a standard normal distribution. G(z) represents the
generator function which maps the latent vector z to data-space.
The goal of G is to estimate the distribution that the training
data comes from (p_data) so it can generate fake samples from
that estimated distribution (p_g).

So, D(G(z)) is the probability (scalar) that the output of the
generator G is a real image. As described in `Goodfellow’s
paper <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__,
D and G play a minimax game in which D tries to
maximize the probability that it correctly classifies reals and fakes
(log D(x)), and G tries to minimize the probability that
D will predict its outputs are fake (log(1 - D(G(z)))).

In theory, the solution to this minimax game is where
p_g = p_data, and the discriminator guesses randomly whether its
inputs are real or fake. However, the convergence theory of GANs is
still being actively researched, and in reality models do not always
train to this point.
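Written out, the value function of this minimax game (from Goodfellow’s
paper) is, in LaTeX notation:

\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]
    + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]

D tries to push both terms up by scoring real images high and generated
images low, while G only influences the second term and tries to pull
it down by producing samples that D scores as real.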

What is a DCGAN?

A DCGAN is a direct extension of the GAN described above, except that it
explicitly uses convolutional and convolutional-transpose layers in the
discriminator and generator, respectively. It was first described by
Radford et al. in the paper `Unsupervised Representation Learning With
Deep Convolutional Generative Adversarial
Networks <https://arxiv.org/pdf/1511.06434.pdf>`__. The discriminator
is made up of strided
`convolution <https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d>`__
layers, `batch
norm <https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm2d>`__
layers, and
`LeakyReLU <https://pytorch.org/docs/stable/nn.html#torch.nn.LeakyReLU>`__
activations. The input is a 3x64x64 image and the output is a
scalar probability that the input is from the real data distribution.
The generator is comprised of
`convolutional-transpose <https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d>`__
layers, batch norm layers, and
`ReLU <https://pytorch.org/docs/stable/nn.html#relu>`__ activations. The
input is a latent vector, z, that is drawn from a standard
normal distribution and the output is a 3x64x64 RGB image. The strided
conv-transpose layers allow the latent vector to be transformed into a
volume with the same shape as an image. In the paper, the authors also
give some tips about how to set up the optimizers, how to calculate the
loss functions, and how to initialize the model weights, all of which
will be explained in the coming projects.
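
To make the layer-by-layer description above concrete, here is a
minimal, hedged sketch of what such a generator could look like in
PyTorch. It only illustrates the ConvTranspose2d/BatchNorm2d/ReLU
pattern and is not the exact model built later in this notebook; the
names SketchGenerator, nz, ngf, and nc are placeholders that mirror the
inputs defined in the next section. The discriminator would mirror this
structure with strided Conv2d, BatchNorm2d, and LeakyReLU layers ending
in a Sigmoid.

import torch.nn as nn

# Illustrative DCGAN-style generator (assumed defaults: nz=100, ngf=64, nc=3).
class SketchGenerator(nn.Module):
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # latent vector z: (nz) x 1 x 1 -> (ngf*8) x 4 x 4
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # (ngf*8) x 4 x 4 -> (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # (ngf*4) x 8 x 8 -> (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # (ngf*2) x 16 x 16 -> (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # (ngf) x 32 x 32 -> (nc) x 64 x 64; Tanh maps outputs to [-1, 1]
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)

With that picture in mind, we begin with the imports and a fixed random
seed: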

from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000) # use if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)
Random Seed: 999
<torch._C.Generator at 0x7f2fad0741c8>

Inputs

Let’s define some inputs for the run (an example configuration follows
the list):

  • dataroot - the path to the root of the dataset folder. We will
    talk more about the dataset in the next section
  • workers - the number of worker threads for loading the data with
    the DataLoader
  • batch_size - the batch size used in training. The DCGAN paper
    uses a batch size of 128
  • image_size - the spatial size of the images used for training.
    This implementation defaults to 64x64. If another size is desired,
    the structures of D and G must be changed. See
    `here <https://github.com/pytorch/examples/issues/70>`__ for more
    details
  • nc - number of color channels in the input images. For color
    images this is 3
  • nz - length of latent vector
  • ngf - relates to the depth of feature maps carried through the
    generator
  • ndf - sets the depth of feature maps propagated through the
    discriminator
  • num_epochs - number of training epochs to run. Training for
    longer will probably lead to better results but will also take much
    longer
  • lr - learning rate for training. As described in the DCGAN paper,
    this number should be 0.0002
  • beta1 - beta1 hyperparameter for Adam optimizers. As described in
    paper, this number should be 0.5
  • ngpu - number of GPUs available. If this is 0, code will run in
    CPU mode. If this number is greater than 0 it will run on that number
    of GPUs
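
As promised above, here is an example configuration consistent with
this list. The values for dataroot, nz, ngf, ndf, num_epochs, and ngpu
are illustrative choices (common DCGAN defaults) and are not fixed by
this section, so adjust them to your setup:

# Example configuration; values not dictated by the list above are
# common DCGAN defaults and only illustrative.
dataroot = "data/celeba"   # assumed dataset folder; point this at your data
workers = 2                # DataLoader worker threads
batch_size = 128           # as in the DCGAN paper
image_size = 64            # images are resized/cropped to 64x64
nc = 3                     # number of color channels
nz = 100                   # length of the latent vector z
ngf = 64                   # generator feature-map depth
ndf = 64                   # discriminator feature-map depth
num_epochs = 5             # training epochs; more usually helps
lr = 0.0002                # learning rate from the DCGAN paper
beta1 = 0.5                # beta1 for the Adam optimizers
ngpu = 1                   # 0 = CPU mode, >0 = number of GPUs to use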