
Mixture density networks

>Gaussian functions (not all distributions are a unimodal Gaussian)

>Visualizing model fitting with multi-output data

>Probability distribution function

>Conditional probability

>MDN intuition and training

>Inference and sampling

Conditional Probability

Conditional probability is the probability of one event occurring with some relationship to one or more other events. For example:

Event A is that it is raining outside, and it has a 0.3 (30%) chance of raining today.
Event B is that you will need to go outside, and that has a probability of 0.5 (50%).
A conditional probability looks at these two events in relation to one another: for example, the probability that you will need to go outside given that it is already raining.

The formula for conditional probability is:
P(B|A) = P(A and B) / P(A)
which you can also rewrite as:
P(B|A) = P(A∩B) / P(A)

-> Assuming the two events are independent, P(A∩B) = 0.3 * 0.5 = 0.15, so P(B|A) = 0.15 / 0.3 = 0.5 (50%)
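
A quick numeric check of the example above (a minimal sketch in Python, assuming the two events are independent so that P(A∩B) = P(A) * P(B)):

p_rain = 0.3                            # P(A): chance it rains today
p_need_to_go_out = 0.5                  # P(B): chance you need to go outside
p_both = p_rain * p_need_to_go_out      # P(A∩B) under the independence assumption
p_out_given_rain = p_both / p_rain      # P(B|A) = P(A∩B) / P(A)
print(p_out_given_rain)                 # 0.5
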

How do Mixture Density RNNs Predict the Future?

https://arxiv.org/pdf/1901.07859.pdf

In practice, a mixture density network (MDN) operates by transforming the outputs of a neural network to form the parameters of a mixture distribution (Bishop, 1994), generally with Gaussian models for each mixture component. These parameters are the centres (µ) and scales (σ) for each Gaussian component, as well as a weight (π) for each component (see Figure 1). The MDN usually uses an exponential activation function to transform the scale parameters to be positive and non-zero. For training, the probability density function of the mixture model is used to generate the negative log likelihood for the loss function. This involves constructing probability density functions (PDFs) for each Gaussian component and a categorical distribution from the mixture weights (see Appendix Section 1.4 for details). One advantage of an MDN is that various component distributions can be used so long as the PDF is tractable, for instance, 1D (Bishop, 1994) or 2D (Graves, 2013) Gaussian distributions, or, as in our case, a multivariate Gaussian with a diagonal covariance matrix.
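
To make the training side concrete, here is a minimal sketch of how raw network outputs could be split into µ, σ and π and turned into a negative log likelihood loss using plain TensorFlow ops. This is not the paper's implementation; the helper names and the packing of the output vector are assumptions made for illustration:

import math
import tensorflow as tf

def mdn_split_params(nn_output, n_components, out_dim):
    # Assumed packing of the last axis: [all mu values, all raw sigma values, all pi logits].
    K, D = n_components, out_dim
    mu, raw_sigma, pi_logits = tf.split(nn_output, [K * D, K * D, K], axis=-1)
    mu = tf.reshape(mu, [-1, K, D])
    # Exponential activation keeps the scale parameters positive and non-zero.
    sigma = tf.exp(tf.reshape(raw_sigma, [-1, K, D]))
    return mu, sigma, pi_logits

def mdn_negative_log_likelihood(y, mu, sigma, pi_logits):
    # y: (batch, D) targets; the mixture has K diagonal-covariance Gaussian components.
    y = tf.expand_dims(y, axis=1)  # (batch, 1, D), broadcasts against (batch, K, D)
    # Log-density of y under each Gaussian component (summed over output dimensions).
    log_component = tf.reduce_sum(
        -0.5 * tf.square((y - mu) / sigma)
        - tf.math.log(sigma)
        - 0.5 * math.log(2.0 * math.pi),
        axis=-1)                                      # (batch, K)
    log_pi = tf.nn.log_softmax(pi_logits, axis=-1)    # categorical mixture weights
    # log p(y) = logsumexp_k [ log pi_k + log N(y | mu_k, sigma_k) ]
    log_prob = tf.reduce_logsumexp(log_pi + log_component, axis=-1)
    return -tf.reduce_mean(log_prob)

In a full model, nn_output would come from a final dense layer with n_components * (2 * out_dim + 1) units, and the returned negative log likelihood would be minimised as the training loss.
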
For inference, results are sampled from the mixture distribution. First, the πs are used to form a categorical distribution by applying the softmax function. A sample is drawn from this distribution to determine which Gaussian component will provide the output. The index i of the sampled π is used to select a Gaussian distribution, N(µ_i, σ_i²), from which a sample is drawn to provide the outcome. In some cases, it is advantageous to adjust the diversity of sampling (for instance, to favour unlikely predictions), in which case the temperature of the categorical distribution can be adjusted in the typical way, and the covariance matrices of the Gaussian components may be scaled. We refer to these operations as adjusting π- or σ-temperature respectively.
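
A rough sketch of that sampling procedure in NumPy. The function name is hypothetical, and the exact way the π- and σ-temperatures are applied (dividing the logits, scaling the standard deviation) is one common convention rather than the paper's definitive recipe:

import numpy as np

def sample_from_mdn(mu, sigma, pi_logits, pi_temperature=1.0, sigma_temperature=1.0):
    # mu, sigma: (K, D) component centres and scales; pi_logits: (K,) mixture logits.
    # A higher pi-temperature flattens the categorical distribution, favouring
    # components that would otherwise be unlikely.
    logits = pi_logits / pi_temperature
    pi = np.exp(logits - logits.max())
    pi = pi / pi.sum()
    # Sample which Gaussian component provides the output.
    i = np.random.choice(len(pi), p=pi)
    # Scale the chosen component's spread by the sigma-temperature, then draw
    # the outcome from the selected Gaussian N(mu_i, sigma_i^2).
    return np.random.normal(loc=mu[i], scale=sigma_temperature * sigma[i])
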
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import math