Mixture density networks
>Gaussian function (not all distributions are a unimodal Gaussian)
>Visualizing model fitting with multi-output data
>Probability distribution function
>Conditional probability
>
>MDN intuition and training
>Inference sampling
Conditional Probability
Conditional probability is the probability of one event occurring given that one or more other events have occurred. For example:
Event A is that it is raining outside, and it has a 0.3 (30%) chance of raining today.
Event B is that you will need to go outside, and that has a probability of 0.5 (50%).
A conditional probability relates these two events, such as the probability that you will need to go outside given that it is raining.
The formula for conditional probability is:
P(B|A) = P(A and B) / P(A)
which you can also rewrite as:
P(B|A) = P(A∩B) / P(A)
-> If A and B are independent, P(A∩B) = 0.3 * 0.5 = 0.15, so P(B|A) = 0.15 / 0.3 = 0.5 (50%)
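A quick numeric check of the formula, as a minimal Python sketch (it carries over the independence assumption above so that P(A∩B) can be computed from P(A) and P(B)):

# Worked example of P(B|A) = P(A and B) / P(A),
# assuming A and B are independent so P(A and B) = P(A) * P(B).
p_a = 0.3                 # P(A): it is raining
p_b = 0.5                 # P(B): you need to go outside
p_a_and_b = p_a * p_b     # joint probability under independence
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)        # 0.5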
How do Mixture Density RNNs Predict the Future?
https://arxiv.org/pdf/1901.07859.pdf
In practice, a mixture density network (MDN) operates by transforming the outputs of a neural network to form the parameters of a mixture distribution (Bishop, 1994), generally with Gaussian models for each mixture component. These parameters are the centres (µ) and scales (σ) for each Gaussian component, as well as a weight (π) for each component (see Figure 1). The MDN usually uses an exponential activation function to transform the scale parameters to be positive and non-zero. For training, the probability density function of the mixture model is used to generate the negative log likelihood for the loss function. This involves constructing probability density functions (PDFs) for each Gaussian component and a categorical distribution from the mixture weights (see Appendix Section 1.4 for details). One advantage of an MDN is that various component distributions can be used so long as the PDF is tractable, for instance 1D (Bishop, 1994) or 2D (Graves, 2013) Gaussian distributions, or, as in our case, a multivariate Gaussian with a diagonal covariance matrix.
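As a rough sketch of that training setup (not the paper's code), the snippet below maps a network's hidden output to µ, σ, and π with Keras layers, using an exponential activation for σ, and computes the negative log likelihood of a mixture of diagonal-covariance Gaussians. The component count K, output dimension D, and layer names are assumptions for illustration.

# Sketch: MDN output parameters and negative log-likelihood loss.
# K, D and layer names are assumptions, not taken from the paper.
import math
import tensorflow as tf

K = 5   # number of mixture components (assumed)
D = 2   # output dimensionality (assumed)

def mdn_parameter_layer(hidden):
    """Map the network's hidden output to mixture parameters."""
    mu = tf.keras.layers.Dense(K * D, name="mdn_mu")(hidden)                           # centres
    sigma = tf.keras.layers.Dense(K * D, activation=tf.exp, name="mdn_sigma")(hidden)  # scales, kept positive via exp
    pi_logits = tf.keras.layers.Dense(K, name="mdn_pi")(hidden)                        # unnormalised mixture weights
    return mu, sigma, pi_logits

def mdn_negative_log_likelihood(y, mu, sigma, pi_logits):
    """-log p(y) under a mixture of K diagonal-covariance Gaussians."""
    y = tf.reshape(y, [-1, 1, D])
    mu = tf.reshape(mu, [-1, K, D])
    sigma = tf.reshape(sigma, [-1, K, D])
    # Log-density of y under each Gaussian component (summed over the D dimensions).
    log_component = tf.reduce_sum(
        -0.5 * tf.square((y - mu) / sigma)
        - tf.math.log(sigma)
        - 0.5 * math.log(2.0 * math.pi),
        axis=-1)
    # Combine with the log mixture weights; log-sum-exp over components keeps this numerically stable.
    log_mix = tf.nn.log_softmax(pi_logits, axis=-1) + log_component
    return -tf.reduce_logsumexp(log_mix, axis=-1)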
For inference, results are sampled from the mixture distribution. First, the πs are used to form a categorical distribution by applying the softmax function. A sample is drawn from this distribution to determine which Gaussian component will provide the output. The index i of the sampled π is used to select a Gaussian distribution, N(µ_i, σ²_i), from which a sample is drawn to provide the outcome. In some cases, it is advantageous to adjust the diversity of sampling (for instance, to favour unlikely predictions), in which case the temperature of the categorical distribution can be adjusted in the typical way, and the covariance matrices of the Gaussian components may be scaled. We refer to these operations as adjusting π- or σ-temperature respectively.
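A corresponding sketch of that inference step: sample a component from the π-derived categorical distribution, then sample from the selected Gaussian, with the π- and σ-temperature adjustments mentioned above. The function name, the NumPy-based sampling, and the exact temperature scaling are assumptions for illustration.

# Sketch: drawing one sample from the mixture at inference time,
# with pi- and sigma-temperature adjustments (details are assumptions).
import numpy as np

def sample_mdn(mu, sigma, pi_logits, pi_temperature=1.0, sigma_temperature=1.0):
    """mu, sigma: arrays of shape (K, D); pi_logits: array of shape (K,)."""
    # Softmax over the pi logits, with temperature applied to the logits.
    logits = pi_logits / pi_temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Sample a component index from the categorical distribution.
    i = np.random.choice(len(weights), p=weights)
    # Sample from the chosen Gaussian; scaling the diagonal covariance by the
    # sigma-temperature corresponds to scaling sigma by its square root.
    return np.random.normal(loc=mu[i], scale=sigma[i] * np.sqrt(sigma_temperature))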
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import math