During runtime, we need to quickly generate an output $\mathbf{y}$ given the input $\mathbf{x}$.
Observing an $\mathbf{x}$ value turns our conditional model $p(\mathbf{y}|\mathbf{x})$ into effectively a marginal density over $\mathbf{y}$ (i.e. $p(\mathbf{y})$). The observed $\mathbf{x}$ makes the gates act merely as constants, $G_m$, instead of as Gaussian functions. In addition, the conditional Gaussians which were originally experts become ordinary Gaussians when we observe $\mathbf{x}$, and the regressor term collapses to a simple (constant) mean.
If we had a conditioned mixture of $M$ Gaussians, the marginal density that results is an ordinary sum of $M$ Gaussians in the space of $\mathbf{y}$, as in Equation 7.33.
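As a concrete sketch of this conditioning step, the following collapses a simple 1D mixture-of-experts model into an ordinary mixture of Gaussians over $\mathbf{y}$ once $\mathbf{x}$ is observed. All parameter names (`gate_priors`, `regressors`, etc.) are illustrative assumptions, not the notation of the text:

```python
import numpy as np

def condition_on_x(x, gate_priors, gate_means, gate_vars,
                   regressors, offsets, expert_vars):
    """Collapse a 1D mixture of M linear experts, conditioned on an
    observed x, into an ordinary M-component Gaussian mixture over y."""
    # With x fixed, each Gaussian gate evaluates to a constant G_m.
    gate_vals = gate_priors * np.exp(-0.5 * (x - gate_means) ** 2 / gate_vars)
    gate_vals /= np.sqrt(2.0 * np.pi * gate_vars)
    weights = gate_vals / gate_vals.sum()     # normalized mixing proportions
    # Each expert's regressor term becomes a simple mean for y.
    means = regressors * x + offsets
    return weights, means, expert_vars
```

The resulting `(weights, means, expert_vars)` triple fully describes the marginal sum of $M$ Gaussians over $\mathbf{y}$.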
Observe the 1D distribution in Figure 7.12. At this point, we would like to choose a single candidate $\hat{\mathbf{y}}$ from this distribution. There are many possible strategies for performing this selection, with varying efficiencies and advantages. We consider and compare the following three approaches: one may select a random sample from $p(\mathbf{y})$, one may select the average $E[\mathbf{y}]$, or one may compute the $\mathbf{y}$ with the highest probability.
Sampling will often return a value which has a high probability; however, it may sometimes return low-probability values due to its inherent randomness. The average, i.e. the expectation, is a more consistent estimate, but if the density is multimodal with more than one significant peak, the $\mathbf{y}$ value returned might actually have low probability (as is the case in Figure 7.12). Thus, if we consistently wish to have a response $\hat{\mathbf{y}}$ with high probability, the best candidate is the highest peak in the marginal density, i.e. the arg max.
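The three selection strategies can be sketched for a 1D Gaussian mixture as follows. The grid-based peak search used for the arg max is an illustrative approximation (the text does not prescribe a particular peak-finding method):

```python
import numpy as np

def mixture_pdf(y, weights, means, variances):
    """Density of a 1D Gaussian mixture evaluated at points y."""
    y = np.atleast_1d(y)[:, None]
    comps = np.exp(-0.5 * (y - means) ** 2 / variances)
    comps /= np.sqrt(2.0 * np.pi * variances)
    return (weights * comps).sum(axis=1)

def select_y(weights, means, variances, strategy, rng=None):
    if strategy == "sample":       # random draw: usually good, sometimes poor
        rng = rng if rng is not None else np.random.default_rng()
        m = rng.choice(len(weights), p=weights)
        return float(rng.normal(means[m], np.sqrt(variances[m])))
    if strategy == "mean":         # expectation: may fall between the modes
        return float(np.dot(weights, means))
    # "argmax": brute-force grid search for the highest peak of the density
    lo = means.min() - 4.0 * np.sqrt(variances.max())
    hi = means.max() + 4.0 * np.sqrt(variances.max())
    grid = np.linspace(lo, hi, 4001)
    return float(grid[np.argmax(mixture_pdf(grid, weights, means, variances))])
```

For a bimodal mixture with weights (0.6, 0.4), means (-2, 3) and equal small variances, the expectation lands at 0.0, a region of near-zero density, while the arg max returns a point near the heavier mode at -2, illustrating why the highest peak is the more reliable choice.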