At runtime, we need to quickly generate an output y given the input x. Observing a value of x effectively turns our conditional model p(y|x) into a marginal density over y (i.e. p(y)). With x observed, the gates act merely as constants, G_m, instead of as Gaussian functions. In addition, the conditional Gaussians that were originally the experts become ordinary Gaussians once x is observed, and the regressor term becomes a simple mean. If we had a conditioned mixture of M Gaussians, the resulting marginal density is an ordinary sum of M Gaussians in the space of y, as in Equation 7.33.
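As a rough illustration of this collapse, the following sketch handles the scalar case. It assumes a particular parameterization that is not spelled out here: each gate is a weighted Gaussian in the input, normalized so the gates sum to one, and each expert mean is a linear regressor of the input. The function name and all parameter names (`alphas`, `gate_means`, `gate_stds`, `slopes`, `offsets`) are hypothetical.

```python
import numpy as np

def condition_on_input(x, alphas, gate_means, gate_stds, slopes, offsets):
    """Collapse a conditional mixture into an ordinary mixture for a fixed input.

    Assumed (hypothetical) model form, scalar input and output:
      gate m   : alphas[m] * N(x; gate_means[m], gate_stds[m]), then normalized
      expert m : Gaussian in the output with mean slopes[m] * x + offsets[m]
    Returns the constant gate weights and the constant expert means.
    The expert variances do not depend on x and are left untouched.
    """
    # Unnormalized Gaussian gate values at the observed input.
    # The 1/sqrt(2*pi) factor is omitted because it cancels on normalization.
    gate_vals = alphas * np.exp(-0.5 * ((x - gate_means) / gate_stds) ** 2) / gate_stds
    # Assumption: gates are normalized, so they become mixing proportions.
    gates = gate_vals / gate_vals.sum()
    # Each regressor term reduces to a plain mean once x is known.
    means = slopes * x + offsets
    return gates, means

# Hypothetical two-component model evaluated at x = 1.0.
gates, means = condition_on_input(
    1.0,
    alphas=np.array([0.5, 0.5]),
    gate_means=np.array([0.0, 2.0]),
    gate_stds=np.array([1.0, 1.0]),
    slopes=np.array([1.0, -1.0]),
    offsets=np.array([0.0, 3.0]),
)
print(gates, means)
```

Once the input is fixed, nothing in the returned quantities depends on it any longer, which is why the result can be treated as an ordinary mixture over the output.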
Observe the 1D distribution in Figure 7.12. At this point, we would like to choose a single candidate from this distribution. There are many possible strategies for performing this selection, with varying efficiencies and advantages. We consider and compare the following three approaches. One may draw a random sample from the marginal density, one may take its average, or one may compute the output value with the highest probability.
Sampling will often return a value with high probability; however, it may sometimes return low-probability values due to its inherent randomness. The average, i.e. the expectation, is a more consistent estimate, but if the density is multimodal with more than one significant peak, the value returned might actually have low probability (as is the case in Figure 7.12). Thus, if we consistently wish to have a response with high probability, the best candidate is the highest peak in the marginal density, i.e. the arg max.
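The three strategies can be compared on a small 1D mixture of Gaussians. The sketch below is only illustrative: the mixture parameters are invented, the function names are hypothetical, and the arg max is found with a simple fixed-point iteration started from each component mean, which is one possible approach and not necessarily the procedure used in this work.

```python
import numpy as np

def mixture_pdf(y, weights, means, stds):
    """Density of a 1D Gaussian mixture evaluated at each point in y."""
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None]
    comps = np.exp(-0.5 * ((y - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps @ weights

def pick_sample(weights, means, stds, rng):
    """Strategy 1: draw a random sample (choose a component, then sample it)."""
    m = rng.choice(len(weights), p=weights)
    return float(rng.normal(means[m], stds[m]))

def pick_mean(weights, means):
    """Strategy 2: the expectation of the mixture."""
    return float(weights @ means)

def pick_argmax(weights, means, stds, iters=200):
    """Strategy 3: the highest peak, searched from every component mean.

    Setting the gradient of the mixture to zero gives the fixed-point update
    y <- sum_m r_m * mean_m / sum_m r_m, with r_m = w_m * N(y; mean_m, std_m) / std_m^2.
    """
    best_y, best_p = None, -np.inf
    for y in means:
        for _ in range(iters):
            # r_m up to a constant factor that cancels in the ratio.
            r = weights * np.exp(-0.5 * ((y - means) / stds) ** 2) / stds ** 3
            y = float(r @ means / r.sum())
        p = mixture_pdf(y, weights, means, stds)[0]
        if p > best_p:
            best_y, best_p = y, p
    return best_y

# Invented bimodal example: two well-separated peaks.
weights = np.array([0.6, 0.4])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 0.5])
rng = np.random.default_rng(0)

for name, y in [("sample", pick_sample(weights, means, stds, rng)),
                ("mean", pick_mean(weights, means)),
                ("arg max", pick_argmax(weights, means, stds))]:
    print(name, y, mixture_pdf(y, weights, means, stds)[0])
```

In this example the expectation falls in the valley between the two peaks, so its density is far lower than that of the arg max, while the sample lands near one of the peaks but varies from draw to draw.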