
Applying the Model

Assume we have a fully optimized conditional model of the form $p({\bf y}\vert{\bf x})$, estimated from many examples of data $({\bf x}_i,{\bf y}_i)$ with an algorithm such as CEM. The typical scenario is to use this model and its estimated parameters to predict an output response given an input observation.

At runtime, we need to quickly generate an output ${\bf y}$ given the input ${\bf x}$. Observing a value ${\bar {\bf x}}$ effectively turns our conditional model $p({\bf y}\vert{\bf x})$ into a marginal density over ${\bf y}$ (i.e. $p({\bf y})$). The observed ${\bar {\bf x}}$ makes the gates act merely as constants, $G_m$, instead of as Gaussian functions. In addition, the conditional Gaussians which were originally the experts become ordinary Gaussians when we observe ${\bf x}$, and the regressor term $\nu^m + \Gamma^m {\bf x}$ becomes a simple mean $\mu^m$. If we had a conditioned mixture of $M$ Gaussians, the marginal density that results is an ordinary sum of $M$ Gaussians in the space of ${\bf y}$, as in Equation 7.33.

$\displaystyle p({\bf y}\vert{\bf x} , \Theta) = \frac{\sum_{m=1}^M G_m \times {\cal N} ({\bf y};\mu^m,\Omega^m)} {\sum_{n=1}^M G_n }$

(7.33)
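The conditioning step described above can be sketched in a few lines of Python. This is a minimal illustration restricted to scalar ${\bf x}$ and ${\bf y}$, not the thesis implementation. The gate form (a weight `alpha` times a Gaussian over x) is an assumption, since the text only says the gates are Gaussian functions, and the dictionary keys and function names are hypothetical.

```python
import math

def normal_pdf(v, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def condition_on_x(x_bar, components):
    """Turn a conditioned mixture into an ordinary mixture over y.

    Each component is a dict with hypothetical keys:
      alpha, mu_x, var_x : gate weight and gate Gaussian over x (assumed form)
      nu, gamma, omega   : expert offset, regression slope, output variance
    Returns normalized weights G_m / sum_n G_n, means mu^m and variances.
    """
    # With x observed, each gate collapses to a constant G_m.
    gates = [c["alpha"] * normal_pdf(x_bar, c["mu_x"], c["var_x"]) for c in components]
    total = sum(gates)
    weights = [g / total for g in gates]
    # The regressor nu^m + Gamma^m x becomes a fixed mean mu^m.
    means = [c["nu"] + c["gamma"] * x_bar for c in components]
    variances = [c["omega"] for c in components]
    return weights, means, variances

def mixture_pdf(y, weights, means, variances):
    """Evaluate the resulting sum of Gaussians at y, as in Equation 7.33."""
    return sum(w * normal_pdf(y, m, v) for w, m, v in zip(weights, means, variances))

# Illustrative parameters only; they are not taken from the text.
components = [
    {"alpha": 0.5, "mu_x": -1.0, "var_x": 1.0, "nu": 0.0, "gamma": 1.0, "omega": 0.25},
    {"alpha": 0.5, "mu_x": 1.0, "var_x": 1.0, "nu": 3.0, "gamma": -1.0, "omega": 0.25},
]
weights, means, variances = condition_on_x(0.0, components)
```

With these symmetric gates and an observation at zero, both components receive equal weight. If ${\bar {\bf x}}$ lies far from every gate, the sum of the $G_m$ can underflow, so a practical version would likely work with log-densities.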

Observe the 1D distribution in Figure 7.12. At this point, we would like to choose a single candidate ${\hat {\bf y}}$ from this distribution. There are many possible strategies for performing this selection, with varying efficiencies and advantages. We consider and compare the following three approaches: one may select a random sample from $p({\bf y})$, one may select the average ${\bf y}$, or one may compute the ${\bf y}$ with the highest probability.

Figure 7.12: Output Probability Distribution, Expectation and Prototypes

Sampling will often return a value which has high probability; however, it may sometimes return low-probability values due to its inherent randomness. The average, i.e. the expectation, is a more consistent estimate, but if the density is multimodal with more than one significant peak, the ${\bf y}$ value returned might actually have low $p({\bf y})$ [5]$^{7.2}$ (as is the case in Figure 7.12). Thus, if we consistently wish to have a response ${\hat {\bf y}}$ with high probability, the best candidate is the highest peak in the marginal density, i.e. the arg max.
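The three strategies can be compared on a small bimodal example. The sketch below assumes a 1D mixture with made-up parameters and locates the arg max by brute-force grid search, which is a stand-in chosen for brevity and not a procedure given in the text. All function names are hypothetical.

```python
import math
import random

def normal_pdf(v, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(y, weights, means, variances):
    """Evaluate a 1D mixture of Gaussians at y (weights sum to one)."""
    return sum(w * normal_pdf(y, m, v) for w, m, v in zip(weights, means, variances))

def sample(weights, means, variances, rng):
    """Draw one y: pick a component by weight, then sample its Gaussian."""
    m = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[m], math.sqrt(variances[m]))

def expectation(weights, means):
    """Mean of the mixture: the weighted average of the component means."""
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

def arg_max_grid(weights, means, variances, steps=2001):
    """Approximate the highest peak by scanning a uniform grid (assumed method)."""
    lo = min(m - 3.0 * math.sqrt(v) for m, v in zip(means, variances))
    hi = max(m + 3.0 * math.sqrt(v) for m, v in zip(means, variances))
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda y: mixture_pdf(y, weights, means, variances))

# Illustrative bimodal density; the parameters are not taken from the text.
weights, means, variances = [0.6, 0.4], [0.0, 4.0], [0.25, 0.25]
rng = random.Random(0)
y_sample = sample(weights, means, variances, rng)
y_mean = expectation(weights, means)
y_max = arg_max_grid(weights, means, variances)
p_mean = mixture_pdf(y_mean, weights, means, variances)
p_max = mixture_pdf(y_max, weights, means, variances)
```

Here the expectation falls in the valley between the two peaks, so `p_mean` is far smaller than `p_max`, while the grid search lands next to the taller peak. Grid search scales poorly beyond a few output dimensions, so it serves only to make the comparison concrete.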


Tony Jebara
1999-09-15