Bounding Matrices: The Gate Covariances

Having shown the update equation for moving each gate's mean, we now turn our attention to the gate's covariance. Recall that each gating function was an unnormalized Gaussian as in Equation 7.14. Gaussians have no constraints on their mean which can be any real vector. Thus, it was possible to just bound the Q function with parabolas for the mean variable. However, Gaussians have constraints on their covariances which must remain symmetric and positive semi-definite. Thus, we can not use simple parabolic bounds trivially over all the parameters of the $\Sigma_{xx}^m$ matrices. Maximizing a parabola might return parameters for $\Sigma_{xx}^m$ which don't maintain the above constraints. Thus, we explicitly bound the Q function with the logarithm of a single Gaussian density. Maximizing a Gaussian (as in maximum likelihood type calculations) is guaranteed to yield a symmetric positive definite matrix. Thus, we shall use it as a bound as in Equation 7.27. For compactness, we are merely writing $\Sigma_{xx}^m$ as $\Sigma$ .

$\displaystyle \begin{array}{ccc} Q(\Theta^t,\Theta^{(t-1)})_{im} & \geq & \log(... ...\:\:\: \Longrightarrow {\rm equality \:\: at \:\:} \Sigma_{xx}^{m*} \end{array}$

(7.27)

Once again, we assume the data has been appropriately whitened with respect to the gate's previous parameters. Without generality, therefore, we can transform the data and assume that the previous value of the mean is 0 and the covariance is identity. We again solve for the parameters of the log-Gaussian bound using the three requirements (contact, tangentiality and tightness) and obtain the k_im, h_im, w_im as in Equation 7.28.

$\displaystyle \begin{array}{lll} k_{im} & = & {\hat h}_{im} ( \log {\cal N} ({\... ...}{\bf x}_i) } { D - tr(\Sigma^{-1}) + \log \vert \Sigma^{-1} \vert} \end{array}$

(7.28)

As before, we would like an efficient way of solving for the minimal value of w_im to get the tightest possible bound. In the definition, w_im is lower bounded by a function of the covariance matrix which may contain very many parameters. We wish to find the maximum of this function (call it g) to find the minimum valid w_im (which forms the tightest bound). Instead of exhaustively searching for the maximum of g by varying the many parameters of $\Sigma$ , we note a few simplifications. First, the covariance matrix can be written in terms of eigenvalues and eigenvectors $\Sigma^{-1} = V^T \Lambda V$ . We also note that the denominator does not rely on the eigenvectors and can be written as in Equation 7.29.

$\displaystyle \begin{array}{lll} g & = & \frac{ \frac{1}{2} \exp(-\frac{1}{2} {... ...Lambda V {\bf x}_i) } { D - tr(\Lambda) + \log \vert \Lambda \vert} \end{array}$

(7.29)

The numerator of g can thus use V to further optimize gindependently of the denominator. As a consequence, one can vary the eigenvectors to select any eigenvalue in $\Lambda$ by rotating the basis. Thus, the numerator only depend on the minimal and maximal values of the eigenvalue matrix ( $\lambda_{min}$ and $\lambda_{max}$ ). The denominator can then freely vary all other eigenvalues in between without affecting the numerator's range. Thus, we only need to search over different values of $\lambda_{min}$ and $\lambda_{max}$ or two parameters. Using this and a further constraint which forces either $\lambda_{min}$ or $\lambda_{max}$ to be 1, we end up with a g function which is only a function of the magnitude of the data point $\rho={{\bf x}_i}^T{\bf x}_i$ and a single free scalar parameter $\lambda$ as in Equation 7.30. This result was verified with numerical tests as well.

$\displaystyle \begin{array}{lll} w_{im} & \geq & r_i \alpha_m g(\rho,\lambda) \\ wmin_{im} & = & r_i \alpha_m \max \{ g(\rho,\lambda) \end{array}$

(7.30)

Given a value of $\rho$ from observed data, we need to find the best wmin_im by checking all the values of $\lambda$ . In an offline process which requires a few seconds, we form a table of wmin_imfor each $\rho$ using Brent's method at each point. Thus, using this 1D lookup table, we can immediately compute the width of the bound given a data point. The values are shown in Figure 7.5 where we plot the max of the g function with respect to $\rho$ . Using this table, we can immediately compute the minimum wmin_im by looking up the corresponding value of $\rho$ for the current data point and subsequently compute the rest of the bound's parameters.

**Figure 7.5:** Look-Up Table for Covariance Bound
$\begin{figure}\center \begin{tabular}[b]{c} \epsfxsize=2.4in \epsfbox{COVLUT.ps} \end{tabular} \end{figure}$

Some examples of a log-Gaussian bound are shown in Figure 7.6 under different sub-components of the Qfunction. Each sub-component corresponds to a single data point as we vary one gate's uni-variate covariance. Many such parabolic bounds are computed (one for each data point and gate combination) and are summed to bound the Q function in its entirety.

**Figure 7.6:** Covariance Bounding Examples
$\begin{figure}\center \begin{tabular}[b]{ccc} \epsfxsize=1.5in \epsfbox{covbo... ...2.ps} & \epsfxsize=1.5in \epsfbox{covbounder3.ps} \end{tabular} \end{figure}$

To obtain a final answer for the update of the gate covariances $\Sigma_{xx}^m$ we simply maximize the sum of log Gaussians which bounds the Q in its entirety. For M gates and N data points, we obtain $M \times N$ bounds each with its own ( $wmin_{im},k_{im},\gamma_{im}$ ). The update is in Equation 7.31.

$\displaystyle \begin{array}{lll} \Sigma_{xx}^m & := & \frac{ \sum_{i=1}^N wmin_{im} \gamma_{im} {\gamma_{im}}^T}{\sum_{i=1}^N wmin_{im}} \end{array}$

(7.31)

This covariance is then unwhitened by inverting the transform used to move the data. This undoes the whitening operation which allowed us to transform data and consequently simplify intermediate calculations .