
System Components and Integration

Recall the formalism that was established in the previous chapters. In the discussion of time series processing, a vector notation was employed in which ${\bf x}(t)$ denotes the past short-term memory and ${\bf y}(t)$ denotes the (future) measurements of both users. We re-use this notation and assume that two users (user A and user B) can be connected to the ARL system. The two corresponding vision systems independently recover ${\bf y}_A(t)$ and ${\bf y}_B(t)$, which are fed through the learning system to two graphics systems in real time. We enumerate the functionality of each component in the system in terms of this notation.

Perception - Vision
Generates, for each user, a vector of instantaneous perceptual measurements. The vision system on user A generates ${\bf y}_A(t)$ and the vision system on user B generates ${\bf y}_B(t)$.
Output - Graphics
Synthesizes graphically a vector of perceptual measurements, for example either ${\bf y}_A(t)$ or ${\bf y}_B(t)$ .
Time-Series Memory
Accumulates the actions of both user A and user B by concatenating ${\bf y}_A(t)$ and ${\bf y}_B(t)$ into ${\bf y}(t)$. The module also stores many of these vectors, $\{{\bf y}(t-1),{\bf y}(t-2),...,{\bf y}(t-T)\}$, in a short-term memory. The unit then pre-processes the short-term memory with decay and dimensionality reduction to form a compact vector ${\bf x}(t)$.
Learning
Learns from the past short-term memory ${\bf x}(t)$ and the immediately subsequent vector of measurements ${\bf y}(t)$ from both users (generated by user A and user B). Using the CEM machinery, many such pairs $\{{\bf x}(t),{\bf y}(t)\}$ from a few minutes of interaction form a probability density $p({\bf y}\vert{\bf x})$. We can use this model to compute a prediction ${\hat {\bf y}}(t)$ of the immediate future for any observed short-term past ${\bf x}(t)$.
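The time-series memory described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the system's implementation: the names TimeSeriesMemory, push, compact and the decay parameter are hypothetical, and a geometric decay-weighted sum is used here as a simple stand-in for the decay and dimensionality reduction steps, whose exact form is not given in this section.

```python
from collections import deque


class TimeSeriesMemory:
    """Hypothetical sketch of the time-series memory module."""

    def __init__(self, T, decay=0.8):
        self.decay = decay              # assumed geometric decay factor
        self.window = deque(maxlen=T)   # y(t-1), ..., y(t-T), newest first

    def push(self, y_a, y_b):
        """Concatenate y_A(t) and y_B(t) into y(t) and store it."""
        y = list(y_a) + list(y_b)
        # With maxlen set, appendleft drops the oldest vector from the
        # right end once T vectors are stored.
        self.window.appendleft(y)
        return y

    def compact(self):
        """Return x(t) from the stored past.

        Assumption: a decay-weighted sum collapses the T stored vectors
        into one vector of the same size as y(t). It stands in for the
        decay plus dimensionality reduction described in the text.
        """
        if not self.window:
            raise ValueError("no measurements stored yet")
        x = [0.0] * len(self.window[0])
        for k, y in enumerate(self.window):   # k = 0 is y(t-1)
            w = self.decay ** k
            for i, v in enumerate(y):
                x[i] += w * v
        return x
```

In use, compact() would be called before the current measurements are pushed, so that ${\bf x}(t)$ summarizes only ${\bf y}(t-1),...,{\bf y}(t-T)$ and can be paired with the new ${\bf y}(t)$.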

In summary, the vision systems (one per user) produce the components ${\bf y}_A(t)$ and ${\bf y}_B(t)$, which are concatenated into ${\bf y}(t)$. This forms a stream which is fed into a temporal representation. Recall that the accumulation of the past ${\bf y}(t-1),...,{\bf y}(t-T)$ was denoted ${\bf Y}(t)$. It is processed with decay and dimensionality reduction to generate ${\bf x}(t)$. A learning system then learns the conditional density $p({\bf y}(t)\vert{\bf x}(t))$ from many example pairs $({\bf x}(t),{\bf y}(t))$. This allows it to later compute an estimate ${\hat {\bf y}}(t)$ given the current ${\bf x}(t)$. Finally, for synthesis, the estimate is broken down into ${\hat {\bf y}}_A(t)$ and ${\hat {\bf y}}_B(t)$, which are the predictions for each user. However, it is critical to note that the estimate of the future ${\hat {\bf y}}(t)$ is an instantaneous estimate and, on its own, does not generate any finite-length, meaningful action or gesture.
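The prediction and synthesis steps of this summary can be illustrated with a short sketch. It does not implement CEM. As a plainly different and much simpler stand-in, it estimates the conditional mean of ${\bf y}$ given ${\bf x}$ with a Gaussian kernel-weighted average (Nadaraya-Watson) over stored example pairs. The function names predict and split_users, the bandwidth parameter and the dim_a argument are all hypothetical.

```python
import math


def predict(pairs, x, bandwidth=1.0):
    """Estimate y_hat(t) ~ E[y | x] from example pairs (x_i, y_i).

    Assumption: a kernel-weighted average replaces the CEM-trained
    conditional density p(y | x) purely for illustration.
    """
    weights = []
    for x_i, _ in pairs:
        d2 = sum((a - b) ** 2 for a, b in zip(x_i, x))
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed: fall back to a plain average.
        weights = [1.0] * len(pairs)
        total = float(len(pairs))
    dim = len(pairs[0][1])
    return [
        sum(w * y_i[j] for w, (_, y_i) in zip(weights, pairs)) / total
        for j in range(dim)
    ]


def split_users(y_hat, dim_a):
    """Break y_hat(t) into the per-user predictions (y_hat_A, y_hat_B)."""
    return y_hat[:dim_a], y_hat[dim_a:]


# Toy example pairs: each x(t) is paired with y(t) = [y_A(t); y_B(t)].
examples = [([0.0], [1.0, 2.0]), ([10.0], [3.0, 4.0])]
y_hat = predict(examples, [0.0])
y_hat_a, y_hat_b = split_users(y_hat, dim_a=1)
```

As in the text, each call yields only an instantaneous estimate for one time step; nothing in this sketch produces an extended action or gesture.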


Tony Jebara
1999-09-15