Figure 8.3 shows the system as it synthesizes interactive behaviour with
a single user. User A is given the illusion of interacting with user B
through the synthesis performed by the ARL system. The vision system on
A still takes measurements, and these are integrated and fed into the
learning system. However, the output of the learning system is also fed
back into the short term memory, filling in the missing component (user
B) inside it.
Thus, not only does user A see synthesized output, but continuity is
also maintained by feeding back synthetic measurements. This gives the
system the ability to see its own actions and maintain self-consistent
behaviour. The half of the time series that used to be generated by B is
now being synthesized by the ARL system. The short term memory is
continuously updated, allowing good estimates of the ongoing
interaction.
In fact, the ARL
prediction only computes small steps into the future and these deltas
do not amount to anything on their own unless integrated and
accumulated. Since the attentional window which integrates the
measurements is longer than a few seconds, the system has enough short
term memory to maintain consistency over a wide range of gestures and to
avoid instability.
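The loop described above can be sketched roughly as follows. All names here (predict_delta, WINDOW, the layout of the joint vector) are illustrative assumptions, not the actual ARL implementation: the predictor emits a small delta, the delta is accumulated onto the last frame to produce user B's synthetic half, and the resulting joint frame is fed back into the short term memory.

```python
import numpy as np

# Hypothetical sketch of the feedback loop; names and dimensions are assumed.
WINDOW = 120          # attentional window: a few seconds of frames
DIM = 6               # e.g. joint vector for user A and user B

def predict_delta(window):
    """Stand-in for the ARL learning system: a small step into the future."""
    return 0.01 * (window[-1] - window.mean(axis=0))

def step(measurement_a, memory):
    """One cycle: measure user A, synthesize user B, feed both back."""
    delta = predict_delta(memory)                 # small predicted step
    # accumulate the delta onto the last frame to get B's synthetic half
    synthetic_b = memory[-1, DIM // 2:] + delta[DIM // 2:]
    frame = np.concatenate([measurement_a, synthetic_b])
    # feed the synthetic measurements back into short term memory
    memory = np.roll(memory, -1, axis=0)
    memory[-1] = frame
    return synthetic_b, memory

memory = np.zeros((WINDOW, DIM))                  # short term memory
```

Because each delta is tiny, it is the accumulation over the window, not any single prediction, that produces visible behaviour.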
Of course, for the initial few seconds of interaction, the system has
not synthesized any actions and user A has yet to gesture. Thus, there
are no measurements to integrate and no accumulated short term memory.
Therefore, the unobserved first few seconds of the time series are set
to reasonable default values. The system eventually bootstraps
and stabilizes when a user begins interacting with it and output is
fed back.
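A minimal sketch of this bootstrapping step, assuming a hypothetical REST_POSE as the "reasonable default value" (the actual defaults used by the system are not specified here):

```python
import numpy as np

# Assumed neutral pose used to pre-fill the unobserved first few seconds.
REST_POSE = np.array([0.5, 0.9, 0.3, 0.5, 0.7, 0.5])
WINDOW = 120   # attentional window length in frames (assumed)

def bootstrap_memory():
    """Fill the short term memory with a default pose so the predictor
    has a well-formed window before any real measurements arrive."""
    return np.tile(REST_POSE, (WINDOW, 1))

memory = bootstrap_memory()
# As real measurements and fed-back synthetic output arrive, they
# gradually displace the defaults and the system stabilizes.
```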
Simultaneously, the real-time graphical blob representation is used to
un-map the predicted perceptual state (the system's action)
for the visual display. It is through this display that the human user
receives feedback in real-time from the system's reactions. This is
necessary to maintain the interaction which requires the human user to
pay attention and respond appropriately. For the head and hand
tracking case, the graphics system is kept simple and merely renders
blobs in a 1-to-1 mapping: it displays to the user only what it
perceives (three blobs). The coarse display makes clear that the ARL
system's primary concern is head and hand position. In addition, such a
simple output avoids misleading the user into expecting too much
intelligence from the system.
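The 1-to-1 un-mapping from a predicted perceptual vector to three display blobs might look like the following sketch; the blob names and the flat (x, y) layout are assumptions for illustration:

```python
def unmap_to_blobs(perceptual):
    """Split a flat (x, y, x, y, ...) vector into three named blobs,
    displayed exactly as perceived (a 1-to-1 mapping, no embellishment)."""
    names = ("head", "left_hand", "right_hand")
    return {name: (perceptual[2 * i], perceptual[2 * i + 1])
            for i, name in enumerate(names)}

# Each blob is drawn directly at its predicted position on the display.
blobs = unmap_to_blobs([0.5, 0.2, 0.3, 0.6, 0.7, 0.6])
```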