It is prudent to train the ARL system in a constrained context so that learning can converge given limited data and limited modeling resources. Thus, the users involved in training the system are initially given some loose instructions on the nature of the interactions they will be performing. These instructions are listed in Table 9.1.
The two users (A and B) begin by playing the above game: A gestures while B responds appropriately. The users are physically separated from each other and only see graphical representations of each other on their screens. The learning algorithm is given measurements of the head and hand positions of both users. These measurements are recorded from the players over several minutes of interaction. These sequences generate many input-output pairs. The data pairs are used to train the system, which is then able to impersonate player B. Once the training is complete, player B leaves and only user A remains. The screen display for A still shows the same graphical character, except that the character's actions are now synthesized by the ARL system rather than by the other player.
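As an illustration of how a recorded sequence can yield many input-output pairs, the following is a minimal sketch and not the system's actual code. It assumes that each input is the concatenation of a fixed number of past measurement frames and that the output is the frame that follows them. The names `make_pairs`, `frames` and `window` are hypothetical, and the toy frames have 2 values rather than the full set of head and hand measurements.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Pair = Tuple[List[float], List[float]]


def make_pairs(frames: Sequence[Frame], window: int) -> List[Pair]:
    """Pair every frame with the `window` frames that precede it.

    Assumption: the input is the flattened past window (oldest frame first)
    and the output is the frame that immediately follows that window.
    """
    pairs: List[Pair] = []
    for t in range(window, len(frames)):
        past = [value for frame in frames[t - window:t] for value in frame]
        pairs.append((past, list(frames[t])))
    return pairs


# Toy sequence: 6 frames with 2 measurements each (hypothetical data).
frames = [[float(t), float(10 * t)] for t in range(6)]
pairs = make_pairs(frames, window=3)
```

A sequence of N frames produces N minus `window` pairs under this scheme, so a few minutes of interaction already supply thousands of training examples.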
More specifically, the training process involved between 5 and 10 instances of each of the above interactions and lasted roughly 5 minutes. This accounts for slightly over 5000 observations of the 30-dimensional vectors. Each of these forms an input-output pair in which the input was the eigen-representation of the past short-term memory over T exponentially decayed samples. The dimensionality of the input was only 22, and the short-term memory spanned 120 samples (T=120, or over 6 seconds). The system used 25 Gaussians for the pdf. The limitations on dimensionality and number of Gaussians were mainly for speed considerations. The learning (CEM algorithm) took approximately 2 hours to converge on an SGI OCTANE for the 5-minute training sequence of interactions. An annealing schedule with exponential decay per iteration was used. If no annealing is used, an inferior (and less global) solution can be obtained in well under one hour. At the end of the annealed training (roughly 400 iterations), the conditional log-likelihood was approximately 25.7. Figure 9.1 shows the convergence of the annealed CEM algorithm.
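To make the short-term memory of T exponentially decayed samples more concrete, here is a minimal sketch of one way to weight a window of past frames before it is compressed. The exact decay profile is not stated here, so the half-life parameterization, the value `half_life=30.0`, and the names `decay_weights` and `decayed_window` are assumptions made for illustration. The subsequent eigen-projection down to 22 dimensions is not shown.

```python
import math
from typing import List, Sequence


def decay_weights(T: int, half_life: float) -> List[float]:
    """Weights for T past samples, oldest first.

    Assumption: the most recent sample has weight 1 and the weight halves
    every `half_life` samples further into the past.
    """
    rate = math.log(2.0) / half_life
    return [math.exp(-rate * (T - 1 - k)) for k in range(T)]


def decayed_window(frames: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Scale each frame by its weight and flatten the window into one vector."""
    return [w * value for w, frame in zip(weights, frames) for value in frame]


T = 120    # window length quoted in the text
DIM = 30   # dimensionality of each observation quoted in the text
weights = decay_weights(T, half_life=30.0)    # hypothetical half-life
window = [[1.0] * DIM for _ in range(T)]      # placeholder frames
memory = decayed_window(window, weights)
```

If the window is taken over the full 30-dimensional observations, the flattened memory has 120 × 30 = 3600 entries, which suggests why a compact 22-dimensional eigen-representation is preferable for a model that must be trained on roughly 5000 observations.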