A system is desired that will learn behaviours autonomously without any explicit models or instructions. The system will use observations of natural interactions between humans as training data to acquire such models in a passive way. The natural interactions will take place in a constrained scenario to maintain tractability.
The learning will take the form of imitation learning, in which the system simulates the kinds of behaviour patterns it observes in others. The system is to learn how to interact with humans by imitating prototypical reactions to their actions and learning mappings of appropriate interactive behaviour.
The system will acquire behaviour through perceptual channels: it will sense perceptual input and generate perceptual output. By sensing external stimuli, it should acquire behaviours at a perceptual level and ground them in physical and expressive manifestations (i.e. vision sensing and graphics synthesis).
Automatic learning is desired without significant data engineering or manual specification of models. There should not be external intervention or direct supervision beyond the specification of input and output features. Rather, the system should learn only from constrained yet natural human-to-human interactions without explicit teaching effort.
There will not be a discrete set of actions and reactions to facilitate behaviour selection and associative learning. Input will be in a continuous noisy space where events will seldom be observed exactly the same way twice. Thus, exact deterministic models or if-then associations, which would otherwise assist the learning process, cannot work.
A fully probabilistic model of the conditional probability over possible continuous reactions given continuous actions is desired. Thus, the system can have some natural stochastic behaviour as opposed to behaving in a fully predictable manner.
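As a rough illustration of what such a conditional model provides, the sketch below fits a single joint Gaussian to paired (action, reaction) vectors, conditions it on an observed action, and samples a reaction. The single-Gaussian form, the synthetic data, and all function and variable names are assumptions made for this example only; they are not the model proposed in this work, which may well require a richer density.

```python
import numpy as np

def fit_joint_gaussian(actions, reactions):
    """Fit one Gaussian to the stacked (action, reaction) vectors (an illustrative assumption)."""
    z = np.hstack([actions, reactions])
    return z.mean(axis=0), np.cov(z, rowvar=False)

def condition_on_action(mean, cov, action):
    """Return mean and covariance of p(reaction | action) for a joint Gaussian."""
    dx = action.shape[0]
    mu_x, mu_y = mean[:dx], mean[dx:]
    s_xx, s_xy = cov[:dx, :dx], cov[:dx, dx:]
    s_yy = cov[dx:, dx:]
    # gain = S_yx S_xx^{-1}, computed with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(s_xx, s_xy).T
    cond_mean = mu_y + gain @ (action - mu_x)
    cond_cov = s_yy - gain @ s_xy
    return cond_mean, cond_cov

def sample_reaction(mean, cov, action, rng):
    """Draw one stochastic reaction given a continuous action."""
    cond_mean, cond_cov = condition_on_action(mean, cov, action)
    return rng.multivariate_normal(cond_mean, cond_cov)

# Synthetic training pairs (hypothetical data): 2-D actions, 1-D noisy reactions.
rng = np.random.default_rng(0)
actions = rng.normal(size=(500, 2))
weights = np.array([[1.0], [-0.5]])
reactions = actions @ weights + 0.1 * rng.normal(size=(500, 1))

mean, cov = fit_joint_gaussian(actions, reactions)
query = np.array([1.0, 0.0])
cond_mean, cond_cov = condition_on_action(mean, cov, query)
sample_a = sample_reaction(mean, cov, query, rng)
sample_b = sample_reaction(mean, cov, query, rng)
```

Two draws for the same action generally differ, which is the stochastic behaviour described above, while the conditional mean still gives a single most likely reaction when a deterministic output is preferred.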
The system is to use the acquired behaviours to interact with other humans in real-time without excessively encumbering the participants. In other words, the humans will engage the system in a natural way without extra paraphernalia and without excessively constraining their behaviour to accommodate the system. In addition, the system must compute the desired action to take and manifest it in some output form in an interactive manner.
The synthesized behaviour has to agree with the behaviours acquired from observations of human participants. An action on the user's part should induce a sensible reaction on the part of the system. The synthesized behaviour should generalize to interaction with different users and show some robustness to changes in operating conditions.
There will be a minimal amount of structure imposed on the system's learning models. The structures used for learning should be generic enough to uncover temporal interaction in other phenomena that are not limited to the particular training scenario.
The system must learn quickly with limited time and computational resources. The system does not have the luxury of exploring a huge space by trial and error or exhaustive search. In addition, it must learn from few examples without requiring large redundancies and many instances of training data.