Another important concept arises that is equally well supported by the framework: intra-modal learning. By intra-modal learning we refer to the acquisition of a coupling between gestures or actions in one domain and those in another domain. For instance, User A could feed the system audio measurements (pitch, energy, textural properties) while User B could feed it visual gestures. As the system learns to predict the time series of both sets of measurements, it begins to form a mapping between audio and video. Thus, instead of learning that a clap should follow when the user rubs their stomach (as demonstrated above), the system could trigger clapping when the user sings a pleasing melody into a pitch tracker. Evidently, many issues remain unresolved here, but the important point is that User A and User B need not provide the same type of measurements. In other words, their respective measurements could contain observations of different types of data.
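The idea can be illustrated with a minimal sketch. The example below is purely hypothetical and stands in for the system's actual time-series predictor: it fits an ordinary least-squares mapping from synthetic audio features (User A) to synthetic visual gesture features (User B), then uses the learned mapping so that new audio input drives a predicted gesture, which in turn triggers an event such as a clap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical aligned time series: User A supplies audio features
# (e.g. pitch, energy) while User B supplies a visual gesture feature.
T = 200
audio = rng.normal(size=(T, 2))              # columns: pitch, energy
W_true = np.array([[0.8], [-0.5]])           # unknown underlying coupling
video = audio @ W_true + 0.05 * rng.normal(size=(T, 1))

# Learn the audio -> video mapping by least squares; this stands in
# for whatever predictive model the framework actually employs.
W, *_ = np.linalg.lstsq(audio, video, rcond=None)

# At run time, a sung melody (new audio features from a pitch tracker)
# drives the predicted gesture; crossing a threshold triggers the clap.
new_audio = np.array([[1.2, -0.3]])
predicted_gesture = new_audio @ W
if predicted_gesture[0, 0] > 0.5:
    print("trigger: clap")
```

The point of the sketch is only that the two users' measurement streams are of different types, yet a single predictive model over both streams yields a usable cross-domain mapping.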