We are investigating alternative ways of computing the WP3P solution to obtain a more accurate alignment of the 3D model. As the sensitivity analysis showed, recognition is overly sensitive to the nose localization; a 3D normalization which depends less upon this anchor point might improve recognition results.
Alternatively, we can improve localization accuracy. One proposed technique uses eigenspace analysis of the mug-shot to determine normalization and localization errors directly. In other words, we are investigating a way to compute a DFFS vector or gradient, as opposed to a simple scalar DFFS value. This would allow the recognition stage to detect the orientation of small localization errors and report it back to the normalization stage, so that a gradient descent could be used to optimize the localization. In addition, we have the option of performing several DFFS measurements in the neighbourhood of the initial localization produced by the face detection stage. These sampled DFFS measurements could be used to approximate a DFFS gradient, which a gradient descent optimization could then use to converge quickly to a superior localization.
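As a minimal sketch of this idea, assuming a hypothetical routine dffs(image, x, y) that normalizes the mug-shot about the anchor point (x, y), projects it onto the eigenspace and returns its distance-from-face-space, the neighbouring DFFS samples can be combined into a finite-difference gradient and used to refine the localization:

    def refine_localization(image, x0, y0, dffs, step=1.0, lr=0.5, iters=10):
        """Refine an initial anchor point (x0, y0) by gradient descent on the DFFS.

        dffs(image, x, y) is assumed to return the distance-from-face-space of
        the mug-shot normalized about the anchor point (x, y).
        """
        x, y = float(x0), float(y0)
        for _ in range(iters):
            # Approximate the DFFS gradient with central finite differences,
            # i.e. extra DFFS measurements in the neighbourhood of (x, y).
            gx = (dffs(image, x + step, y) - dffs(image, x - step, y)) / (2 * step)
            gy = (dffs(image, x, y + step) - dffs(image, x, y - step)) / (2 * step)
            # Step against the gradient, towards a lower DFFS (better localization).
            x -= lr * gx
            y -= lr * gy
        return x, y

The step size, learning rate and iteration count above are illustrative only; in practice they would be tuned to the resolution of the mug-shot and the cost of each DFFS evaluation.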
We are also considering preceding the recognition stage with transforms that desensitize it to small errors in the localization stage. Preceding the KL transform with a Fourier, discrete cosine or wavelet transform would provide some degree of spatial invariance, which might improve overall recognition rates. In addition, the size of the synthesized mug-shots could be increased so that more pixels are used to represent each face; such higher-resolution mug-shots could also improve recognition rates. We are also investigating ways to group faces into categories (race, facial hair, sex, etc.) and to generate a specialized eigenspace for each class. This should yield more accurate recognition since the eigenvectors would be dedicated to the members of a single group (e.g. eigenfaces for bearded men, eigenfaces for women, etc.). Finally, we are testing the algorithm's recognition rates when multiple training images per individual are available (instead of only one training image per individual).
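A minimal sketch of the first of these options follows, assuming the mug-shots are supplied as equally-sized 2-D arrays; the choice of the discrete cosine transform, the size of the retained low-frequency block (keep) and the number of eigenvectors (n_components) are illustrative assumptions, not part of the current system. Truncating the DCT to its low-frequency block before the KL decomposition is what provides the mild tolerance to small spatial shifts.

    import numpy as np
    from scipy.fft import dctn

    def dct_features(mugshot, keep=16):
        """2-D DCT of a normalized mug-shot, truncated to the low-frequency block."""
        coeffs = dctn(mugshot, norm='ortho')
        return coeffs[:keep, :keep].ravel()

    def build_eigenspace(mugshots, n_components=60, keep=16):
        """KL (PCA) decomposition computed on DCT features instead of raw pixels."""
        X = np.stack([dct_features(m, keep) for m in mugshots])
        mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, vt[:n_components]   # mean feature vector, eigenvectors

    def project(mugshot, mean, eigenvectors, keep=16):
        """Project a mug-shot into the DCT-domain eigenspace for recognition."""
        return eigenvectors @ (dct_features(mugshot, keep) - mean)

The same structure applies to the class-specific eigenspaces: build_eigenspace would simply be called once per category (bearded men, women, etc.) on the corresponding subset of mug-shots.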
The algorithm could also be optimized to perform face tracking in real time, allowing the face to be used as a control device for the handicapped or for gaze-detection-based virtual reality. Furthermore, the algorithm could be used in low-bandwidth video conferencing applications, since the face data is compressed to 60 scalar coefficients by the KL decomposition. Finally, we propose the use of the algorithm in interactive 3D movies, since the 3D normalization allows the user to view many different poses of an individual generated from a single original photograph (or video stream). These are just a few of the many applications of face recognition, detection and normalization.
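For the video conferencing application, the encoder and decoder reduce to a projection onto, and a reconstruction from, the shared eigenspace. The sketch below assumes the mean face and a matrix of 60 eigenfaces (one per row, flattened to pixel vectors) have already been distributed to both ends of the link; only the 60 coefficients per frame would then need to be transmitted.

    import numpy as np

    def encode_face(mugshot, mean_face, eigenfaces):
        """Compress a normalized mug-shot to its KL coefficients (e.g. 60 scalars)."""
        return eigenfaces @ (mugshot.ravel() - mean_face)

    def decode_face(coefficients, mean_face, eigenfaces, shape):
        """Reconstruct an approximate mug-shot from the transmitted coefficients."""
        return (mean_face + eigenfaces.T @ coefficients).reshape(shape)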