Recall the output of the face detection and localization stage. The eyes, the nose and the mouth were identified using direct image processing techniques. Assume for now that the nose's horizontal position was also determined and that an exact locus for the nose tip is available. The detected loci of these feature points (eyes, nose and mouth) give an estimate of the pose of an individual's face. Once the pose, i.e. the 3D position and orientation of the face, is known, it is possible to invert the effect of translation and rotation and synthesize a standardized, frontal view of the individual. Furthermore, the positions of the feature points allow us to roughly segment the contour of the face and discard distracting background information. Once the face is segmented, a histogram of the face region alone can be computed to compensate for lighting changes in the image.
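As a rough illustration of this normalization step, the following Python sketch warps the detected feature points onto canonical loci and equalizes the histogram of the segmented face region. It assumes OpenCV and NumPy are available; the canonical coordinates, the 128x128 mug-shot size and the elliptical mask are illustrative choices rather than values taken from the text, and the three-point affine warp only undoes in-plane translation, rotation and scale (out-of-plane pose would require a fuller 3D model).

\begin{verbatim}
# Sketch of geometric and photometric normalization (illustrative values).
# Assumes a grayscale uint8 input image and OpenCV/NumPy.
import numpy as np
import cv2

# Canonical loci for (left eye, right eye, nose tip) in a 128x128 mug-shot;
# these coordinates are illustrative assumptions.
CANONICAL_PTS = np.float32([[40, 48], [88, 48], [64, 80]])
MUGSHOT_SIZE = (128, 128)  # (width, height)

def normalize_mugshot(image, left_eye, right_eye, nose_tip):
    """Warp detected feature points onto canonical loci, mask the face,
    and equalize the histogram of the face region alone."""
    src = np.float32([left_eye, right_eye, nose_tip])
    A = cv2.getAffineTransform(src, CANONICAL_PTS)
    frontal = cv2.warpAffine(image, A, MUGSHOT_SIZE)

    # Rough elliptical face mask derived from the canonical feature layout,
    # used to discard background pixels.
    mask = np.zeros((MUGSHOT_SIZE[1], MUGSHOT_SIZE[0]), dtype=np.uint8)
    cv2.ellipse(mask, (64, 72), (46, 60), 0, 0, 360, 255, -1)

    # Histogram equalization computed from the face pixels only, so the
    # lighting compensation is not biased by the background.
    hist, _ = np.histogram(frontal[mask > 0], bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1.0)
    lut = np.round(cdf * 255).astype(np.uint8)
    return np.where(mask > 0, lut[frontal], 0).astype(np.uint8), mask
\end{verbatim}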
We follow the scheme described above to generate a normalized mug-shot image. The variance that remains in this image can then be analyzed with linear statistical techniques; because it is now constrained and limited, these techniques let us classify and characterize it. Statistical analysis based on the Karhunen-Loeve (KL) decomposition allows us to verify face detection and to improve feature localization by computing a ``faceness'' measure which quantifies how face-like an image region is. This ``faceness'' measure lets us try different loci for the nose and select the position that maximizes the statistical similarity to a fully frontal face. Finally, the KL-encoded faces are matched to one another for recognition using a nearest-neighbour classifier, for example based on the Euclidean distance in KL-space.
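To make the KL stage concrete, the sketch below fits a KL (eigenface) basis from training mug-shots, encodes a face as its KL coefficients, scores ``faceness'' as the negated distance from the KL face space (the reconstruction error), and matches probes by nearest neighbour in the coefficient space. The matrix layout, the number of retained components and the sign convention of the ``faceness'' score are our assumptions for illustration, not details from the text.

\begin{verbatim}
# Sketch of the KL (Karhunen-Loeve) stage: basis fitting, encoding,
# ``faceness'' scoring and nearest-neighbour matching (illustrative API).
import numpy as np

def fit_kl_basis(training_faces, n_components=20):
    """training_faces: (n_samples, n_pixels) matrix of normalized mug-shots."""
    mean_face = training_faces.mean(axis=0)
    centered = training_faces - mean_face
    # Rows of vt are the eigenvectors of the sample covariance (KL basis).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]          # (n_pixels,), (k, n_pixels)

def kl_encode(face, mean_face, basis):
    """Project a flattened mug-shot onto the KL basis."""
    return basis @ (face - mean_face)

def faceness(face, mean_face, basis):
    """Higher is more face-like: negated distance from the KL face space
    (i.e. the reconstruction error of the projected image)."""
    coeffs = kl_encode(face, mean_face, basis)
    reconstruction = mean_face + basis.T @ coeffs
    return -np.linalg.norm(face - reconstruction)

def recognize(probe_coeffs, gallery_coeffs, gallery_labels):
    """Nearest-neighbour classification by Euclidean distance in KL-space."""
    dists = np.linalg.norm(gallery_coeffs - probe_coeffs, axis=1)
    return gallery_labels[int(np.argmin(dists))]
\end{verbatim}

Under this convention, the nose search amounts to renormalizing the image for each candidate nose locus and keeping the locus whose mug-shot maximizes the ``faceness'' score; recognition then compares the probe's KL coefficients against those of every gallery face and returns the closest identity.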