We propose computing limb length rather than limb area, since a mouth can be extremely thin when closed yet usually retains a significant horizontal length. Recall the kernel used to perform the limb extraction: it favored horizontal displacement over diagonal displacement and scale change. Similarly, we weight the computation of limb length by the limb's degree of undulation (in the spatial and scale domains). Thus, a set of limb lengths is computed, each attenuated by the degree of undulation of the limb axis, as shown in the kernel in Figure . For each limb, $limb_i$, we compute the weighted length of the limb, $d_i$. We threshold this value so that extremely short limbs are discarded.
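The weighted length computation can be sketched as follows. This is a minimal illustration, not the original implementation: the axis representation (a left-to-right list of `(x, y, scale)` points, one per column), the function names, and the numeric attenuation weights are all assumptions, since the text gives no values for the kernel.

```python
from typing import List, Tuple

# Hypothetical axis representation: one (x, y, scale) point per column,
# ordered left to right along the limb axis.
AxisPoint = Tuple[int, int, int]

# Hypothetical attenuation weights; the source does not give numeric values.
W_HORIZONTAL = 1.0  # purely horizontal step at constant scale
W_DIAGONAL = 0.5    # penalty factor when the step also moves vertically
W_SCALE = 0.5       # penalty factor when the step also changes scale


def weighted_limb_length(axis: List[AxisPoint]) -> float:
    """Sum per-step weights so that undulating axes yield shorter lengths."""
    length = 0.0
    for (_, y0, s0), (_, y1, s1) in zip(axis, axis[1:]):
        weight = W_HORIZONTAL
        if y1 != y0:
            weight *= W_DIAGONAL  # spatial undulation
        if s1 != s0:
            weight *= W_SCALE     # undulation in the scale domain
        length += weight
    return length


def discard_short_limbs(limbs: List[List[AxisPoint]],
                        min_length: float) -> List[List[AxisPoint]]:
    """Keep only limbs whose weighted length d_i reaches the threshold."""
    return [axis for axis in limbs if weighted_limb_length(axis) >= min_length]


straight = [(x, 10, 2) for x in range(5)]           # four horizontal steps
wavy = [(0, 10, 2), (1, 11, 2), (2, 10, 3)]         # one diagonal, one diagonal + scale
print(weighted_limb_length(straight))                # 4.0
print(weighted_limb_length(wavy))                    # 0.75
```

With these assumed weights, a straight axis of a given extent always scores higher than an undulating one of the same extent, which mirrors the preference built into the extraction kernel.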
We also wish to determine the intensity variance of each isolated limb and compare it to the tone of the face. The lips, the interior of the mouth and the teeth are either brighter or darker than the surrounding skin. Thus, the intensity values enclosed by the mouth limb should differ significantly from the average intensity of the face. We compute the mean intensity of the skin, $m_{face}$, by averaging the intensity values below the eyes and within the facial contour. We then compute the variance in intensity at each pixel, as shown in Figure . From this we obtain the overall average intensity variance of the face within the facial contour, and then the intensity variance of the region defined by each limb. If a limb has less variance than the average variance of the face, then it does not exhibit any significant contrast or stand out strongly from the rest of the face. Such limbs cannot be mouths and are discarded.
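The contrast test can be illustrated with a short sketch. It assumes that the per-pixel "variance" is the squared deviation of a pixel's intensity from the mean skin intensity, which the text does not state explicitly. The pixel-list representation of regions and all function names are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]          # (row, col)
Image = Sequence[Sequence[float]]


def mean_intensity(image: Image, pixels: Sequence[Pixel]) -> float:
    """Average intensity over a set of pixels (used for the skin mean)."""
    return sum(image[r][c] for r, c in pixels) / len(pixels)


def mean_variance(image: Image, pixels: Sequence[Pixel], m_face: float) -> float:
    """Average squared deviation from m_face (assumed variance definition)."""
    return sum((image[r][c] - m_face) ** 2 for r, c in pixels) / len(pixels)


def contrasting_limbs(image: Image,
                      skin_pixels: Sequence[Pixel],
                      face_pixels: Sequence[Pixel],
                      limb_regions: Sequence[Sequence[Pixel]]
                      ) -> List[Tuple[int, float]]:
    """Return (limb index, limb variance) for limbs that stand out from the face."""
    m_face = mean_intensity(image, skin_pixels)
    face_var = mean_variance(image, face_pixels, m_face)
    kept = []
    for idx, region in enumerate(limb_regions):
        limb_var = mean_variance(image, region, m_face)
        if limb_var >= face_var:  # limbs with less variance are discarded
            kept.append((idx, limb_var))
    return kept


# Toy face: uniform skin (100) with a dark two-pixel "mouth" (20).
image = [[100, 100, 100, 100],
         [100,  20,  20, 100],
         [100, 100, 100, 100]]
skin = [(r, c) for r in (0, 2) for c in range(4)]
face = [(r, c) for r in range(3) for c in range(4)]
limbs = [[(1, 1), (1, 2)],       # covers the dark region
         [(0, 0), (0, 1)]]       # covers plain skin
print(contrasting_limbs(image, skin, face, limbs))   # [(0, 6400.0)]
```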
The mouth is selected as the limb with the highest product of weighted length, $d_i$, and intensity variance. The center point of the strongest limb (the locus of the mouth) is displayed superimposed on the variance image in Figure . Since we know the trajectory of the limb axis and the thickness of the limb, we can directly compute the outline of the mouth, which is shown in Figure .
Instead of explicitly defining a geometric model of the mouth that is sensitive to multi-dimensional deformability, we have used a simple ``definition'' to localize the object of interest. Simply stated, we find the mouth as the longest horizontally symmetric limb, with a simple axis and significant intensity variance from the surrounding skin tone.
If no limb is both long enough and higher in variance than the face as a whole, then no mouth has been detected and we return to the eye stage to investigate another pair of interest peaks.
A successful mouth point is found when the limb with the highest product passes a threshold on $d_i$ and a threshold on its intensity variance. The locus of the mouth is then stored and we can proceed to the nose localization stage.
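The selection and acceptance logic described above might look like the following sketch. The `Limb` record, the function name, and the threshold values in the usage example are hypothetical. Returning `None` stands in for the fallback to the eye stage.

```python
from typing import List, NamedTuple, Optional, Tuple


class Limb(NamedTuple):
    center: Tuple[int, int]  # locus of the limb (x, y)
    length: float            # weighted length d_i
    variance: float          # intensity variance of the limb region


def select_mouth(limbs: List[Limb],
                 min_length: float,
                 min_variance: float) -> Optional[Limb]:
    """Pick the limb with the highest length * variance product.

    Returns None when there are no candidates or when the strongest limb
    fails either threshold, signalling a return to the eye stage.
    """
    if not limbs:
        return None
    best = max(limbs, key=lambda limb: limb.length * limb.variance)
    if best.length < min_length or best.variance < min_variance:
        return None
    return best


candidates = [
    Limb(center=(10, 5), length=12.0, variance=900.0),   # product 10800
    Limb(center=(10, 9), length=20.0, variance=100.0),   # product 2000
]
mouth = select_mouth(candidates, min_length=8.0, min_variance=500.0)
print(mouth.center if mouth else "no mouth: retry eye stage")   # (10, 5)
```

In this sketch the variance threshold is passed in as a parameter. One natural choice, following the text, is the average intensity variance of the face.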