Generating the Average 3D Face

**Figure 4.7:** Some of the 3D faces used to form the average 3D face, shown in panels (a), (b) and (c).

In averaging the 3D faces in a database, we wish to see the mean 3D face converge to a stable structure as we introduce more sample 3D faces. We also expect the mean 3D face to be "face-like", in the sense that the averaging process should not smooth out its features to the point where they are no longer distinguishable. In other words, the mean 3D face should still have a nose, a mouth, eyes and so on. If we do not see this convergence and the mean face is a mere blob or ellipsoid, then our hypothesis is incorrect: the 3D structure of a human face is not regular enough for a single model to approximate multiple individuals. Another possible cause of a poor mean is inadequate normalization: if the 3D faces in our database are not fully normalized before being averaged, the mean face will not be face-like.
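The convergence we are looking for can be monitored numerically. The following is a minimal sketch, assuming that each normalized face has been resampled onto a common grid of depth values; that representation, the function name `running_means` and the mean absolute change used as a convergence measure are illustrative assumptions, not details taken from our database.

```python
def running_means(depth_maps):
    """Yield (mean_map, change) after each face is added to the average.

    depth_maps: iterable of equally sized 2D lists of depth values
                (assumed representation of a normalized range model).
    change:     mean absolute difference between the new mean and the
                previous one (assumed convergence measure); None for
                the first face, where no previous mean exists.
    """
    mean = None
    for count, face in enumerate(depth_maps, start=1):
        if mean is None:
            new_mean = [row[:] for row in face]
            change = None
        else:
            # Incremental update: mean_k = mean_{k-1} + (face - mean_{k-1}) / k
            new_mean = [[m + (f - m) / count for m, f in zip(mean_row, face_row)]
                        for mean_row, face_row in zip(mean, face)]
            diffs = [abs(a - b)
                     for new_row, old_row in zip(new_mean, mean)
                     for a, b in zip(new_row, old_row)]
            change = sum(diffs) / len(diffs)
        mean = new_mean
        yield mean, change


# Toy 2x2 "depth maps" standing in for normalized range models.
faces = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [3.0, 8.0]],
    [[2.0, 2.0], [3.0, 6.0]],
]
history = list(running_means(faces))
```

If the change between successive means shrinks as faces are added, the mean is settling toward a stable structure; whether that structure is face-like still has to be judged by inspecting the model.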

For each face in our 3D range data database, we manually select 4 anchor points (the left eye, the right eye, the nose and the mouth) and note their 3D coordinates. Each model in the database then undergoes a 3D transformation with a vertical stretch that maps its 4 anchor points as closely as possible to a common destination set of anchor points. Mathematically, the four 3D anchor points $(\vec{n}_1,\vec{n}_2,\vec{n}_3,\vec{n}_4)$ of each model are mapped to a destination set of 3D anchor points $(\vec{m}_1,\vec{m}_2,\vec{m}_3,\vec{m}_4)$. This mapping is given in Equation 4.1, where the matrix $T$ is defined as follows:

$\displaystyle T= \left\{ \begin{array}{cc... ...n\theta_z) & \cos\theta_x \cos\theta_y & t_z \\ \end{array} \right\}$

$\displaystyle \left\{ \begin{array}{c} x_{f} \\ y_{f} \\ z_{f} \end{array} \right\} = T \left\{ \begin{array}{c} x_{i} \\ y_{i} \\ z_{i} \\ 1 \end{array} \right\}$

(4.1)
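To make the mapping concrete, the sketch below builds a 3x4 matrix from the 7 parameters and applies it to a point in homogeneous coordinates. The composition order (stretch first, then rotations about z, y and x, then translation) is an assumption chosen to agree with the visible last row of $T$; the function names `build_transform` and `apply_transform` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def build_transform(t_x, t_y, t_z, theta_x, theta_y, theta_z, s_y):
    """Return a 3x4 matrix [R_x R_y R_z S | t] (assumed composition order)."""
    cos_x, sin_x = math.cos(theta_x), math.sin(theta_x)
    cos_y, sin_y = math.cos(theta_y), math.sin(theta_y)
    cos_z, sin_z = math.cos(theta_z), math.sin(theta_z)
    rot_x = [[1, 0, 0], [0, cos_x, -sin_x], [0, sin_x, cos_x]]
    rot_y = [[cos_y, 0, sin_y], [0, 1, 0], [-sin_y, 0, cos_y]]
    rot_z = [[cos_z, -sin_z, 0], [sin_z, cos_z, 0], [0, 0, 1]]
    stretch = [[1, 0, 0], [0, s_y, 0], [0, 0, 1]]  # vertical stretch only
    rot_stretch = matmul(matmul(matmul(rot_x, rot_y), rot_z), stretch)
    # Append the translation as a fourth column.
    return [row + [t] for row, t in zip(rot_stretch, (t_x, t_y, t_z))]


def apply_transform(T, point):
    """Map (x_i, y_i, z_i) to (x_f, y_f, z_f) via the homogeneous vector."""
    homogeneous = list(point) + [1.0]
    return tuple(sum(t * h for t, h in zip(row, homogeneous)) for row in T)


# No rotation, a vertical stretch of 2 and a translation of (1, 2, 3).
T = build_transform(1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 2.0)
mapped = apply_transform(T, (1.0, 1.0, 1.0))
```

Under this assumed order the bottom row of the matrix ends in $\cos\theta_x \cos\theta_y$ and $t_z$, as in the definition of $T$; a different rotation convention would change the signs and ordering of the remaining entries.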

Using ten 3D models, the best transformation matrix for each model was found by optimizing the 7 parameters $(t_{x},t_{y},t_{z},\theta_{x},\theta_{y},\theta_{z},s_{y})$ to minimize the fitting error $E_{fit}$ defined in Equation 4.2 below. There are 3 translation parameters $(t_{x},t_{y},t_{z})$, 3 rotation parameters $(\theta_{x},\theta_{y},\theta_{z})$ and one vertical stretch parameter $(s_{y})$:

$\displaystyle E_{fit}=\sum_{i \in \{ 1,2,3,4 \}} \sqrt{(n_{i_x}-m_{i_x})^{2}+(n_{i_y}-m_{i_y})^{2}+(n_{i_z}-m_{i_z})^{2}}$

(4.2)
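The fitting error and its minimization can be sketched as follows. The text does not state which optimizer was used, so the simple compass (pattern) search below is only an assumed stand-in, as are the transformation order, the function names and the toy anchor coordinates. The error is evaluated between the transformed anchor points of a model and the destination anchor points.

```python
import math


def transform(params, point):
    """Stretch, rotate about z, y, x, then translate (assumed order)."""
    t_x, t_y, t_z, theta_x, theta_y, theta_z, s_y = params
    x, y, z = point[0], s_y * point[1], point[2]
    x, y = (x * math.cos(theta_z) - y * math.sin(theta_z),
            x * math.sin(theta_z) + y * math.cos(theta_z))
    x, z = (x * math.cos(theta_y) + z * math.sin(theta_y),
            -x * math.sin(theta_y) + z * math.cos(theta_y))
    y, z = (y * math.cos(theta_x) - z * math.sin(theta_x),
            y * math.sin(theta_x) + z * math.cos(theta_x))
    return (x + t_x, y + t_y, z + t_z)


def fit_error(params, anchors, targets):
    """E_fit: sum of Euclidean distances over the 4 anchor points."""
    total = 0.0
    for n, m in zip(anchors, targets):
        mapped = transform(params, n)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(mapped, m)))
    return total


def compass_search(anchors, targets, start, step=1.0, tol=1e-6, max_iter=10000):
    """Try +/- step on each parameter; halve the step when nothing improves."""
    best = list(start)
    best_err = fit_error(best, anchors, targets)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                err = fit_error(trial, anchors, targets)
                if err < best_err:  # accept only strict improvements
                    best, best_err, improved = trial, err, True
                    break
        if not improved:
            step /= 2.0
    return best, best_err


# Toy anchors (left eye, right eye, nose, mouth); targets are shifted copies.
anchors = [(-3.0, 3.0, 0.0), (3.0, 3.0, 0.0), (0.0, 0.0, 2.0), (0.0, -3.0, 0.0)]
targets = [(x + 5.0, y, z) for x, y, z in anchors]
identity = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
params, err = compass_search(anchors, targets, identity)
```

Because $E_{fit}$ is a sum of distances rather than squared distances, it is not smooth where a residual vanishes, and a derivative-free search of this kind can stall before reaching the minimum; a more robust optimizer may be preferable in practice.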

The final average 3D face range model is shown in Figure 4.8. This is the only model that will be rotated, translated and deformed to approximate the structure of new faces; the 10 database models are now discarded. As can be seen, the 3D mean face is a smooth, face-like structure with distinct features. The coordinates of its features (eyes, nose and mouth) are stored with the 3D model as $(\vec{m}_1,\vec{m}_2,\vec{m}_3,\vec{m}_4)$ for later use.

**Figure 4.8:** The average 3D face