It is important at this stage to consider perspective. In 3D rendering, perspective generally enhances the realism of a scene: buildings and geometric objects appear to taper and shrink as they recede from the viewer. If, however, the object's extent in depth is relatively small, the perspective effect is negligible. The effect of perspective on face images is subtle to the human eye, and it is thus possible to render 3D face data without such perspective computations. We shall exploit this simplification and approach the problem of fitting to three points using the Weak-Perspective-3-Points (WP3P) technique [2]. A brief summary of the computation of the WP3P solution is presented here; for an in-depth analysis of the derivation the reader should consult [2].
Observe the figure depicting the desired scaled orthographic projection of the model onto the image plane. The intra-point distances in the figure are (R01, R02, R12) for the model and (d01, d02, d12) for the image object. The overall scaling the model must undergo to fit the image is denoted s. The vertical heights above the image plane of the aligned model's two vertices are (H1, H2) before scaling, or (h1, h2) after scaling. The parameters s, h1 and h2 are computed from these intra-point distances via a set of intermediate variables; the full equations are derived in [2].
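To make this computation concrete, the scaled orthographic geometry in the figure implies the constraints d01^2 + h1^2 = s^2 R01^2, d02^2 + h2^2 = s^2 R02^2 and d12^2 + (h1 - h2)^2 = s^2 R12^2, which reduce to a quadratic in s^2. The sketch below recovers s, h1 and h2 along these lines, following the weak-perspective derivation in [2]; the function and variable names are illustrative and are not those used in the original equations.

```python
import numpy as np

def wp3p_scale_and_heights(R01, R02, R12, d01, d02, d12):
    """Recover the scale s and scaled heights (h1, h2) from the model
    distances (R01, R02, R12) and image distances (d01, d02, d12).

    The constraints
        d01^2 + h1^2        = s^2 R01^2
        d02^2 + h2^2        = s^2 R02^2
        d12^2 + (h1 - h2)^2 = s^2 R12^2
    reduce to a quadratic in sigma = s^2 (see [2]).
    """
    a = R01**2 + R02**2 - R12**2          # model-side intermediate term
    b = d01**2 + d02**2 - d12**2          # image-side intermediate term

    # Quadratic in sigma = s^2:  A*sigma^2 + B*sigma + C = 0.
    A = 4.0 * R01**2 * R02**2 - a**2      # 16 * (model triangle area)^2
    B = 2.0 * a * b - 4.0 * (R01**2 * d02**2 + R02**2 * d01**2)
    C = 4.0 * d01**2 * d02**2 - b**2      # 16 * (image triangle area)^2

    # The larger root is the one that yields non-negative squared heights.
    sigma = (-B + np.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)
    s = np.sqrt(sigma)

    h1 = np.sqrt(max(sigma * R01**2 - d01**2, 0.0))
    h2 = np.sqrt(max(sigma * R02**2 - d02**2, 0.0))
    # The third constraint fixes only the relative sign of h1 and h2;
    # the overall +/- reflection ambiguity is resolved later.
    if sigma * a - b < 0.0:               # 2*h1*h2 must equal sigma*a - b
        h2 = -h2
    return s, h1, h2
```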
We then solve for the rotation matrix using the intermediate matrices A and B, where x01, x02, y01 and y02 are 2D coordinates relative to a coordinate system centered at the position of the left eye in the image, i0:
\[
\begin{aligned}
x_{01} &= i_{1x} - i_{0x} && \text{(4.12)}\\
x_{02} &= i_{2x} - i_{0x} && \text{(4.13)}\\
y_{01} &= i_{1y} - i_{0y} && \text{(4.14)}\\
y_{02} &= i_{2y} - i_{0y} && \text{(4.15)}
\end{aligned}
\]
The rotation matrix, R, can then be computed directly from A and B.
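As an illustrative sketch of this step (the construction follows [2]; the exact layout of A and B in the original equations may differ), the model-frame difference vectors and their camera-frame counterparts, each augmented with a cross-product column, can be stacked so that R = B A^{-1}:

```python
import numpy as np

def wp3p_rotation(m0, m1, m2, x01, y01, x02, y02, s, h1, h2):
    """Solve for the rotation R given the recovered scale s and the
    scaled heights (h1, h2).

    m0, m1, m2 : 3D model points (e.g. left eye, right eye, nose tip).
    (x01, y01), (x02, y02) : image coordinates of points 1 and 2 in the
        left-eye-centred image coordinate system.
    """
    # Model-frame difference vectors.
    v1 = np.asarray(m1, dtype=float) - np.asarray(m0, dtype=float)
    v2 = np.asarray(m2, dtype=float) - np.asarray(m0, dtype=float)

    # Camera-frame counterparts: the image offsets plus the recovered
    # heights, divided by s to undo the scaling.
    w1 = np.array([x01, y01, h1]) / s
    w2 = np.array([x02, y02, h2]) / s

    # A holds the model-frame vectors, B the camera-frame ones; the
    # cross products supply a third, linearly independent column.
    A = np.column_stack([v1, v2, np.cross(v1, v2)])
    B = np.column_stack([w1, w2, np.cross(w1, w2)])

    # R satisfies R @ A = B, hence R = B @ inv(A).  With exact data this
    # is a rotation; with noisy feature locations it is only
    # approximately orthonormal.
    return B @ np.linalg.inv(A)
```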
The translation vector, t, is computed simply by translating the centered coordinate system to the position of the left eye in the image, i0. The translation in the depth dimension is irrelevant and can be omitted, since the scaling is controlled directly by s (an orthographic projection is not scaled by depth):
\[
t = \begin{pmatrix} i_{0x} \\ i_{0y} \\ 0 \end{pmatrix} \qquad \text{(4.17)}
\]
Once the values of R and t have been determined, any 3D model point can be transformed, and the transformed points will align with the corresponding image points. The transformation from a model point, m, to an image point, i, is thus:

\[
\begin{pmatrix} i_x \\ i_y \\ i_z \end{pmatrix}
= s\,R\begin{pmatrix} m_x \\ m_y \\ m_z \end{pmatrix} + t
\]
Of course, the iz value is only a relative measurement of the depth of the model point. It is useful, however, for keeping track of the relative depth of the model point with respect to other model points.
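A minimal sketch of applying this mapping in code, assuming the model points are expressed in a coordinate system anchored at the model's left eye (so that the left eye maps onto i0 at zero depth), is the following:

```python
import numpy as np

def wp3p_transform(points, s, R, t):
    """Apply i = s * R * m + t to an (N, 3) array of model points.

    The model points are assumed to be expressed relative to the
    model's left eye, so that the left eye maps onto i0 exactly.
    The first two output columns are image coordinates; the third,
    i_z, is only a relative depth value.
    """
    points = np.asarray(points, dtype=float)
    return s * points @ R.T + t

# The translation is the left-eye image position with zero depth, e.g.
#     t = np.array([i0x, i0y, 0.0])
# where (i0x, i0y) is the image location of the left eye.
```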
Note that this is a direct solution of the WP3P problem save for one ambiguity: the +/- sign in the computation of the heights (h1, h2). This ambiguity allows two possible alignments of the model to the image points: the 3D face can line up facing either towards or away from the viewer. Of course, we know that the face is projecting onto the image from behind the image plane and is facing the viewer (or ``camera-man''). Thus we select either + or - so that the 3D model is actually behind the image plane and is facing towards the camera. This ambiguity is resolved by computing the normal of the nose. In other words, a vector protruding from the nose of the model is introduced and undergoes the transformation above. We begin by computing the heights with a '+' and note the relative depth values, iz, of the transformed vector. If the vector points away from the viewer (its tip is farther from the image plane than its base or, equivalently, has a larger iz value), then the model is facing away from the camera and we repeat the computation with a '-'.
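This sign check might be sketched as follows, reusing the wp3p_transform sketch from earlier together with a hypothetical nose 'base' point and 'tip' point that define the vector protruding from the nose:

```python
import numpy as np

def faces_camera(nose_base, nose_tip, s, R, t):
    """Return True if the transformed nose vector points towards the
    viewer, i.e. the transformed tip has a smaller relative depth
    (i_z) than the transformed base."""
    transformed = wp3p_transform(np.array([nose_base, nose_tip]), s, R, t)
    return transformed[1, 2] < transformed[0, 2]

# If faces_camera(...) is False for the '+' choice of (h1, h2), negate
# the heights, recompute R, and the aligned model will face the camera.
```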
The end result is a mapping from the 3D model to the image which lines up the eyes and the nose optimally.