It is important at this stage to consider perspective. In 3D rendering, perspective generally enhances the realism of a scene: buildings and geometric objects appear to taper and shrink as they recede from the viewer. If, however, the object's extent in depth is relatively small, the perspective effect is negligible. The effect of perspective on face images is subtle to the human eye, and it is thus possible to render 3D face data without such perspective computations. We shall exploit this simplification and approach the problem of fitting to three points using the Weak-Perspective-3-Points (WP3P) technique [2]. A brief summary of the computation of the WP3P solution is presented here; for an in-depth analysis of the derivation the reader should consult [2].
Observe the figure depicting the desired scaled orthographic projection of the model onto the image plane. The intra-point distances in the figure are (R01, R02, R12) for the model and (d01, d02, d12) for the image object. The overall scaling the model must undergo to fit the image is denoted s. The vertical heights above the image plane of the aligned model's two vertices are (H1, H2) before scaling, or (h1, h2) after scaling. The parameters s, h1 and h2 are computed from these intra-point distances via a set of intermediate variables; the full equations are derived in [2].
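To make this computation concrete, the scaled orthographic geometry in the figure implies the constraints d01^2 + h1^2 = s^2 R01^2, d02^2 + h2^2 = s^2 R02^2 and d12^2 + (h1 - h2)^2 = s^2 R12^2, which reduce to a quadratic in s^2. The sketch below recovers s, h1 and h2 along these lines, following the weak-perspective derivation in [2]; the function and variable names are illustrative and are not those used in the original equations.

```python
import numpy as np

def wp3p_scale_and_heights(R01, R02, R12, d01, d02, d12):
    """Recover the scale s and scaled heights (h1, h2) from the model
    distances (R01, R02, R12) and image distances (d01, d02, d12).

    The constraints
        d01^2 + h1^2        = s^2 R01^2
        d02^2 + h2^2        = s^2 R02^2
        d12^2 + (h1 - h2)^2 = s^2 R12^2
    reduce to a quadratic in sigma = s^2 (see [2]).
    """
    a = R01**2 + R02**2 - R12**2          # model-side intermediate term
    b = d01**2 + d02**2 - d12**2          # image-side intermediate term

    # Quadratic in sigma = s^2:  A*sigma^2 + B*sigma + C = 0.
    A = 4.0 * R01**2 * R02**2 - a**2      # 16 * (model triangle area)^2
    B = 2.0 * a * b - 4.0 * (R01**2 * d02**2 + R02**2 * d01**2)
    C = 4.0 * d01**2 * d02**2 - b**2      # 16 * (image triangle area)^2

    # The larger root is the one that yields non-negative squared heights.
    sigma = (-B + np.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)
    s = np.sqrt(sigma)

    h1 = np.sqrt(max(sigma * R01**2 - d01**2, 0.0))
    h2 = np.sqrt(max(sigma * R02**2 - d02**2, 0.0))
    # The third constraint fixes only the relative sign of h1 and h2;
    # the overall +/- reflection ambiguity is resolved later.
    if sigma * a - b < 0.0:               # 2*h1*h2 must equal sigma*a - b
        h2 = -h2
    return s, h1, h2
```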
We then solve for the rotation matrix using the intermediate matrices A and B, where x01, x02, y01 and y02 are 2D coordinates relative to a coordinate system centered at the position of the left eye in the image, i0:
\[
\begin{aligned}
x_{01} &= i_{1x} - i_{0x} && \text{(4.12)}\\
x_{02} &= i_{2x} - i_{0x} && \text{(4.13)}\\
y_{01} &= i_{1y} - i_{0y} && \text{(4.14)}\\
y_{02} &= i_{2y} - i_{0y} && \text{(4.15)}
\end{aligned}
\]
The rotation matrix, R, can then be computed directly from A and B.
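As an illustrative sketch of this step (the construction follows [2]; the exact layout of A and B in the original equations may differ), the model-frame difference vectors and their camera-frame counterparts, each augmented with a cross-product column, can be stacked so that R = B A^{-1}:

```python
import numpy as np

def wp3p_rotation(m0, m1, m2, x01, y01, x02, y02, s, h1, h2):
    """Solve for the rotation R given the recovered scale s and the
    scaled heights (h1, h2).

    m0, m1, m2 : 3D model points (e.g. left eye, right eye, nose tip).
    (x01, y01), (x02, y02) : image coordinates of points 1 and 2 in the
        left-eye-centred image coordinate system.
    """
    # Model-frame difference vectors.
    v1 = np.asarray(m1, dtype=float) - np.asarray(m0, dtype=float)
    v2 = np.asarray(m2, dtype=float) - np.asarray(m0, dtype=float)

    # Camera-frame counterparts: the image offsets plus the recovered
    # heights, divided by s to undo the scaling.
    w1 = np.array([x01, y01, h1]) / s
    w2 = np.array([x02, y02, h2]) / s

    # A holds the model-frame vectors, B the camera-frame ones; the
    # cross products supply a third, linearly independent column.
    A = np.column_stack([v1, v2, np.cross(v1, v2)])
    B = np.column_stack([w1, w2, np.cross(w1, w2)])

    # R satisfies R @ A = B, hence R = B @ inv(A).  With exact data this
    # is a rotation; with noisy feature locations it is only
    # approximately orthonormal.
    return B @ np.linalg.inv(A)
```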
The translation vector, t, is computed simply by translating the centered coordinate system to the position of the left eye in the image, i0. The translation in the depth dimension is irrelevant and can be omitted, since the scaling is controlled directly by s (an orthographic projection is not scaled by depth):
\[
t = \begin{pmatrix} i_{0x} \\ i_{0y} \\ 0 \end{pmatrix} \qquad \text{(4.17)}
\]
Once the values of R and t have been determined, any 3D model point can be transformed, and the transformed points will align with the corresponding image points. The transformation from a model point, m, to an image point, i, is thus:

\[
\begin{pmatrix} i_x \\ i_y \\ i_z \end{pmatrix}
= s\,R\begin{pmatrix} m_x \\ m_y \\ m_z \end{pmatrix} + t
\]
Of course, the iz value is only a relative measurement of the depth of the model point. It is useful, however, for keeping track of the relative depth of the model point with respect to other model points.
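A minimal sketch of applying this mapping in code, assuming the model points are expressed in a coordinate system anchored at the model's left eye (so that the left eye maps onto i0 at zero depth), is the following:

```python
import numpy as np

def wp3p_transform(points, s, R, t):
    """Apply i = s * R * m + t to an (N, 3) array of model points.

    The model points are assumed to be expressed relative to the
    model's left eye, so that the left eye maps onto i0 exactly.
    The first two output columns are image coordinates; the third,
    i_z, is only a relative depth value.
    """
    points = np.asarray(points, dtype=float)
    return s * points @ R.T + t

# The translation is the left-eye image position with zero depth, e.g.
#     t = np.array([i0x, i0y, 0.0])
# where (i0x, i0y) is the image location of the left eye.
```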
Note that this is a direct solution of the WP3P problem save for one ambiguity: the +/- sign in the computation of the heights (h1, h2). This ambiguity allows two possible alignments of the model to the image points: the 3D face can line up facing either towards or away from the viewer. Of course, we know that the face is projecting onto the image from behind the image plane and is facing the viewer (or ``camera-man''). Thus we select either + or - so that the 3D model is actually behind the image plane and is facing towards the camera. This ambiguity is resolved by computing the normal of the nose. In other words, a vector protruding from the nose of the model is introduced and undergoes the transformation above. We begin by computing the heights with a '+' and note the relative depth values, iz, of the transformed vector. If the vector points away from the viewer (its tip is farther from the image plane than its base or, equivalently, has a larger iz value), then the model is facing away from the camera and we repeat the computation with a '-'.
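This sign check might be sketched as follows, reusing the wp3p_transform sketch from earlier together with a hypothetical nose 'base' point and 'tip' point that define the vector protruding from the nose:

```python
import numpy as np

def faces_camera(nose_base, nose_tip, s, R, t):
    """Return True if the transformed nose vector points towards the
    viewer, i.e. the transformed tip has a smaller relative depth
    (i_z) than the transformed base."""
    transformed = wp3p_transform(np.array([nose_base, nose_tip]), s, R, t)
    return transformed[1, 2] < transformed[0, 2]

# If faces_camera(...) is False for the '+' choice of (h1, h2), negate
# the heights, recompute R, and the aligned model will face the camera.
```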
The end result is a mapping from the 3D model to the image which lines up the eyes and the nose optimally.