Biologically motivated, low-level attentional mechanisms are applied at multiple scales to identify candidate face regions. The facial contour is then estimated by computing symmetric enclosure and is used to guide the search for feature points within the face. Symmetric blob detection, limb extraction and signature analysis are used to locate the eyes, mouth and nose of each individual. Using a database of 3D range data of human heads, a 3D model is aligned to the coordinates of the feature points detected in the input image. The texture of the face in the intensity image is mapped onto the 3D range data, thereby segmenting the face from the image. The 3D model is then rotated into a frontal view to synthesize a frontal ``mug-shot'' of the individual. Lighting and shading variations are corrected by histogram fitting. Once fully normalized, the image is projected into a low-dimensional subspace via Karhunen-Loève decomposition, both to compress the data and to verify the detection. The resulting low-dimensional vector is matched against a database using simple distance measures to identify the face as one of the previously seen training examples. Because the hierarchical detection scheme is computationally efficient and begins with simple attentional mechanisms, faces could also be tracked from a video source.
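
The last two stages (subspace projection and distance-based matching) can be illustrated with a minimal sketch. It is not the implementation described here: the function names (`fit_subspace`, `project`, `residual`, `identify`), the power-iteration eigen-solver, the Euclidean distance, the use of the reconstruction residual as a detection check, and the four-element toy "images" are all assumptions chosen to keep the example self-contained.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_subspace(samples, k, iters=100):
    """Return the sample mean and k leading principal directions.

    Hypothetical helper: power iteration with deflation stands in for
    whatever eigen-solver a real Karhunen-Loeve implementation would use.
    """
    n, dim = len(samples), len(samples[0])
    mean = [sum(col) / n for col in zip(*samples)]
    centered = [[x - m for x, m in zip(s, mean)] for s in samples]
    basis = []
    for j in range(k):
        v = [1.0 / (i + j + 1) for i in range(dim)]  # deterministic start
        for _ in range(iters):
            # Deflate: remove components along directions already found.
            for b in basis:
                c = dot(v, b)
                v = [x - c * y for x, y in zip(v, b)]
            # Multiply by the scatter matrix X^T X (the 1/n factor is
            # irrelevant because v is renormalized on every step).
            coeffs = [dot(row, v) for row in centered]
            w = [sum(c * row[i] for c, row in zip(coeffs, centered))
                 for i in range(dim)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("fewer than k non-trivial components")
            v = [x / norm for x in w]
        basis.append(v)
    return mean, basis

def project(x, mean, basis):
    """Coefficients of x in the low-dimensional subspace."""
    centered = [a - m for a, m in zip(x, mean)]
    return [dot(centered, b) for b in basis]

def residual(x, mean, basis):
    """Distance between x and its reconstruction from the subspace.

    Assumption: a large residual is read as a sign that x is not a face.
    """
    centered = [a - m for a, m in zip(x, mean)]
    coeffs = [dot(centered, b) for b in basis]
    recon = [sum(c * b[i] for c, b in zip(coeffs, basis))
             for i in range(len(x))]
    return math.dist(centered, recon)

def identify(x, gallery, mean, basis):
    """Nearest training example under Euclidean distance in the subspace."""
    q = project(x, mean, basis)
    return min(gallery, key=lambda name: math.dist(q, gallery[name]))

# Toy data: each "normalized mug-shot" is a 4-element vector (made up).
train = {
    "alice": [10.0, 0.0, 0.0, 1.0],
    "bob":   [0.0, 10.0, 0.0, 1.0],
    "carol": [0.0, 0.0, 10.0, 1.0],
}
mean, basis = fit_subspace(list(train.values()), k=2)
gallery = {name: project(v, mean, basis) for name, v in train.items()}

probe = [9.0, 1.0, 0.0, 1.0]       # a slightly perturbed view of "alice"
non_face = [0.0, 0.0, 0.0, 50.0]   # lies far outside the training subspace
match = identify(probe, gallery, mean, basis)
print(match, residual(probe, mean, basis), residual(non_face, mean, basis))
```

In this sketch the same subspace serves both purposes named above: the projection coefficients form the compact description used for matching, and the residual gives a simple score for rejecting inputs that the subspace cannot reconstruct.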