The Boston Marathon bombings revealed the limitations of facial-recognition technology to the general public. Many private citizens, accustomed to watching computers on television and in the movies match photographs against motor vehicle and other databases in mere seconds, were surprised that the nation’s premier law enforcement agencies lacked that level of technological sophistication when Boston’s, and perhaps the country’s, security had been threatened.
Since 9/11, the federal government has spent a great deal of money on facial-recognition technology, with grants in the millions of dollars going to state and local governments for database creation. Even though government databases contained pictures of both of the Boston suspects, technology could not match surveillance footage to database images.
Before addressing the limitations of today’s technology, let’s discuss how one type of facial-recognition technology works.
Face detection occurs first. The algorithms typically cycle through boxes of varying dimensions, looking for face-like regions. Inside those boxes, the system detects facial landmarks and assigns a score, providing a confidence level regarding whether the image is a face. Once the region is confirmed as a face, the technology creates a template, generally based on factors such as the relative distance between the eyes, the spot just under the nose and above the lip, and ear to ear.
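The template step can be sketched in a few lines of Python. The landmark names, coordinates and choice of ratios below are illustrative assumptions, not any particular vendor’s method; normalizing by the inter-eye distance makes the template independent of image size.

```python
import math

# Hypothetical landmark positions (x, y) in pixels; a real system
# would detect these automatically. Values are illustrative only.
landmarks = {
    "left_eye":  (120, 100),
    "right_eye": (180, 100),
    "nose_base": (150, 140),   # the spot just under the nose, above the lip
    "left_ear":  (90, 115),
    "right_ear": (210, 115),
}

def distance(a, b):
    """Euclidean distance between two landmark points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def make_template(pts):
    """Build a scale-invariant template: distances between key landmarks,
    normalized by the inter-eye distance so image size drops out."""
    eye_dist = distance(pts["left_eye"], pts["right_eye"])
    return {
        "eyes_to_nose": distance(pts["left_eye"], pts["nose_base"]) / eye_dist,
        "ear_to_ear":   distance(pts["left_ear"], pts["right_ear"]) / eye_dist,
    }

template = make_template(landmarks)
```

Because the ratios are dimensionless, the same face photographed closer to or farther from the camera yields roughly the same template.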
The mathematical representation developed is then compared to other detected faces. The similarity in ratios between distances on various points of the face, typically focused around anchors, such as the nose, the eyes, the ears and the mouth, yields a score on a logarithmic scale. Close matches range from 3 to 5, and definite nonmatches are less than 1. When the same image serves as both probe and target, a score of 40+ is possible.
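A toy scoring function shows how a logarithmic scale can produce the ranges described above: identical probe and target scoring 40, close matches landing around 3 to 5, and clear nonmatches falling below 1. The exact mapping is invented for illustration, and the templates are hypothetical ratio vectors of the kind sketched earlier.

```python
import math

def similarity_score(t1, t2):
    """Toy log-scale similarity between two templates of normalized
    distance ratios. Smaller differences map to higher scores."""
    keys = list(t1)
    mse = sum((t1[k] - t2[k]) ** 2 for k in keys) / len(keys)
    if mse == 0:
        return 40.0          # probe and target are the same image
    return min(40.0, -math.log10(mse))

probe   = {"eyes_to_nose": 0.83,  "ear_to_ear": 2.00}
close   = {"eyes_to_nose": 0.835, "ear_to_ear": 2.01}   # similar face
nomatch = {"eyes_to_nose": 0.60,  "ear_to_ear": 1.40}   # different face
```

Running `similarity_score(probe, probe)` returns the capped score of 40, while the close and nonmatching templates fall into the match and nonmatch ranges, respectively.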
Several factors limit the effectiveness of facial-recognition technology:
1. Image quality
Image quality affects how well facial-recognition algorithms work. The quality of surveillance video is quite low compared with that of a digital camera. Even high-definition video is, at best, 1080p (progressive scan); usually, it is 720p. These resolutions are equivalent to about 2MP and 0.9MP, respectively, while an inexpensive digital camera attains 15MP. The difference is quite noticeable.
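The megapixel figures follow directly from the frame dimensions; a quick check:

```python
def megapixels(width, height):
    """Pixel count expressed in megapixels (millions of pixels)."""
    return width * height / 1_000_000

mp_1080p = megapixels(1920, 1080)   # about 2 MP
mp_720p  = megapixels(1280, 720)    # about 0.9 MP
```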
2. Image size
When a face-detection algorithm finds a face in an image or in a still from a video capture, the relative size of that face compared with the enrolled image size affects how well the face will be recognized. An already small image size, coupled with a target distant from the camera, means that the detected face is only 100 to 200 pixels on a side. Further, having to scan an image for varying face sizes is a processor-intensive activity. Most algorithms allow specification of a face-size range to help eliminate false positives on detection and speed up image processing.
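Specifying a face-size range can be sketched as a simple filter over candidate boxes. This is a minimal illustration, assuming detections arrive as (x, y, w, h) tuples in pixels; the thresholds are illustrative defaults, not values from any particular library.

```python
def filter_faces(detections, min_size=80, max_size=400):
    """Keep only candidate face boxes whose width and height fall in
    the expected range, discarding likely noise and false positives."""
    return [d for d in detections
            if min_size <= d[2] <= max_size
            and min_size <= d[3] <= max_size]

candidates = [(10, 10, 150, 150),   # plausible face
              (40, 60, 20, 20),     # too small: likely noise
              (0, 0, 900, 900)]     # too large: likely a false positive
faces = filter_faces(candidates)
```

Skipping boxes outside the expected range also saves the processor from running landmark detection on regions that could not plausibly be faces.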
3. Face angle
The relative angle of the target’s face influences the recognition score profoundly. When a face is enrolled in the recognition software, multiple angles are usually used (profile, frontal and 45-degree are common). Anything less than a frontal view reduces the algorithm’s ability to generate a template for the face. The more direct the image (both enrolled and probe image) and the higher its resolution, the higher the score of any resulting matches.
4. Processing and storage
Even though high-definition video is quite low in resolution when compared with digital camera images, it still occupies significant amounts of disk space. Processing every frame of video is an enormous undertaking, so usually only a fraction (10 percent to 25 percent) is actually run through a recognition system. To minimize total processing time, agencies can use clusters of computers. However, adding computers involves considerable data transfer over a network, which can be constrained by input/output bottlenecks, further limiting processing speed.
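A back-of-the-envelope calculation shows how frame sampling cuts the workload. The 15 percent sampling rate and 30 fps frame rate are illustrative values within the ranges discussed above.

```python
def frames_to_process(duration_s, fps=30, sample_fraction=0.15):
    """Number of frames actually run through recognition when only a
    fraction of the video's frames are sampled."""
    total_frames = duration_s * fps
    return round(total_frames * sample_fraction)

# One hour of 30 fps footage (108,000 frames) sampled at 15 percent:
n = frames_to_process(3600)
```

Even at this reduced rate, a single hour of footage still yields more than 16,000 frames, each of which may contain multiple faces to detect and match.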
Ironically, humans are vastly superior to technology when it comes to facial recognition. But humans can only look for a few individuals at a time when watching a source video. A computer can compare many individuals against a database of thousands.
As technology improves, higher-definition cameras will become available. Computer networks will be able to move more data, and processors will work faster. Facial-recognition algorithms will be better able to pick out faces from an image and recognize them in a database of enrolled individuals. The simple mechanisms that defeat today’s algorithms, such as obscuring parts of the face with sunglasses and masks or changing one’s hairstyle, will be easily overcome.
An immediate way to overcome many of these limitations is to change how images are captured. Using checkpoints, for example, requires subjects to line up and funnel through a single point. Cameras can then focus on each person closely, yielding far more useful frontal, higher-resolution probe images. However, wide-scale implementation increases the number of cameras required.
Evolving biometrics applications are promising. They include not only facial recognition but also gestures, expressions, gait and vascular patterns, as well as iris, retina, palm print, ear print, voice recognition and scent signatures. A combination of modalities is superior because it improves a system’s capacity to produce results with a higher degree of confidence. Associated efforts focus on improving capabilities to collect information from a distance where the target is passive and often unknowing.
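One simple way to combine modalities is weighted score fusion. The sketch below assumes each modality produces a normalized match score in [0, 1]; the modality names, scores and weights are illustrative assumptions, not measured values.

```python
def fuse_scores(scores, weights):
    """Weighted-sum fusion of normalized per-modality match scores.
    A strong score in one modality can offset a weak one in another,
    raising overall confidence in the combined result."""
    assert set(scores) == set(weights)
    total_weight = sum(weights.values())
    return sum(scores[m] * weights[m] for m in scores) / total_weight

fused = fuse_scores(
    {"face": 0.70, "gait": 0.85, "iris": 0.95},   # per-modality scores
    {"face": 0.5,  "gait": 0.2,  "iris": 0.3},    # illustrative weights
)
```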
Clearly, privacy concerns surround this technology and its use. Finding a balance between national security and individuals’ privacy rights will be the subject of increasing discussion, especially as technology progresses.