Video analytics — the automated analysis of terabytes of video content — has a proven track record of helping investigators glean information from surveillance cameras, recognize faces in a crowd or zoom in on suspects' license plates. However, researchers know they need more advanced capabilities and software algorithms to go beyond detection and tracking and truly understand the relationships between objects in video footage.
Perimeter-detection analytics is a mature art, says Maj. David Mulholland, commander of technical services at the United States Park Police. “But find me the person with the red shirt and baseball hat, or the person in the crowd walking west to east when everyone else is walking east to west,” he says. “Those are the type of analytics that require more robust capability” — basically, the capability to extract information, generate data sets and identify patterns.
A major challenge, says Jie Yang, program director at the National Science Foundation, is that software development has not kept pace with advances in video camera technology. In addition, there is now much more digital video data in nonstandard formats, petabytes of it, coming from cell phones, digital cameras, tablets, surveillance systems and unmanned aerial vehicles.
Computer vision research, which focuses on image processing, combined with expertise in computer graphics and natural language processing, will move video analysis beyond object detection and help analysts determine the relationships between objects, Yang says.
Visual Cortex on Silicon, an NSF-backed research project led by Penn State University, seeks to design a machine-vision system that operates much like human vision, allowing computers to record and understand visual content much faster and more efficiently than current technologies.
Finding the Needle in a Haystack
But what of all the existing video out there? One of today’s biggest Big Data challenges may be finding hits in the mountain of digital video produced by everyone from tourists with iPhones to surveillance drones. Government researchers are working to make usable information out of pixels.
The Intelligence Advanced Research Projects Activity (IARPA) is involved in a five-year research project called Aladdin Video. “The goal is to develop advanced software tools to let analysts search and work with video Big Data much like they do with text,” says Jill Crisman, a program manager within IARPA’s Office of Incisive Analysis. Aladdin uses a mix of audio and video extraction, knowledge representation and search technologies to process video.
The process starts with an automatic tagger that looks at the video in the analyst’s queue and creates a sophisticated search index for that information library.
“It is sort of like a giant card catalog of information about the videos,” Crisman says. The analysts query the catalog using a small number of existing video clips in which an event they are looking for occurs, plus some words that describe the event. Aladdin will process the query, search the catalog and provide a list of clips from the queue sorted by relevance.
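The catalog-and-query workflow Crisman describes can be sketched in miniature. This is an illustrative toy, not IARPA's system: the `Clip` structure, the tag sets and the relevance score are all assumptions, standing in for the far richer index the automatic tagger would build.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "card catalog" idea: each clip in the queue
# carries tags produced by an automatic tagger; a query combines example
# clips and descriptive words, and results come back sorted by relevance.

@dataclass
class Clip:
    clip_id: str
    tags: set  # tags the automatic tagger assigned to this clip

def relevance(clip, example_clips, query_words):
    """Score a clip by its tag overlap with the example clips and query words."""
    example_tags = set().union(*(c.tags for c in example_clips)) if example_clips else set()
    return len(clip.tags & example_tags) + len(clip.tags & set(query_words))

def search(catalog, example_clips, query_words):
    """Return matching clip IDs from the catalog, most relevant first."""
    ranked = sorted(catalog,
                    key=lambda c: relevance(c, example_clips, query_words),
                    reverse=True)
    return [c.clip_id for c in ranked
            if relevance(c, example_clips, query_words) > 0]
```

A real tagger would extract tags from the audio and video streams themselves; here they are supplied by hand purely to show the query-by-example-plus-words flow.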
A goal of Aladdin is to apply metatags to online video in order to describe its content and assist in multimedia event detection. “A text document for video,” Crisman calls it.
Another IARPA program, called Finder, is developing tools to help analysts locate where videos were taken based on content instead of GPS tags. Researchers have created a model of the world so they can match video content to locations around the globe. The world model comprises 51 billion smaller models, each corresponding to a location.
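The core matching step can be thought of as a nearest-neighbor search: compare a descriptor extracted from the video against per-location models and return the closest candidates. This is a loose sketch of that idea under stated assumptions — the descriptor vectors, place names and distance metric below are placeholders, not Finder's actual representation.

```python
import math

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_locations(video_descriptor, location_models, k=3):
    """Rank candidate locations by similarity to the video's descriptor.

    location_models maps a place name to a descriptor vector; both the
    names and the vectors here are hypothetical stand-ins for the much
    larger per-location models a real system would hold.
    """
    ranked = sorted(location_models.items(),
                    key=lambda item: distance(video_descriptor, item[1]))
    return [name for name, _ in ranked[:k]]
```

At the scale of 51 billion models an exhaustive scan like this would be impractical; a production system would need approximate indexing, but the ranking principle is the same.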
Video as Data
Ultimately, researchers want to be able to manipulate video the same way they manipulate other data. “We’re looking to exploit pixels and turn them into data that can be combined with [other sources],” said Ken Rice, chief of the ISR integration division at the National Geospatial-Intelligence Agency, at a recent National Institute of Standards and Technology symposium.
One major hurdle is the variety of algorithms currently used to process video. Rice says there could be as many as 20 algorithms to process video coming from Predator unmanned aerial vehicles.
“What we really need is an open framework for video processing,” which should include a way of characterizing what has been done to the pixels during processing, Rice says. That way, experts can factor into their analysis any uncertainty introduced by changes to the video.
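The provenance idea Rice describes — keeping a record of what has been done to the pixels — can be sketched as a pipeline that logs every transformation it applies. This is a minimal illustration, assuming a made-up `Frame` structure and step names; it is not any real framework's API.

```python
class Frame:
    """A unit of video data plus an audit trail of how it was processed."""
    def __init__(self, pixels):
        self.pixels = pixels
        self.provenance = []  # ordered record of transformations applied

def apply_step(frame, name, transform, **params):
    """Apply a processing step and log its name and parameters."""
    frame.pixels = transform(frame.pixels, **params)
    frame.provenance.append({"step": name, "params": params})
    return frame

# Two illustrative steps; an analyst can later inspect frame.provenance
# to account for any uncertainty each transformation introduced.
frame = Frame(pixels=[10, 200, 130])
apply_step(frame, "normalize",
           lambda px, scale: [p / scale for p in px], scale=255)
apply_step(frame, "threshold",
           lambda px, cutoff: [1 if p > cutoff else 0 for p in px], cutoff=0.5)
```

The point is the audit trail, not the image operations: because every change to the pixels is characterized alongside the result, downstream analysis can factor in what processing has already occurred.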
The eyes may deceive, but advanced research into video analytics is closing the gap between what analysts see and what they know.