Researcher Anthony Dixon trains a camera to recognize and track different types of motion
It's dusk in a vast parking lot. One of several cameras on the roof of a shopping mall zooms in on a man who is walking from car to car. Is he planning to steal a vehicle? In a tunnel, a camera springs to life as it notes a momentary flash of light and slowing traffic. Has there been an accident? At an airport check-in counter, a camera zeros in on a passenger's face and notifies personnel that there is a 90% probability that the passenger is wanted by the police. Should he be stopped?
Welcome to the world of cameras with built-in brains. A far cry from their conventional closed-circuit counterparts, these devices (some entering trial service, most still in various phases of development) are equipped with microchips and programs that allow them to search for specific classes of events, such as stopped cars in tunnels, and notify personnel only if something suspicious or unusual occurs. Smart camera applications range from watching windows and doors for intruders to recognizing events such as a passenger falling onto a track or a robbery in progress. Specialized systems have been developed to recognize faces, license plates and anything with a bar code or other form of identification. One system tracks and even predicts the location of balls during sporting events (see insert). Others may soon be able to pick out minute defects as products race by on production lines.
"What all these systems have in common is the potential to filter out routine events and zero in on what's important," says Dr. Visvanathan Ramesh, who heads the Real-Time Vision and Modeling Department at Siemens Corporate Research (SCR) in Princeton, New Jersey. And with the number of cameras growing by leaps and bounds (many subway systems, airports and military bases already have thousands), the ability to allow a limited number of human observers to concentrate on significant events will become increasingly important.
Vision systems using software developed at Siemens Corporate Research in New Jersey can detect slow or stopped cars in tunnels (left). Software from Siemens' Roke Manor Research center in the UK detects anomalies such as a car involved in an illegal U-turn (right)
Cameras that Tell Stories. Recognizing that users are being confronted with a growing torrent of visual information, researchers are developing strategies for reducing the flow of images while nevertheless providing an overview of what cameras are seeing. In the process, they are also developing tools for rapidly searching archived video data. At Siemens' vast Perlach research campus on the southern outskirts of Munich, researchers Jörg Heuer and Dr. Andreas Hutter are testing a prototype program that allows smart cameras to generate "descriptors" of the content of each image. In a demonstration using a video sequence of a street seen by a roof-mounted camera, the camera begins generating data whenever an object enters its field of view. To do so, it uses sophisticated video processing (a major focus of work at SCR) to segment the object, track it, and characterize its motion. But rather than classifying the object as, say, a man or a truck, it describes it in terms of its visual components. "We don't want the camera to decide what an object is. That's up to the user. Instead, it should assemble as much raw information as possible in order to allow a human operator to access the data in a dynamic way," says Hutter. The results may sound cryptic, but they are full of information: "Triangle with rectangle (read: person with a briefcase) entered at coordinates xy, 10:57:28, height x, top blue checked, lower section gray, surface (read: hair) brown, left field at coordinates xz, 10:57:41."
Known as "meta data," this information can be transmitted with, or apart from, the camera's video stream. As long as nothing unusual is observed, the camera will not flag the attention of security personnel. But the images and the meta data are linked and can be stored in a common database. Depending on the application, security personnel may have access to both or only to the meta data. Later, if an event is reported (say, a theft) and an eyewitness describes a suspect, the database can be interrogated. "Let's say we're looking for a woman with a briefcase, brown hair and a checked jacket," says Heuer. "Using the image database software, we would draw a triangle with a small square to indicate the geometry of the target object, and pick out descriptors to search for. We would narrow the search by requesting the system to look at the databases of the cameras covering the exits of the building in which the theft occurred. The system would then display a group of images that fit the descriptors, each of which would be associated with a video clip." He explains that, although the cameras would not be tracking or "handing off" people or vehicles from one camera to the next, they would all be using the same descriptors. Thus, once the investigation had been narrowed to one or two people, a search of the facility's entire database would indicate all the locations that the suspects had visited. "The system is capable of finding out the trajectory of any person on any particular day as long as the user has some idea of what that person looked like or was wearing. We believe this is a unique technology in terms of its ability to produce comparisons," says Hutter. And that's not all. The technology, which, depending on camera resolution, could be adapted to recognize license plates and even faces, could also be used in conjunction with mobile units such as cell phones and PDAs.
For instance, a security man in a department store might receive a message from a camera such as "unauthorized entry in delivery zone C." By pressing an icon, he would see an image of a vehicle or a person, and thus know exactly what to look for before reaching the scene.
Since Hutter and Heuer's descriptor-based system uses the MPEG-7 standard, the researchers are bullish about its commercial prospects. "Our open approach means that other companies will be able to develop leading edge technology components for our system. We expect our system to benefit directly from this," says Hutter.
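The describe-then-search workflow Heuer and Hutter outline can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Descriptor` record, the `describe`, `search` and `trajectory` functions, and tags such as `"hair:brown"` are illustrative assumptions, not the Siemens prototype. Real MPEG-7 visual descriptors are compared by similarity, not by the exact tag matching used here. Times are seconds since midnight, so 39448 corresponds to 10:57:28.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical metadata record for one object seen by one camera."""
    camera: str
    clip_id: str                # link back to the stored video clip
    shape: str                  # coarse geometry, e.g. "triangle+rectangle"
    attributes: FrozenSet[str]  # e.g. {"top:checked", "lower:gray", "hair:brown"}
    entered_s: float            # seconds since midnight when the object entered view
    left_s: float               # seconds since midnight when it left


def describe(camera: str, clip_id: str, shape: str,
             attributes: Iterable[str], track_times: List[float]) -> Descriptor:
    """Summarise a tracked object by its visual components, not by object class."""
    if not track_times:
        raise ValueError("an object must be observed at least once")
    return Descriptor(camera, clip_id, shape, frozenset(attributes),
                      min(track_times), max(track_times))


def search(db: Iterable[Descriptor], shape: str, required: Iterable[str],
           cameras: Optional[Set[str]] = None) -> List[Descriptor]:
    """Records with the sketched shape and every requested attribute,
    optionally limited to some cameras (e.g. those covering the exits)."""
    wanted = set(required)
    hits = [d for d in db
            if d.shape == shape
            and wanted <= d.attributes
            and (cameras is None or d.camera in cameras)]
    return sorted(hits, key=lambda d: d.entered_s)


def trajectory(db: Iterable[Descriptor], shape: str,
               required: Iterable[str]) -> List[str]:
    """Every camera, in time order, that recorded a matching object."""
    return [d.camera for d in search(db, shape, required)]
```

Restricting `cameras` to the exit cameras mirrors the first step of the investigation. Calling `trajectory` without that restriction mirrors the facility-wide search for everywhere the suspect went.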
Better than Burglar Alarms. While Hutter and Heuer's prototype is a complex system designed for industrial and commercial users, other researchers at Siemens have their sights set on much simpler vision-based technologies that could hit the consumer market in the immediate future. Setrix, Inc., a Munich-based spin-off of Siemens Corporate Research, sees a huge potential market for inexpensive vision sensors in private homes. "A GSM-equipped smart mini-cam running a person-detection program can watch a window or a door 24 hours a day. If someone enters a room, it can send an SMS message to the homeowner, who can then decide whether to look at an image on his cell phone or PDA. The user also has the option of scrolling through a series of images that could cover minutes, hours or days," says Dr. Uwe Albrecht, investment partner for Siemens Venture Capital, which funds Setrix.
Unlike motion detectors, which often cause false alarms, vision sensors can tell the difference between a person and an animal. Albrecht points out that a number of trends suggest that smart cameras will soon be showing up in private homes. "The hardware is getting smaller and cheaper, the processing power is growing, the communications technology is off-the-shelf, and the output is far more informative than anything you get from a typical motion-based burglar alarm system," he says. The camera, which could soon be offered by major phone companies, would come equipped with software for recognizing simple events, such as a person entering its field of view, or a window being opened. More advanced models would be capable of receiving software updates, and even trading information.
Intelligent cameras can tell the difference between normal and unusual behavior. A person going from car to car, for example, would cause such a camera to transmit images to security personnel
Cameras that See Shopping Patterns. To get an idea of the potential behind this technology, just talk to SCR's Ramesh. His team has developed a system that can separate heads from background information, which could allow future wireless videophones or surveillance cameras to sharply reduce transmission requirements. But there's much more to it than that. "The current system can also track multiple people as they enter a room, focus on their heads or faces and log information to generate statistics," says Ramesh. It accomplishes this by using algorithms that detect people from visual information generated by omnidirectional, 360°-field-of-view sensors. It also uses auxiliary pan-tilt cameras to focus and zoom in on faces. "We see this as a step toward a new generation of intelligent sensors that perform autonomous vision tasks and report data such as shopping patterns in department stores or usage patterns in subway stations to a remote base station," explains Ramesh, who emphasizes that the technology is also applicable to automation and machine vision/inspection.
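Ramesh does not say how heads are separated from the background, so the following is only a generic illustration of the underlying idea using running-average background subtraction, a common textbook technique. The `RunningBackground` class, the `foreground_pixels` function and the `alpha` and `threshold` values are assumptions. Pixels that differ strongly from a slowly updated background model are treated as foreground, and only those would need to be transmitted.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale image, one value (0-255) per pixel


class RunningBackground:
    """Exponential running-average background model (a generic technique)."""

    def __init__(self, first_frame: Frame, alpha: float = 0.05, threshold: float = 25.0):
        self.bg = [list(row) for row in first_frame]  # copy, so the caller's frame is untouched
        self.alpha = alpha          # how quickly the background adapts (e.g. to lighting)
        self.threshold = threshold  # grey-level difference that counts as foreground

    def apply(self, frame: Frame) -> List[List[bool]]:
        """Return a foreground mask for `frame`, then update the background model."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                is_fg = abs(value - self.bg[y][x]) > self.threshold
                mask_row.append(is_fg)
                if not is_fg:
                    # Adapt only where the scene looks like background, so a person
                    # who pauses is not immediately absorbed into the model.
                    self.bg[y][x] += self.alpha * (value - self.bg[y][x])
            mask.append(mask_row)
        return mask


def foreground_pixels(frame: Frame, mask: List[List[bool]]) -> List[Tuple[int, int, float]]:
    """The (x, y, value) triples that would need to be transmitted."""
    return [(x, y, value)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if mask[y][x]]
```

When most of the scene is static, sending only `foreground_pixels` instead of full frames is what would reduce the transmission load.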
Of course, when it comes to interpreting the contents of real-time images, life is a lot easier indoors. Outside it's a different story. Nevertheless, researchers at SCR and Roke Manor Research (RMR), a UK-based business owned by Siemens, have developed programs that accurately monitor city street and highway situations under a full range of weather conditions. In both cases, researchers are concentrating on developing programs that help cameras detect anomalies such as stopped, slow or wrong-way vehicles. Indeed, such a solution is already operating successfully in a tunnel in Switzerland. But the program runs on PCs equipped with special-purpose hardware from Siemens Building Technologies, not inside the cameras. "However," says Ramesh, "we believe that as cameras become cheaper, we will be able to implement high-performance algorithms inside cameras." Furthermore, SCR researchers are working with a number of universities to fine-tune these systems. For instance, scientists at Columbia University in Manhattan are studying how different substances, such as grass and concrete, look when wet or hot. "We are building statistical models that are consistent with physics in order to improve the accuracy of automated image interpretation systems," explains Ramesh.
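Once a vehicle has been tracked, flagging stopped, slow or wrong-way traffic comes down to checking its recent speed along the lane. The sketch below assumes tracking already yields each vehicle's distance along the lane over time. The function name `classify_vehicle` and all thresholds are hypothetical placeholders, not parameters of the system running in the Swiss tunnel.

```python
from typing import List, Tuple


def classify_vehicle(track: List[Tuple[float, float]], window_s: float = 5.0,
                     stopped_mps: float = 0.5, slow_mps: float = 8.0) -> str:
    """Label a tracked vehicle as 'normal', 'slow', 'stopped' or 'wrong-way'.

    track: (time in s, distance in m travelled along the lane in the permitted
    direction). Thresholds are illustrative, not values from a deployed system.
    """
    points = sorted(track)
    if len(points) < 2:
        return "unknown"
    t_end = points[-1][0]
    recent = [p for p in points if p[0] >= t_end - window_s]
    if len(recent) < 2:          # sparse track: fall back to the last two samples
        recent = points[-2:]
    (t0, s0), (t1, s1) = recent[0], recent[-1]
    if t1 <= t0:
        return "unknown"
    v = (s1 - s0) / (t1 - t0)    # mean speed over the window in m/s (negative = backwards)
    if v < -stopped_mps:
        return "wrong-way"
    if abs(v) <= stopped_mps:
        return "stopped"
    if v < slow_mps:
        return "slow"
    return "normal"
```

Using only the most recent window means a car that was moving normally but has just come to a halt is reported as stopped rather than averaged away.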
Imagine watching your favorite sport from the point of view of the player of your choice, or from the umpire's location. It's possible with Hawk-Eye, a new technology that processes the video images from dedicated cameras around a field to produce three-dimensional tracks of the ball and players with 5 mm accuracy. What's more, Hawk-Eye can even predict the future flight path of the ball. Even as cameras are zooming or panning to follow the action, Hawk-Eye takes 3-D measurements in real time using field markings. Developed by Roke Manor Research, a UK-based business owned by Siemens, Hawk-Eye is already being used by the BBC, Sky Sports, and Britain's Channel 4. Since the system essentially digitizes the entire game, fans can use Hawk-Eye to recreate scenes from different points of view over the Internet. Such representations can be displayed on a computer, a television or, in the near future, a UMTS phone. Working in partnership with Sunset+Vine, a UK company, Roke Manor Research established Hawk-Eye Innovations Ltd., a company that specializes in applying the new technology to different sports, including cricket, snooker, soccer, tennis and billiards.
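How Hawk-Eye predicts the ball's flight is not described here, but the basic idea of extrapolating a path from 3-D measurements can be sketched by fitting a simple ballistic model. This sketch assumes constant gravity and ignores air drag, spin and bounces, all of which a production system would have to handle. `Flight`, `fit_flight` and the coordinate convention (z is height in metres) are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

G = 9.81  # gravitational acceleration, m/s^2


def _fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of v = a + b*t; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    if s_tt == 0:
        raise ValueError("need samples at two or more distinct times")
    b = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / s_tt
    return mean_v - b * mean_t, b


@dataclass
class Flight:
    x0: float
    vx: float
    y0: float
    vy: float
    z0: float  # height above the ground at t = 0, in metres
    vz: float

    def position(self, t: float) -> Tuple[float, float, float]:
        return (self.x0 + self.vx * t,
                self.y0 + self.vy * t,
                self.z0 + self.vz * t - 0.5 * G * t * t)

    def landing_time(self) -> float:
        """Later root of z(t) = 0 (assumes the ball is above ground at t = 0)."""
        return (self.vz + math.sqrt(self.vz ** 2 + 2 * G * self.z0)) / G


def fit_flight(samples: List[Tuple[float, float, float, float]]) -> Flight:
    """Fit a drag-free ballistic model to (t, x, y, z) measurements."""
    ts = [s[0] for s in samples]
    x0, vx = _fit_line(ts, [s[1] for s in samples])
    y0, vy = _fit_line(ts, [s[2] for s in samples])
    # Adding back the known gravity term makes height linear in t as well.
    z0, vz = _fit_line(ts, [s[3] + 0.5 * G * s[0] ** 2 for s in samples])
    return Flight(x0, vx, y0, vy, z0, vz)
```

Because gravity is known, removing its contribution turns the vertical fit into an ordinary straight-line fit, so all three axes share one least-squares routine.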
Recognizing Anomalies. Meanwhile, at RMR, researchers have developed a technology called Video Motion Anomaly Detection (VMAD), which can actually learn what is normal and what is not in terms of the motions in a given scene. "The system alerts an operator or triggers an event recording when an unusual activity occurs," says Anthony Dixon, RMR's manager of security applications. VMAD is so flexible that it will detect anomalies ranging from animals on a road to intruders climbing a fence. "The system uses a patented feature extractor that was originally developed for 3D vision and robotic applications," explains Dixon. "The learning algorithms reduce or eliminate the need for special programming for individual applications." Recognizing that something anomalous has happened is, of course, miles away from identifying what has happened. But that's not the point. Developers of smart cameras agree that humans will continue to hold the intellectual high ground for years to come. On the other hand, cameras excel in many areas compared to humans. For one thing, they have an unlimited attention span; for another, they can detect things that we can't, such as a defect in a hearing aid on a production line, or a license plate that doesn't match any entry in a long list of authorized users. And even in an area where humans are extraordinarily adept (face recognition), smart cameras, benefiting from biometric information and recent developments in 3D scanning, are likely to surpass human capabilities soon.
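VMAD's patented feature extractor and learning algorithms are not described in the article, so the following swaps in a much simpler technique to illustrate the general idea of learning what motion is normal. It counts how often each motion direction occurs in each cell of the image during routine operation, then flags motions that were rarely or never seen there. The `MotionModel` class, grid size, number of direction bins and `min_prob` threshold are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Motion = Tuple[float, float, float, float]  # (x, y, dx, dy) in pixels for one tracked feature


class MotionModel:
    """Learns how often each motion direction occurs in each cell of the image."""

    def __init__(self, cell_px: int = 40, n_dirs: int = 8, min_prob: float = 0.02):
        self.cell_px = cell_px      # size of a square grid cell
        self.n_dirs = n_dirs        # number of direction bins (8 = 45 degrees each)
        self.min_prob = min_prob    # motions rarer than this are flagged
        self.counts: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0] * n_dirs)

    def _key(self, m: Motion) -> Tuple[Tuple[int, int], int]:
        x, y, dx, dy = m
        cell = (int(x // self.cell_px), int(y // self.cell_px))
        angle = math.atan2(dy, dx) % (2 * math.pi)  # in [0, 2*pi)
        direction = int(angle / (2 * math.pi) * self.n_dirs) % self.n_dirs
        return cell, direction

    def learn(self, motions: Iterable[Motion]) -> None:
        """Accumulate normal motions, e.g. from a few days of routine footage."""
        for m in motions:
            cell, direction = self._key(m)
            self.counts[cell][direction] += 1

    def probability(self, m: Motion) -> float:
        cell, direction = self._key(m)
        hist = self.counts.get(cell)  # .get() does not create a new entry
        if not hist:
            return 0.0                # no motion has ever been seen in this cell
        return hist[direction] / sum(hist)

    def is_anomalous(self, m: Motion) -> bool:
        return self.probability(m) < self.min_prob
```

After training on traffic that always moves to the right, a feature moving left in the same cell (a wrong-way vehicle, say) or moving where nothing normally moves (an intruder on a fence) gets a low probability and is flagged, without any application-specific rules.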
So ten or twenty years from now, the camera on the shopping mall roof probably still won't have a clue as to why a man is walking from car to car in the parking lot, but it will most certainly be smart enough to ask a camera perched on the nearest lamp post to take a look, identify the potential culprit and check whether the license plate of the car he gets into is his.
Arthur F. Pease