Machine Vision – Trends
Machines See the Light
Vision systems are proliferating—in surveillance, in the automotive sector, in industrial environments, healthcare, and the military. Behind this trend is a revolution in the accuracy of image processing, the ability to fuse key data from large numbers of sensors in such a way as to automate processes or provide decision support, and the ability of systems to intelligently interpret and report what they see.
Intelligent imaging applications developed by Visvanathan Ramesh (bottom left) and other Siemens experts range from detecting faces (top left) to real-time lane, vehicle and sign recognition (top right and bottom right), automated subway platform surveillance (center left), and detection of hairline cracks in turbine blades (center right)
Machines are about to see the light. It could be the laser light that is dawning in high-tech ports, allowing automated cranes to stack 80-ton containers as precisely as cans of corn on a supermarket shelf (see 3D Object Recognition). It could be the invisible energy of radar and radio waves that will help vehicles keep a safe distance from one another, or the visible light that allows cameras with 3D vision to manage automated postal sorting systems (see Scanning Technologies) or visualize the inner ear to produce custom-made hearing aids.
In these cases, and many more, what becomes clear is that machine vision is evolving to a level that will soon border on intelligence. "In the very near future, the boundary between sensors and intelligent reasoning will become gray," predicts Visvanathan Ramesh, PhD, head of the Real-time Vision and Modeling Department at Siemens Corporate Research (SCR) in Princeton, New Jersey.
In no other area of machine vision is this trend clearer than in surveillance technology (see Video Surveillance). Here, cameras themselves are already spotting abandoned objects in airline terminals, people who are dangerously close to tracks in subway stations, and cars traveling in the wrong direction in tunnels and on roads. And where "smart" surveillance systems were once plagued by false alarms, today they offer a level of event detection reliability that in many cases exceeds 95 % while maintaining very low false alarm rates. "Three or four years ago there were more false alarms than real detections of events," says Imad Zoghlami, PhD, a specialist in surveillance technology at SCR. "Today, our traffic surveillance systems in places such as Hong Kong’s Aberdeen Tunnel and Switzerland’s Giswil Tunnel produce less than one false alarm per week."
While earlier systems were confused by reflections, occlusions and sharp contrasts, newer systems—thanks to smarter algorithms—can continuously track objects without triggering an alarm. The result: security personnel pay attention to alarms and can concentrate on deciding whether events merit action. That’s important because, while cameras are proliferating—the London Underground alone has 6,000—humans are not getting any better at watching activity on monitors, a point confirmed by a recent independent study that found that surveillance personnel "miss 95 % of scene activity after only 22 minutes of observation," says Zoghlami.
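At the lowest level, such a pipeline typically begins with background subtraction followed by shadow and noise suppression, so that only persistent, sizeable changes are passed on as candidate events. The sketch below uses OpenCV to illustrate that generic first step; it is not Siemens' algorithm, and the video path is a placeholder.

```python
import cv2

# Minimal sketch of background subtraction, the kind of low-level step a
# surveillance pipeline builds on; "tunnel.mp4" is a placeholder input.
cap = cv2.VideoCapture("tunnel.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Suppress detected shadows (marked as gray, value 127) and small noise.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Only sufficiently large regions would be handed on as candidate events.
    candidates = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]

cap.release()
```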
Cameras with Brains. But aside from smart, new algorithms, overcoming challenges of this sort generally requires more processing power—something that is in short supply today because four to eight cameras typically share a single CPU (central processing unit). That’s about to change, however, as the price of so-called "embedded" intelligence continues to decline. In fact, cameras equipped with a special-purpose digital signal processor are already available.
Says Emma Brassington, head of Vision Processing at Roke Manor Research Limited (Roke), a Siemens subsidiary in Romsey, England, "There is growing demand to cope with the huge amount of data cameras are producing. The answer is not to throw more and more people at watching their output, but to build more and more intelligence into them in the first place."
Embedded intelligence offers many advantages. For instance, cameras equipped with their own processors will be able to independently monitor what is happening within their field of view, filtering out unimportant data, and transmitting information from images to a central location for evaluation. Furthermore, smart cameras offer the potential of being wirelessly networked without being confused by multiple events, congestion, interference or noise—a problem Justinian Rosca, PhD, a specialist in signal processing at SCR, is attempting to resolve. "The more embedded processing each camera can perform, the less information it needs to transmit—and the more room there is for sharing crucial data," he says.
Rosca envisions sharing on a grand scale, with tens of thousands of tiny, wireless cameras constantly searching for signs of danger and comparing "notes," known as metadata, regarding the characteristics of anything that changes from frame to frame (see Pictures of the Future, Spring 2003, Smart Cameras). "What this boils down to is cameras generating written reports," says Rosca. "This will make it possible to track thousands of events simultaneously by describing each person’s characteristics and passing that data back and forth among cameras."
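A report of this kind can be pictured as a small, structured record that is far cheaper to transmit than raw video. The sketch below is purely illustrative; the field names and format are assumptions, not a Siemens specification.

```python
from dataclasses import dataclass, asdict
import json
import time

# Hypothetical metadata record a smart camera might share instead of raw video.
@dataclass
class TrackReport:
    camera_id: str
    track_id: int
    timestamp: float
    object_class: str    # e.g. "person", "vehicle", "bag"
    bounding_box: tuple  # (x, y, width, height) in image coordinates
    appearance: list     # compact descriptor (e.g. color histogram) for re-identification
    confidence: float

report = TrackReport("cam-042", 17, time.time(), "person", (120, 64, 40, 90),
                     [0.12, 0.31, 0.08, 0.49], 0.87)
# A few hundred bytes of JSON stand in for megabytes of video per frame.
print(json.dumps(asdict(report)))
```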
Considering the future need for distributed smart camera networks, Siemens Corporate Technology has formed a global team in video analysis that includes vision experts from SCR, a multimedia communication team headed by Dr. Andreas Hutter in Munich, and teams in India headed by Rita Chattopadhyay and Dr. Zubin Varghese. "By connecting the dots between Princeton, Munich and Bangalore, SCR has come up with a holistic approach to vision systems development," says SCR President Paul Camuti.
Seeing with Self-Awareness. Before vision systems can become the basis for real-world decisions, they must be capable of evaluating image information to identify uncertainty and disturbance factors such as occlusions, wetness, reflections, and shadows. Even more important, they must be able to quantify their own ability to accurately perform their visual tasks.
With this in mind, SCR has been developing statistical modeling and analysis tools that will allow systems to diagnose themselves. "The need for this becomes apparent when you consider the increasing complexity of these systems and the environments and conditions in which they are expected to perform," explains Ramesh. "To characterize system performance, we have developed a holistic view that models each component’s—as well as the entire system’s—performance limits in real time. It boils down to giving a machine the equivalent of what we call self-awareness with respect to how well it is performing its tasks."
Working within this context, Dorin Comaniciu, PhD (see Innovators–Comaniciu), who heads SCR’s Integrated Data Systems Department, has developed and patented a mathematical invention called Robust Information Fusion, which is essentially a novel way of detecting and weeding out questionable information from any given sensor source by analyzing data from multiple sources. The result is a kind of data democracy in which bits and bytes from a spectrum of sources, such as the sensors in a car, can merge seamlessly into a single information expressway.
The technology is based on the principle that each measurement that a sensor produces comes with a level of uncertainty. "In a nutshell," explains Comaniciu, "Robust Information Fusion is a statistical method that weighs the combination of data from different sources to obtain an optimum result."
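A minimal way to picture this idea is a precision-weighted average, in which each source's contribution is scaled by the inverse of its uncertainty. The sketch below illustrates only that textbook principle, with invented numbers; it is not the patented Robust Information Fusion algorithm.

```python
import numpy as np

def fuse(measurements, variances):
    """Combine estimates of the same quantity, weighting each by the inverse
    of its variance (a standard precision-weighted average, shown here only
    to illustrate uncertainty-aware fusion)."""
    m = np.asarray(measurements, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = np.sum(w * m) / np.sum(w)
    fused_variance = 1.0 / np.sum(w)
    return fused, fused_variance

# Example: distance to the car ahead as reported by radar, camera and ultrasound,
# each with its own variance. The noisier camera reading is automatically down-weighted.
distance, variance = fuse([24.8, 26.5, 25.1], [0.2, 2.5, 0.4])
print(f"fused distance: {distance:.2f} m (variance {variance:.3f})")
```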
Another major area of research that focuses on enhancing the robustness of machine vision is statistical learning. In order for cameras to track objects easily and quickly, they need to know how to map image and video data to the object or its event category—what humans might call concept formation. To recognize people or cars as concepts, for instance, a camera must know what their common properties are. "To do that, you have to develop a comprehensive statistical model that explains most of the variations observed in the data," explains Ying Zhu, PhD, a specialist in machine learning. "Once you have such a model, you can apply it to analyze camera data."
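As a toy illustration of such a model, the variation within one category can be summarized by a mean and covariance over feature vectors, and new observations scored by their distance from that model. The features and numbers below are invented for demonstration; real systems use far richer models and image features.

```python
import numpy as np

# Fit a simple statistical model of a category from synthetic 3-D feature vectors
# (standing in for measurements such as length, width, height of "car" detections).
rng = np.random.default_rng(0)
car_features = rng.normal(loc=[4.5, 1.8, 1.5], scale=[0.4, 0.1, 0.1], size=(200, 3))

mean = car_features.mean(axis=0)
cov = np.cov(car_features, rowvar=False)
cov_inv = np.linalg.inv(cov)

def mahalanobis(x):
    # Distance of an observation from the learned category model.
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

print(mahalanobis(np.array([4.4, 1.8, 1.5])))  # close to the model: small distance
print(mahalanobis(np.array([1.8, 0.6, 1.7])))  # pedestrian-like vector: large distance
```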
Vehicles with Vision. Modeling the real world is not as easy as it sounds, however. Applied to traffic sign recognition, for instance, the technology can successfully recognize signs 90 to 95 % of the time. "But bad weather and poor lighting can still confuse the camera system," says Zhu. "Fusion with other sensor modalities is the solution to this problem."
Detecting signs is just the beginning. According to Frost & Sullivan, cameras are expected to be the fastest growing sensing technology in the automotive sector (see Facts and Forecasts). As part of Siemens’ pro.pilot driver assistance system (see Pictures of the Future, Fall 2005, Driver Assistance), the company is developing camera-based lane-keeping and lane departure warning systems, driver and occupant monitoring for drowsiness and airbag deployment, pedestrian detection, and front- and rear-end monitoring, among others. Additional sensing will be provided by radar, infrared, ultrasound and wireless systems to detect and communicate with nearby vehicles and avoid accidents associated with blind spots, sudden braking and low visibility.
Thanks to new algorithms and embedded processing, intelligent security cameras (bottom) can independently and reliably recognize the difference between furniture (green box, left), abandoned bags (red box) and people, as well as changes in available parking spaces (center)—thus concentrating on meaningful events rather than occlusions, reflections, or shadows
"Automotive sensing technologies are really starting to take off," says SCR’s Camuti. "Computing is pushing it, and demand for safety is pulling it. Some of these technologies are already on the market. Eventually, they will be offered as packages, and further on they will set the stage for autonomous driving."
Although it may be twenty years or more before our cars can take responsibility for getting us from here to there, autonomous military vehicles may be just around the corner. Indeed, at Siemens’ Roke facility, engineers have already developed an autonomous vehicle small enough to navigate buildings. On display for Roke’s recent 50th anniversary celebration, the vehicle uses a camera sensor to move through areas independently (even without GPS), while avoiding obstacles and wirelessly transmitting video to off-theater personnel. The same technology is also driving development of surveillance capabilities in unmanned air vehicles, police vehicles, and crop and power line inspection systems.
Behind Roke’s (and SCR’s) autonomous vision-based systems are complex technologies such as model-based vision—the ability to accurately track a known 3D model to determine its orientation and position—and 3D structure from motion, a technology that can be compared to the way humans build up concepts as new things are seen and experienced.
"With this technology, we’ve found that machines can build a model of the world as they move along," explains Brassington. "Our philosophy is that, instead of teaching a machine about the world, we let it explore and learn from what it sees. Ten or fifteen years from now, this work will lead to machines that will be able to make decisions, adapt to new situations, and change their goals autonomously."
Robots at Work. Long before machines are capable of building their own models of complex natural or urban environments, they will be able to do so in simpler, more predictable places, such as factories and warehouses. "Our goal is to allow vision-based machines to create models by simply looking at things," says Yakup Genc, PhD, who heads SCR’s 3D Vision and Augmented Reality Program.
Whether a robot is attempting to identify a new object or a familiar one, 3D vision offers a major advantage over 2D. Pioneered by Dr. Claudio Laloni and his team at Siemens Corporate Technology in Munich, 3D technology is based on the projection of bar code-like lines (structured light) onto surfaces to determine their characteristics with a resolution of 100 µm (see 3D Object Recognition).
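In idealized form, the depth measurement behind such a system reduces to triangulation between a camera ray and a projected plane of light. The sketch below uses simplified pinhole geometry with invented numbers; a real scanner depends on careful calibration of the projector and camera.

```python
import numpy as np

def stripe_depth(x_pixel, f_pixels, baseline_m, stripe_angle_rad):
    """Depth of a surface point lit by a projected stripe, by triangulation.

    Camera at the origin looking along +z; projector offset by baseline_m along x,
    emitting a light plane tilted by stripe_angle_rad from the optical axis.
    Simplified geometry for illustration only.
    """
    camera_slope = x_pixel / f_pixels            # X/Z along the camera ray
    projector_slope = np.tan(stripe_angle_rad)   # X/Z slope of the light plane
    return baseline_m / (camera_slope - projector_slope)

# A stripe known to leave the projector at -2 degrees is observed at pixel x = 85
# (relative to the principal point), with a 1400-pixel focal length and 10 cm baseline.
z = stripe_depth(85, 1400, 0.10, np.radians(-2.0))
print(f"depth = {z:.3f} m")
```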
In the industrial area, 3D vision is now being used to inspect power plant turbine blades. "The result," says Genc, "is a complete, digital, high resolution model of each blade. The customer can use this to detect and track defects such as coating loss, abrasions or twisting."
The information, he adds, can be applied to building a database for each blade and analyzing the effects of varying conditions on the blades. Not only is the system fast (five minutes per blade with a portable scanner) and more accurate than any human eye, but it also provides a standardized way of collecting data.
The Big Picture. It’s a long way from turbine blades to human hearts, but the two are linked by a common philosophy pursued by Ramesh, Comaniciu and their teams, namely, a systems view of machine vision. In this view, every module in a system should be aware of how certain its interpretation is, and those uncertainties should be fused within a coherent framework to support accurate overall interpretations. These principles have already found their way into video analysis systems for security and medical imaging.
The design of such systems and their modules is increasingly being driven by so-called "database-guided techniques," which use annotated examples of image or video data to produce automated algorithms that perform visual and quantitative measurement tasks.
An example of such an algorithm is Auto EF, an invention patented by SCR’s Dorin Comaniciu and his team that is now entering the clinical market. The algorithm uses a conventional ultrasound image of a patient’s heart to calculate the heart’s ejection fraction (EF)—the proportion of the blood filling the left ventricle at diastole that is pumped out by systole.
"Today," says Comaniciu, "this crucial measurement of cardiac health is ether eyeballed or traced manually. It takes an expert a couple of minutes to do it. It takes the software two seconds to do the same thing."
Developing the database that allows a system such as Auto EF to recognize the perimeter of a beating heart in real time from fuzzy ultrasound images is a significant challenge. With a view to developing software tools designed to simplify annotation in the field of database-guided medical diagnostics, Comaniciu’s team, in collaboration with Siemens Corporate Technology, plans to establish a center of competence in Bangalore, India.
As specialized algorithms that automatically measure complex functions such as ejection fraction are developed, they will help to accelerate workflows in image-based clinical environments. Furthermore, in coming years, these systems will be able to offer comprehensive evaluations of entire organs, their functions and diseases from a demographic, individual, and systemic point of view, all the way down to the genetic and molecular levels.
What’s more, the generic nature of machine vision is already creating synergies between fields. For instance, a soon-to-be-released product from Siemens Medical Solutions will make it possible to automatically measure the fetal head, abdomen and femur in an ultrasound scan—the most important parameters for determining gestational age—and compare them to previous measurements.
Yet, thanks to modular software architecture, the same basic technology can already help to accurately detect changes in microscopic cracks on turbine blades, usage variations in parking lots, and the minute-by-minute variations in the patterns of vehicles and pedestrians on city roads and sidewalks—all parts of machine vision’s increasingly intelligent big picture.
Arthur F. Pease