Pushing a button is all it takes to make German Chancellor Gerhard Schröder laugh. Thomas Riegel from Siemens Corporate Technology in Munich makes a couple of adjustments with a mouse, and Schröder suddenly scowls from the flatscreen monitor, just as if the real Chancellor had suddenly received some more bad news about the economy. "We used Schröder because his is a face that everyone recognizes here in Germany," explains Riegel with a smile. "But we could have taken a picture of anyone."
The image of Schröder's face is stored on the computer's hard disk. It takes Riegel and his Chinese colleague Liu Yanghua only a couple of minutes and a few mouse clicks to transform the rigid still into a two-dimensional animated icon. "All I need to do is select 20 points on the photo and then make a few adjustments," says Liu, a 24-year-old doctoral student from Tsinghua University in Beijing. Siemens has been collaborating with the renowned Chinese university for around five years now. During that period, a number of scientists from Tsinghua have worked alongside multimedia specialists at Siemens Corporate Technology.
Making the German Chancellor laugh on a computer screen might not be everyone's idea of a good joke. Yet it's a development with big potential for mobile phone applications. Using the procedure devised by Siemens researchers, cell phone users will be able to take a picture with an integrated digital camera, convert it directly into an animated avatar on the phone's display, and then send it in the form of an MMS. "We're not quite that far yet," admits Riegel. At present, the researchers still have to process the images on a PC. "But there's no real reason why you shouldn't be able to generate an avatar on a mobile phone as well," he adds. Indeed, it would also be perfectly possible to use an animated face to read out e-mails or Website texts on a mobile phone or minicomputer.
The procedure used by Riegel and his Chinese colleagues employs MPEG-4, an international standard developed with Siemens' participation. MPEG-4 should enable the highly efficient transmission of multimedia data such as video and audio files, photos and 3D images using today's networks. The use of this standard has the advantage that the technology would be able to circulate relatively quickly among cell phone users. Other companies have their own developments in this field, but these require special software and are therefore not directly compatible with different network operators and mobile phone models.
"Although we use a standard, it's still very complex to write a program that will display a model of a human face," says Liu. To guarantee a realistic image, the computer must be able to extrapolate, on the basis of the 20 points selected, a lattice of 200 to 300 points that cover the face like a grid. To make the face laugh, for example, specific points in the lattice must be made to move. Initially, Riegel, Liu and their colleagues worked on 3D avatars. But these turned out to be unsuitable for cell phone displays because of the large amount of processing required to generate such images. With this in mind, Liu wrote a program capable of generating facial expressions such as pleasure, anger, surprise and sadness in two dimensions. "Compared to 3D avatars, the new program requires only one-fifth of the computing capacity," she explains.
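To give a rough idea of what "moving points in the lattice" can mean, here is a minimal sketch in Python. It is not the Siemens program and not the MPEG-4 facial animation method. It simply assumes that a handful of hand-picked control points (for example, the corners of the mouth) are each given an offset, and that every lattice point follows the nearby control points through inverse-distance weighting, a technique chosen here purely for illustration. The function name `displace_lattice` and its parameters are hypothetical.

```python
from math import hypot

def displace_lattice(lattice, controls, offsets, power=2.0):
    """Shift each 2D lattice point by a blend of the control-point offsets.

    Illustrative assumption: nearer control points pull harder, with a
    weight of 1 / distance**power (inverse-distance weighting).
    """
    moved = []
    for x, y in lattice:
        weight_sum = dx = dy = 0.0
        exact = None
        for (cx, cy), (ox, oy) in zip(controls, offsets):
            distance = hypot(x - cx, y - cy)
            if distance == 0.0:
                # The lattice point sits on a control point: copy its offset.
                exact = (ox, oy)
                break
            weight = 1.0 / distance ** power
            weight_sum += weight
            dx += weight * ox
            dy += weight * oy
        if exact is not None:
            moved.append((x + exact[0], y + exact[1]))
        else:
            moved.append((x + dx / weight_sum, y + dy / weight_sum))
    return moved

# Two control points: the left one is raised by 2 units, the right one stays put.
controls = [(0.0, 0.0), (2.0, 0.0)]
offsets = [(0.0, 2.0), (0.0, 0.0)]
lattice = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
smile = displace_lattice(lattice, controls, offsets)
```

In this toy setup the lattice point halfway between the two control points is raised by half as much as the left one, which is the kind of smooth fall-off a facial grid needs so that a smile does not tear the image.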
All in all, Liu spent one year at Siemens before returning to China at the end of 2002. Her successor, Zhang Jun, has already arrived in Munich. Back at Tsinghua University, Liu is now completing her doctoral thesis under the supervision of Professor Xu Guang-You from the Institute of Human-Computer Interaction at the Faculty of Computer Sciences. Alongside avatars, other areas of interest for Xu's team include face-recognition and voice-identification technology. For example, his researchers have developed a program that can locate the position of human faces on video images in less than 100 milliseconds without having to identify them individually. What the software does is search for a set of cues that characterize a face, such as bars of dark color above lighter strips, a sure sign of a pair of eyes with eyebrows above them. Another program then identifies the person by means of face and speaker recognition. Using this technology, Xu has created an advanced lecture hall equipped with cameras and microphones that first identify the professor giving the lecture and then grant him or her exclusive access to the hall's multimedia systems. For example, when the lecturer gives instructions with voice and gestures, the program dims the lighting for a slide show. At the same time, the corresponding material is automatically loaded onto the students' PCs. Finally, the system records the lecture, so that students can access the material over the Internet at any time. Thanks to multimedia streaming technology, the system automatically adjusts the content to the available bandwidth and, if necessary, transmits color images in black and white, which naturally reduces data flow.
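The "dark bar above a lighter strip" cue can be illustrated with a few lines of Python. This is only a sketch of the general idea, not the program developed by Xu's team: it assumes a grayscale image stored as a list of rows with values from 0 (black) to 255 (white), and it compares the average brightness of the upper and lower halves of a small window. The function names `band_mean` and `dark_over_light_score` are hypothetical.

```python
def band_mean(image, top, bottom, left, right):
    """Average brightness of the pixels in rows top..bottom-1, columns left..right-1."""
    total = 0
    count = 0
    for row in image[top:bottom]:
        for value in row[left:right]:
            total += value
            count += 1
    return total / count

def dark_over_light_score(image, top, left, height, width):
    """Brightness of the lower half of a window minus that of the upper half.

    A large positive score means a dark band sits above a lighter one,
    the pattern the article associates with eyebrows above eyes.
    """
    half = height // 2
    upper = band_mean(image, top, top + half, left, left + width)
    lower = band_mean(image, top + half, top + 2 * half, left, left + width)
    return lower - upper

# Toy 4x4 image: two dark rows on top of two light rows.
patch = [
    [20, 20, 20, 20],
    [20, 20, 20, 20],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
score = dark_over_light_score(patch, top=0, left=0, height=4, width=4)
```

A detector built on this idea would slide such a window across the frame and keep the positions where the score exceeds a threshold. How the real system combines its cues, and how it stays under 100 milliseconds, is not described here.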
Although Xu's work with Siemens is limited to the avatar project, he exercises an indirect influence on other projects. "In the past, we were too academic," he admits. "Siemens helped us become more realistic. For example, the company told us that if we wanted to find a broad application for our avatars, the MPEG-4 standard would be crucial." Similarly, the company provides the researchers with a lot of feedback. Siemens, for its part, benefits from close contact with one of China's most important universities. For Xu, there are two major advantages to the exchange program: his students gain experience with Western culture, and they also learn about industry. "This knowledge gets handed on to students back in Beijing. In the end, many more people profit than just the exchange students themselves," says Xu. Adds Liu Yanghua: "I learned a lot about programming, including how to write algorithms." And it was also quite an experience to put a smile on the German Chancellor's face.
Norbert Aschenbrenner