Developer Bernd Holz auf der Heide displays an avatar on a demonstration model of the new SX1 cell phone
Their names are Cora, Liam, Cyberella and Womble. They are small, globular and green. They advise customers at banks and call centers, guide visitors through ministries and research institutes, and teach at schools and universities. They can also read cell-phone SMS messages out loud, as in the latest development from Siemens Information and Communication Mobile (ICM) in Munich. But in spite of their many capabilities, their speech comprehension is still limited and only truly effective in narrowly defined conversational situations. Welcome to the world of avatars, creatures that exist only in computers. They are as varied as the tasks they perform. But they have one thing in common: They are all supposed to facilitate access to systems and information.
Originally, these creatures made of pixels and polygons were intended to act as the Internet chat identities of their flesh-and-blood counterparts. But today they prefer to romp around in computer games and educational software. The emotional, personal way in which they address users, their independent lives and, last but not least, the fun factor that goes along with them, add zest to even the driest topics. But what's really important is that virtual assistants can make it significantly easier to operate a great variety of devices and systems. Whereas avatars are the "face" presented to the customer, the actual work is performed by software agents (see Scenario 2015: Special Agents of the Future in Pictures of the Future, Fall 2001). The latter race through the Internet like bloodhounds and search for information in databases, flight schedules, or instruction manuals. They give their spoils to avatars, who then present them to users.
"Living Characters is our expression for assistants and avatars in the virtual world," says Bernd Holz auf der Heide, manager of the Living Characters project and an expert in user interface innovations at Siemens Information Communications Mobile (ICM). Holz Auf der Heide and his team were the first to develop avatars that live in cell phones. No later than next year, a cute, big-footed creature named Womble will hop, splash, pout and make merry on the display of the Siemens SX1 cell phone. Womble will make using a cell phone more fun. For instance, when the battery is recharging, Womble's body will display a rainbow of stripes. When not in use, Womble will juggle balls, watch butterflies or just blow bubbles. Furthermore, in the not-too-distant future, Womble will not only get mail but read it out loud with matching gestures and facial expressions. Moreover, he is expected to play a major role in a fun multimedia cell phone for young people to be launched in the spring of 2004. Womble is made possible by a 3-D engine that portrays a three-dimensional model in real time and uses light and shadow to bring it to life. Such engines are already built into some cell phone games and could be used to create avatars regardless of which cell phone model is involved.
E-Mail Avatars. Womble already works in demonstrations. He acts as an interface to software agents; he reads messages from the Internet and takes part in auctions on eBay. By the same token, stars, news anchors and stock market gurus could one day read out their "breaking news" on cell phones, or Michael Jordan could send a "personal" goodbye to his fans. "It's technically feasible, but third-party service providers still don't have this technology," laments Holz auf der Heide. When this hurdle is overcome and the specifications for avatars are standardized, it will be possible to send them from a server to many network users or from partner to partner, which is the prerequisite for spreading them further. Currently, a consortium led by Siemens and Nokia is developing a standard that will define all the current requirements connected with the use of 3-D animations. The preconditions for a world of avatars are therefore taking shape.
Smart Manuals. Wouldn't it be nice, when some device or appliance isn't working, to be able simply to call up a personal assistant who would tell you what to do? Well, so-called "natural-language dialogue systems" are in fact already available. "We have a voice-based user manual for the Hicom telephone optiset," says Dr. Hans-Ulrich Block, a linguist from the Interaction Technologies department at Siemens Corporate Technology (CT) in Munich. "The manual can be called up for almost 200 pieces of support information." The Virtual Call Center Agent (ViCA) voice dialogue system, which he helped develop, is designed to allow customers or co-workers to access complex support services.
"Hi, Embassi, could you please put on 'Out of Africa'?" A personal avatar appears on the screen. "Of course," it replies, and the digital video recorder starts to hum. A pie-in-the-sky vision? "No," says Thomas Heider, a computer scientist at the Fraunhofer Institute for Computer Graphics in Rostock, Germany. "In the Embassi project's model living room, it already works." Embassi a German acronym that stands for "multimodal assistance for infotainment and service infrastructures" is a pilot project of the German Ministry of Education and Research (BMBF). Heider worked for four years with other specialists on new user-control systems for home electronics devices that can be operated by means of gestures, facial expressions, text input or voice commands. The Embassi planning assistant even develops strategies for multiple devices. It can, for example, adjust the room lighting and TV screen brightness, and when a video title is called out, the system locates the right media resource, adjusts the brightness and plays the film. But hurdles still exist. If the word "dark" is spoken in conversation, for example, the lights shouldn't go out. Hence, users try to address the system with "Embassi, please..." (software components can be downloaded at www.embassi.de/open_embassi/).
At Siemens Corporate Technology (CT) in Munich, Hans Röttger is designing a multimodal communications booth for SmartKom-Public, part of the SmartKom (www.smartkom.org) project. Here, the good old telephone booth will be upgraded with a videophone, document camera and Internet access. Natural language, graphical user interfaces and gesture recognition will make it easier to do things like reserve movie tickets. In a natural-language dialogue with an avatar, users will be able to inquire about films, ask for directions, or reserve tickets. SmartKom will use SIVIT gesture-recognition technology (Siemens Virtual Touchscreen) to replace the mouse at some public information points. In 2002, an interactive shopping window (picture) was tested in Düsseldorf. The technology allows customers to point to articles in a display window to obtain information without having to go inside. The computer recognizes gestures via video camera and translates them into mouse clicks.
Similarly, cars may eventually have computers that can be personalized, "but it will probably be ten years before that happens," predicts Dr. Hans-Wilhelm Rühl, who is responsible for automotive voice module integration at Siemens VDO. A navigation system designed by Rühl recognizes 2,000 words. In three years, it will probably be able to recognize over 8,000. "In five years, drivers may be able to say: 'destination Hamburg, radio station FFN,' without having to press a button or remember any special commands," says Rühl. But in a loud automotive environment, the system has to be far more robust than systems built for home use. Voice recognition systems must listen to the driver and not the children in the back seat. But Rühl is convinced that "in a few years it will be possible to operate any infotainment, navigation or e-mail system in a vehicle cockpit by voice."
All the caller has to do is to ask questions in natural language. The system asks follow-up questions in order to fill in any missing parameters. The caller is spared time-consuming enumerations of options, such as, "For yes, press one; for no, press two..." If the question is, "How can I turn off the calling signal?" the dialogue partner reacts with the information unit "Calling Signal." A reply might be: "To turn off the calling signal for your telephone, pick up the receiver and press star 97." At each stage of the dialogue, the menu tree is dynamically recalculated, and unneeded query structures are therefore omitted. A user who knows the system and specifies the necessary information in one sentence gets the answer he or she needs very quickly.
In the case of ambiguous input, as is often provided by inexperienced users, the dialogue engine simply requests more information. A dialogue interpreter also recognizes when it can no longer provide help, and in that case routes the caller to a human agent.
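The behavior described above (answer at once when the question is specific, ask a follow-up when it is ambiguous, hand over to a person when the system cannot help) can be illustrated with a small Python sketch. It is not the ViCA implementation, whose internals the article does not describe. The keyword matching, the `InfoUnit` and `Dialogue` names, the two-miss limit, and every information unit except the "Calling Signal" reply quoted above are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InfoUnit:
    """One piece of support information (hypothetical structure)."""
    name: str
    keywords: frozenset
    answer: str


# Only the "Calling Signal" answer comes from the article; the rest is invented.
UNITS = (
    InfoUnit("Calling Signal", frozenset({"calling", "signal"}),
             "To turn off the calling signal for your telephone, "
             "pick up the receiver and press star 97."),
    InfoUnit("Signal Volume", frozenset({"signal", "volume"}),
             "(hypothetical) Instructions for changing the signal volume."),
    InfoUnit("Call Forwarding", frozenset({"forwarding", "forward"}),
             "(hypothetical) Instructions for call forwarding."),
)

MAX_MISSES = 2  # assumed limit before the caller is routed to a person


@dataclass
class Dialogue:
    heard: set = field(default_factory=set)  # words collected over all turns
    misses: int = 0                          # turns that matched nothing

    def respond(self, utterance: str) -> str:
        # Accumulate words so that a follow-up answer refines the earlier turn.
        self.heard |= set(re.findall(r"[a-z0-9]+", utterance.lower()))

        # Re-score every unit on each turn, a simple stand-in for
        # recalculating the menu tree at each stage of the dialogue.
        scores = {unit: len(unit.keywords & self.heard) for unit in UNITS}
        best = max(scores.values())

        if best == 0:
            self.misses += 1
            if self.misses >= MAX_MISSES:
                return "Transferring you to a human agent."
            return "Could you describe the problem in other words?"

        candidates = [unit for unit in UNITS if scores[unit] == best]
        if len(candidates) == 1:
            return candidates[0].answer

        # Ambiguous input: ask only about the options still in play.
        return "Do you mean " + " or ".join(u.name for u in candidates) + "?"
```

A caller who asks "How can I turn off the calling signal?" gets the answer in one turn, while a caller who mentions only "the signal" is first asked which of the two matching topics is meant.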
Womble, the green pear-shaped character, not only reads out e-mail and helps users, but also plays around on the display when nothing is happening. Another avatar in the form of a young man helps users get to know the cell phone's features.
When it comes to appliances such as washing machines, ranges and refrigerators, however, things are different. Since these systems do not have a PC interface, it is difficult to integrate natural-language-based operating instructions into their other controls. "For the time being, the best solution for such appliances is for the manufacturer to offer a natural-language help desk of the sort we've developed as a customer service," says Block.
How Much Personality? "Virtual assistants always walk along a razor's edge between acceptance and rejection," says Holz auf der Heide, who is a trained psychologist. Sight, hearing, feeling: all of these human faculties are addressed by virtual characters in order to make it easier to operate equipment. But to ensure virtual helpers don't arouse our displeasure, people have to be able to rely on avatars and speak with them. Their actions must be comprehensible, says Holz auf der Heide. And yet, it is precisely their self-willed personalities and their unpredictability that exert a certain fascination, as is the case with human beings. They shouldn't act on their own authority too much, however, and nonsensical questioning can be an annoyance, according to the unanimous view of experts and users. Since virtual assistants act on behalf of their masters, security is a top priority, particularly when it comes to legally binding transactions. "But a digital signature ensures the authenticity and integrity of the agent, and the assistant can be uniquely assigned to its user," explains Kai Fischer, a security expert at CT in Munich.
Because of the increasing complexity of many systems, demand for user-friendly assistance systems will continue to grow. Computers in cars, stereos and video systems (see box above) are only the beginning. Avatars will one day recognize emotions, too, and remember the likes and dislikes of their users. If a user is afraid of flying, for instance, a train will be chosen when possible. And when a restaurant is selected, the user's preferences will be weighed into the decision, along with location and availability information. Ideally, avatars will change their behavior on the basis of experience.
Avatars will also serve as an aid to interpersonal communication. They could appear on the cell phone display of the person you are calling as a three-dimensional likeness of yourself or some imaginary character and smilingly accept an invitation to a concert, for example. Technically, it is already possible for someone to send their photo as an image file to a software service on the Internet and have the image transformed into an animated model. At that point the user will have created a virtual twin of himself or herself (see How to Mail a Smile in Pictures of the Future, Spring 2003).
Avatars can also acquire knowledge of the real world via cell phone-mounted cameras, microphones and sensors. Genuine interaction is thus possible between humans and virtual entities. "The Womble of tomorrow will put on sunglasses, lick an ice-cream cone and ask me whether I want one too. And then he will show me the way to the nearest ice-cream parlor," says Holz auf der Heide. "But the actual intelligence that makes these actions possible in the first place comes from the mobile network infrastructure. Today's cell phones lack the computing power to run the software." And intelligence is important because these virtual characters have many applications. All in all, avatars will be able to relieve us of so many routine virtual world tasks that we will probably have more time to enjoy the attractions of the real world.
Birgitt Salamon