As in many other areas of work, research benefits from the pooling of resources. That's why Siemens works with partners from around the world. One example is the company's partnership with Shanghai Jiao Tong University (SJTU) and Beijing's Tsinghua University in the field of speech recognition systems.
"The Chinese message on the monitor says: "This is a Siemens speech recognition system"
("I need a taxi!") says a businessman in Shanghai and his cell phone promptly dials the number of a taxi company. Science fiction? Not necessarily. Up until recently, speech recognition and speech synthesis only worked in conjunction with a few applications. Soon, however, they will revolutionize the operation of all kinds of devices. Just imagine: a brief spoken statement and your cell phone dials the desired number, your washing machine goes into action, or your television comes on already tuned to your favorite channel. To achieve this goal, researchers and developers have to register and process a huge amount of language-specific data, which they then use to adapt and perfect automatic speech recognition systems. And any company that wants to break into the Chinese market with its 1.3 billion consumers must at least analyze the complexities of Mandarin and Cantonese.
"Siemens is working with leading Chinese universities to drive things forward in this field," says Herbert Tropf, senior consultant at Corporate Technology in Munich and the prime initiator of the Chinese-German partnership. "One of the main areas of interest is the further development of automatic speech recognition. Progress here is crucial not only for speech-driven teleservices such as fixed-line telephones, cell phones or Internet telephony, but also with regard to speech-driven interfaces for consumer applications such as televisions, washing machines or personal digital assistants." The second focus is on the development of automatic speech synthesis, especially a system's ability to read all kinds of texts in as natural a voice as possible. Neither of these things are easy, according to Tropf. Computer-based speech recognition is faced with the problem of the enormous range of dictions. In other words, the computer has to be able to deal with differences in both dialect and unclear pronunciation.
The situation is further compounded by unavoidable ambient noise. "As far as speech synthesis is concerned, we not only have to ensure comprehensibility; we also have to make sure that the computer-generated speech output sounds as natural as possible," explains Tropf. "There's little doubt that people are particularly sensitive and critical in this respect." Another factor that must be taken into account in Chinese is tone: unlike Western languages, Chinese changes the meaning of a word through changes in the tone in which it is pronounced. The current state of both speech recognition and speech synthesis is characterized by the use of data-driven rather than rule-based approaches. In other words, researchers and developers are relying on speech databases.
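The tonal point can be made concrete with a small illustration. The following Python snippet is purely illustrative and not part of Siemens' systems; it simply lists the standard textbook meanings of the Mandarin syllable "ma" under its four tones, which a recognizer has to keep apart.

```python
# Purely illustrative: the Mandarin syllable "ma" takes on different meanings
# depending on its tone, so a recognizer must treat tone as part of a
# syllable's phonetic identity.
MA_TONES = {
    "ma1": ("mā", "mother"),    # first tone: high and level
    "ma2": ("má", "hemp"),      # second tone: rising
    "ma3": ("mǎ", "horse"),     # third tone: falling-rising
    "ma4": ("mà", "to scold"),  # fourth tone: falling
}

def gloss(syllable: str) -> str:
    """Return a readable gloss for a tone-annotated syllable such as 'ma3'."""
    pinyin, meaning = MA_TONES[syllable]
    return f"{syllable}: {pinyin} -> {meaning}"

if __name__ == "__main__":
    for key in sorted(MA_TONES):
        print(gloss(key))
```

A recognizer for a Western language can largely ignore pitch when deciding what was said; for Mandarin or Cantonese, the tone has to be modeled explicitly.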
That's why the initial focus of the Chinese-German partnership is on a collection of speech samples, starting with the application area of telephony.
Chinese can be extremely complex to analyze. The meaning of the syllable "ma" depends on the tone in which it is spoken
The speech database for Mandarin has already been completed, and the corresponding speech recognition algorithms are expected to be available before the end of this year. SJTU is currently collecting additional speech data from about 2,000 Cantonese-speaking Chinese citizens. Cantonese is spoken mainly in southern China, particularly in Canton and Hong Kong. Phonetic lexica (dictionaries with phonetic transcriptions) for Mandarin and Cantonese are essential for the success of a product. "We also need improved phonetic models and algorithms for speech recognition that are not as sensitive to noise," Tropf adds. "Otherwise we can't ensure optimum recognition, and the product's error rate would be too high in noisy areas, such as busy streets."
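As a rough sketch of what such a phonetic lexicon involves, here is a hypothetical Python fragment; the word list, the pinyin-with-tone-number notation and the function name are illustrative assumptions, not the format actually used in the Siemens-SJTU project.

```python
# Hypothetical sketch of a pronunciation lexicon: a mapping from orthographic
# words to tone-annotated syllable transcriptions. Entries and notation are
# illustrative assumptions, not the project's actual data format.
LEXICON = {
    "出租车": ["chu1", "zu1", "che1"],  # "taxi"
    "电话": ["dian4", "hua4"],          # "telephone"
}

def transcribe(word: str) -> list[str]:
    """Look up a word's phonetic transcription; fail loudly if it is missing."""
    if word not in LEXICON:
        raise KeyError(f"'{word}' is out of vocabulary; the lexicon must be extended")
    return LEXICON[word]

if __name__ == "__main__":
    print(transcribe("出租车"))  # ['chu1', 'zu1', 'che1']
```

A production lexicon would of course cover a far larger vocabulary and be paired with acoustic models trained on the speech databases described above.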
The Chinese-German partnership is based on close personal cooperation between some 20 scientists at Siemens and the two Chinese universities mentioned above. They communicate via e-mail, by telephone and, of course, face to face. Tropf, for example, comes to China once or twice a year, and a doctoral candidate from Tsinghua University also spent a research year in Germany. In addition, workshops are held in Shanghai, Beijing and Munich to promote the exchange of information.
However, Tropf is also making use of research expertise from outside of Asia. In 1996, Siemens began to set up speech databases in EU-subsidized projects with partners such as Philips, Ericsson, Nokia and IBM. Here too, the aim was to register all languages and dialects in Western Europe as well as the most important Eastern European languages. Since then, activities have been extended to include other regions such as the Arabian Peninsula and the Far East.
"When all these projects are completed, we'll have an international network of expertise for the automatic processing of spoken commands," says Tropf. In fact, he even goes so far as to make a bold prediction: "In the medium to long term, the main focus of speech recognition and speech synthesis research will be the automatic translation of the spoken word."
Sylvia Trage