2.1 Technology to distinguish sounds is the basis of robot audition

According to the Astro Boy Encyclopedia (written by Mitsumasa Oki, Shobunsha), Astro Boy is equipped with a sound locator that “raises his audibility 1,000 times at the press of a switch, enabling him to hear distant voices and even ultrasonic waves of 20,000,000 Hz”¹. Such a sound locator would have to be a super device that realizes the “cocktail party effect” described by Cherry in 1953, that is, the ability to listen selectively to one speech among many. Hearing-impaired people and elderly adults with weak hearing might ask, “Even if it is not a super device, wouldn’t a function that distinguishes simultaneous utterances be enough?” The Suiko-ki chapter of the Chronicles of Japan (Nihon Shoki) relates an anecdote in which “Prince Shotoku” understood the utterances of ten speakers at the same time and passed judgment on them. The old tale “Ear Hood,” in which a hood lets its wearer hear and understand animals, trees and plants, stirs children’s imagination. If a robot were given such a separation function, it could interact with humans much more easily.

Needless to say, speech, including both speaking and singing voices, is the most important means of communication in daily life. Speech communication serves many different functions, such as word acquisition and back-channel feedback, which may also be given non-vocally. Indeed, the importance of research on Automatic Speech Recognition (ASR) has long been recognized, and vast amounts of funding and effort have been devoted to it over the past 20 years. In contrast, apart from the work of Aso et al., there have been only a few studies of systems that distinguish sounds using microphones mounted on the robot itself and then recognize speech. Our stance has been to develop sound-processing methods that rely on as little prior knowledge as possible. We therefore considered it important to study sound environment understanding, which analyzes acoustic environments consisting not only of speech but also of music, environmental sounds, and mixtures of these. From this standpoint, present ASR, which assumes a single speech input, cannot by itself play an important role in robotics.

Figure 2.1: Development of robot audition based on sound environment understanding

Footnotes

¹ http://www31.ocn.ne.jp/~goodold60net/atm_gum3.htm