1.1 Robot audition software is an integrated system

A person “recognizes” sounds in environments where many sounds are heard at once, and processes them in order to communicate with other people and to enjoy TV, music, or movies. A robot audition system that provides this kind of recognition must process sounds heard in various environments at various levels, so, as with robot vision, it cannot be defined easily.

Indeed, the OpenCV open source image processing software is an aggregate of a huge number of processing modules, and robot audition software likewise needs to be an aggregate of modules that covers at least the minimum required functions. The robot audition software HARK aims to be an “auditory OpenCV”. Like OpenCV, HARK provides the modules necessary for “recognition” from the device level up, including signal processing algorithms, measurement tools, and a GUI, and it is also published as open source. The three major tasks in Computational Auditory Scene Analysis (the understanding of the environment from sound information) are 1) sound source localization, 2) sound source separation, and 3) automatic speech recognition. HARK has been developed from the results of research in these fields. HARK is currently available for free ¹ as an open source project for research use.

Footnotes

¹ http://winnie.kuis.kyoto-u.ac.jp/HARK/