IJCAI-PRICAI 2020 Tutorial on
Robot Audition Open Source Software HARK
Date and place
|Date & Time||January 8, 2021, 18:40-22:00 (JST) [9:40-13:00 UTC] (detailed schedule below)|
The IJCAI-PRICAI 2020 tutorial on HARK has concluded.
The YouTube videos are available on HARK TV here.
If you register for this tutorial (T46) when you register for IJCAI-PRICAI 2020, the registration chair will send you an email containing an invitation to the Slack workspace for this tutorial. All information about this tutorial will be announced in #general in that workspace, so please create a Slack account and log in according to the invitation you receive. If you do not receive an email containing a Slack invitation, please let us know (hark18-reg _at_ hark.jp).
What is this tutorial about?
This is a tutorial on the robot audition open-source software HARK (HRI-JP Audition for Robots with Kyoto University). HARK was released in 2008 as a collection of robot audition functions, such as sound source localization, sound source separation, and automatic speech recognition, for real-world applications including robots, aiming to be the audio equivalent of OpenCV. The total number of downloads exceeded 16,000 as of December 2019.
The tutorial will consist of two lectures, four hands-on lessons, two case study reports, and two live demonstrations. Attendees will learn all about HARK: what robot audition and HARK are, how to use HARK, and how it works, with live demonstrations. For the hands-on lessons, we ask every participant to bring their own laptop PC (CPU: Core i5/i7 series, memory: > 4 GB, with a headphone jack) with one USB 3.0 port for a bootable USB device and another USB port to connect a microphone array. We will provide an Ubuntu-bootable USB device and lend a microphone array device to each participant. We also recommend bringing headphones to practice listening to sounds. During the tutorial, student teaching assistants will help participants with any questions at any time.
Background of the tutorial
Computational Auditory Scene Analysis (CASA) has been studied in artificial intelligence for many years, with CASA workshops organized in 1995, 1997, and 1999. At AAAI 2000, robot audition was proposed as an extension of CASA to real-world applications. At that time, robotics was growing rapidly, but most studies used a headset microphone attached close to the mouth. Robot audition aims to solve this problem by constructing functions that enable a robot to listen to sounds with its own ears. It has been evolving in both artificial intelligence and robotics for the last 20 years. Sound source localization, separation, and automatic speech recognition have been developed as highly noise-robust functions that cope with various noises such as ambient noise, speech noise, ego-noise, and reverberation, with many publications at top conferences in artificial intelligence, robotics, and signal processing: IJCAI, AAAI, PRICAI, IROS, ICRA, ICASSP, INTERSPEECH, etc.
In 2008, the methods in those publications were collected as open source software called HARK (HRI-JP Audition for Robots with Kyoto University), aiming to be the audio equivalent of OpenCV. The total number of downloads exceeded 16,000 as of December 2019. Although we have held 17 full-day HARK tutorials in Japan and abroad, including at IEEE Humanoids 2009 and IROS 2018, supported by JSAI and other societies, we have not yet held a tutorial at an artificial intelligence conference, and we believe that it is time to do so at IJCAI, especially as IJCAI-PRICAI 2020 is being held in Japan.
Although HARK provides various auditory functions that are useful for coping with real-world problems, awareness of HARK within the field of artificial intelligence appears to be limited. In reality, combinations of HARK with deep learning and end-to-end robot audition techniques have been reported, and some of them have been deployed in society. Once people in the field of artificial intelligence know about HARK and its related techniques, they will benefit from them. We will therefore organize this tutorial from the following viewpoints to maximize IJCAI participants' benefits:
- To increase the awareness of HARK
- To increase users’ overall ability when using HARK
- To lower the barrier to introduce robot audition techniques
- To expand applications of HARK into AI fields to realize its full potential
HARK and Its Deployment
Outline of the tutorial
The following is a list of topics (keywords) addressed in the tutorial.
- Overview of robot audition technologies
- Introduction to robot audition
- Basics in sound source localization
- Basics in sound source separation
- Basics in automatic speech recognition
- Practice using HARK (a PC and a microphone array (provided) are necessary)
- Sound source localization
- Sound source separation
- Integration with automatic speech recognition
- Integration with ROS
- Reports of case studies with HARK
- Drone audition for rescue and search
- Bird song analysis for ornithology to analyze birds’ behaviors and communication
- Live demonstrations of robot audition applications
- Sound source localization for drone audition
- Speech enhancement for active scope robots
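As a small taste of the "basics in sound source localization" topic above, the sketch below illustrates time-delay estimation between two microphones with GCC-PHAT, a classic building block for microphone-array localization. This is our own illustrative NumPy example, not HARK's implementation (HARK itself provides more sophisticated methods such as MUSIC):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay (in seconds) of `sig` relative to `ref`
    using the generalized cross-correlation with PHAT weighting."""
    n = sig.size + ref.size                  # FFT length for linear correlation
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                        # positive: sig lags ref

# Synthetic check: a noise signal delayed by 5 samples at 16 kHz
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x))[:1024]  # y is x delayed by 5 samples
tau = gcc_phat(y, x, fs)
print(round(tau * fs))                       # estimated delay in samples
```

From such inter-microphone delays, a direction of arrival can be derived geometrically given the array layout and the speed of sound.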
The detailed schedule for the above items is shown below.
This is a half-day (2-slot) tutorial consisting of two lectures, four hands-on lessons, two case study reports, and two live demonstrations. Attendees will learn what robot audition and HARK are, how to use HARK, and how it works with live demonstrations.
|Slot 1 (1:35)||Contents||Presenters|
|Opening Remarks & Lecture 1: Introduction||Prof. Hiroshi G. Okuno, Waseda University, Japan|
|Lecture 2: Overview of HARK||Prof. Kazuhiro Nakadai, Honda Research Institute of Japan / Tokyo Institute of Technology, Japan|
|Practice 0: Preparation of Your Laptop||Taiki Yamada, Tokyo Institute of Technology, Japan|
|Practice 1: Sound Source Localization||Dr. Ryosuke Kojima, Kyoto University, Japan|
|Practice 2: Sound Source Separation & ASR||Dr. Kotaro Hoshiba, Kanagawa University, Japan|
|Practice 3: Integration with ROS and MQTT||Dr. Katsutoshi Itoyama, Tokyo Institute of Technology, Japan|
|Case Study 1: Drone Audition||Prof. Makoto Kumon, Kumamoto University, Japan|
|Case Study 2: Bird Song Analysis||Prof. Reiji Suzuki, Nagoya University, Japan|
|Closing Remarks||Prof. Hiroshi G. Okuno, Waseda University, Japan|
Robot audition bridges artificial intelligence and robotics. From the viewpoint of artificial intelligence, robot audition is an interesting topic that provides auditory functions for dealing with real-world problems. Many people in artificial intelligence are also interested in multidisciplinary research topics as a way to learn about real-world problems and applications in other fields, and robot audition is well suited for this purpose. The following are potential target audiences:
- People who are interested in and/or are working on robot audition and its related research area/problems
- People who want to introduce robot audition technologies to their robots (it is easy because HARK has a seamless interface with ROS) and related applications
- People who want to learn robot audition technologies
- People who want to contribute to robot audition open source software
- People who want to use HARK in their own research areas besides robot audition
The tutorial includes not only hands-on lessons but also lectures on the basics of robot audition, so participants do not need any prior knowledge of robot audition. However, because of the hands-on lessons, we recommend that participants know basic operations of the Ubuntu (Linux) OS. During the tutorial, an adequate number of student teaching assistants will be available, so each participant can ask questions at any time.
Equipment to prepare
Each participant should prepare the following items to attend this tutorial:
- A laptop computer. Requirements are as follows:
- CPU: Core i5 / i7 series
- 4 GB of RAM
- VMware/VirtualBox installed in advance and working properly
- More than 50 GB of free storage for the VM
- A device for listening to sound, such as headphones or a loudspeaker
Kazuhiro Nakadai
Principal Researcher, Honda Research Institute Japan Co., Ltd. /
Specially-Appointed Professor, Tokyo Institute of Technology
Hiroshi G. Okuno
Professor, Waseda University
Makoto Kumon
Associate Professor, Kumamoto University
Reiji Suzuki
Associate Professor, Nagoya University
Katsutoshi Itoyama
Specially-Appointed Associate Professor (Lecturer), Tokyo Institute of Technology
Ryosuke Kojima
Assistant Professor, Kyoto University
Kotaro Hoshiba
Assistant Professor, Kanagawa University