3.3 Sound source localization fails

Problem

My sound source localization system does not work well.

Solution

This recipe introduces the usual debugging procedures for sound source localization systems.

Confirm the recording

 
Make sure the system can record sound. See Sound recording fails.

Offline localization

 
Try offline localization using recorded wave files. Walk around the microphone array while speaking and record the sound. This is your test file.

Replace the node AudioStreamFromMic with AudioStreamFromWave in your network for offline localization. When DisplayLocalization and SourceTracker are connected, the localization result can be visualized.

If this step fails, set the DEBUG parameter of LocalizeMUSIC to ON and check the MUSIC spectrum. Its value should be larger when sound is present than when it is absent. If it is not, the transfer function file may be wrong. If the spectrum looks correct, adjust the THRESH parameter of SourceTracker so that it is larger than the spectrum value in segments without sound and smaller than the value in segments with sound.
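The rule for choosing THRESH can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name suggest_thresh is hypothetical, the input lists are MUSIC spectrum values that you have copied by hand from the DEBUG output (the sketch does not parse any HARK log format), and the sample numbers are made up.

```python
def suggest_thresh(noise_powers, speech_powers):
    """Suggest a THRESH candidate between noise-only and speech values.

    noise_powers:  MUSIC spectrum peak values observed while nobody speaks.
    speech_powers: MUSIC spectrum peak values observed while someone speaks.
    Both lists are assumed to be collected by hand from the DEBUG output.
    Returns None when the two ranges overlap, i.e. when no single
    threshold separates silence from speech.
    """
    if not noise_powers or not speech_powers:
        raise ValueError("need at least one value for each condition")
    noise_ceiling = max(noise_powers)   # largest value seen without sound
    speech_floor = min(speech_powers)   # smallest value seen with sound
    if noise_ceiling >= speech_floor:
        return None
    # The midpoint leaves the same margin on both sides.
    return (noise_ceiling + speech_floor) / 2.0


# Made-up example values, not measurements.
noise = [0.04, 0.06, 0.05]
speech = [0.24, 0.30, 0.26]
print(suggest_thresh(noise, speech))
```

If the function returns None, the spectrum does not separate the two conditions, which points back to the transfer function file rather than to THRESH.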

Other parameters of LocalizeMUSIC can also be adjusted:

  1. Set NUM_SOURCE to the total number of speakers.
    For example, if there are up to two speakers, set NUM_SOURCE to 2.

  2. Set MIN_DEG and MAX_DEG.
    If localization results appear in a direction where there is no sound source, they may be caused by a reflection from a wall or by a noise source (e.g. the fans of a PC). If you can assume that the target signals do not come from these directions, restrict the localization output to the range between MIN_DEG and MAX_DEG.

    For example, if no sound is expected to come from behind, set MIN_DEG to -90 and MAX_DEG to 90 so that only results from the front are output.
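One way to picture the effect of MIN_DEG and MAX_DEG is as a range check on each detected azimuth. The sketch below is an assumption-laden illustration, not how LocalizeMUSIC is implemented: the function names are hypothetical, and azimuths are assumed to be given in degrees with 0 pointing to the front.

```python
def in_search_range(azimuth_deg, min_deg=-90, max_deg=90):
    """Return True if the azimuth lies inside [min_deg, max_deg]."""
    return min_deg <= azimuth_deg <= max_deg


def filter_by_direction(azimuths_deg, min_deg=-90, max_deg=90):
    """Keep only the azimuths that fall inside the allowed range."""
    return [a for a in azimuths_deg if in_search_range(a, min_deg, max_deg)]


# Made-up detections: two come from behind the array and are dropped.
detections = [-170, -45, 0, 30, 120]
print(filter_by_direction(detections))  # [-45, 0, 30]
```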

Online localization

 
Next, switch back to the original network file that uses AudioStreamFromMic and execute the network. The THRESH parameter of SourceTracker may need to be tuned because the value of the MUSIC spectrum depends on the distance and volume of the speaker. Try tuning THRESH within the 0.1 - 0.2 range. If the localization result contains errors, increase THRESH. If there is no localization result even though someone is speaking to the microphone, decrease THRESH.
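The tuning rule above can be written down as a small helper, which may be useful if you keep notes across several trial runs. This is a minimal sketch under assumptions: the function name next_thresh, the step size of 0.01, and the idea of counting false and missed detections by hand after each run are all illustrative and not part of HARK. Only the 0.1 - 0.2 range comes from the text.

```python
def next_thresh(thresh, false_detections, missed_detections,
                step=0.01, low=0.1, high=0.2):
    """Suggest the THRESH value to try in the next run.

    false_detections:  results reported although nobody was speaking.
    missed_detections: utterances for which no result was reported.
    Both counts are assumed to be tallied by hand after a trial run.
    """
    if false_detections > missed_detections:
        thresh += step      # too many spurious results: raise THRESH
    elif missed_detections > false_detections:
        thresh -= step      # speech not detected: lower THRESH
    # Stay inside the suggested tuning range.
    return min(high, max(low, thresh))


print(next_thresh(0.15, false_detections=3, missed_detections=0))
print(next_thresh(0.15, false_detections=0, missed_detections=2))
```

When both counts are equal the value is left unchanged, since the rule gives no direction to move in.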

Measuring ambient noise

 
If the ambient noise level differs between the environment where the transfer function of LocalizeMUSIC was measured and the environment where sound source localization is executed, performance may deteriorate. For example, if there is an air conditioner near the place where localization is performed, the direction of the air conditioner may be output as a localization result. There are two countermeasures to avoid this:

  1. Direction Filtering
    Connect the SourceSelectorByDirection node after the SourceTracker node to ignore localization results from a specific direction. Alternatively, change the MIN_DEG and MAX_DEG parameters of LocalizeMUSIC so that the noise direction lies outside the range in which localization results are output.

  2. Creating a Noise Correlation Function
    Localization that eliminates the effect of the noise is possible by recording the noise beforehand and passing it to LocalizeMUSIC. This is effective when the noise is much more powerful than the target sound. Refer to Learning sound separation for more details.

Discussion

It may be appropriate to change parameters depending on the reverberation of the room and the volume of the speaker’s voice. To improve the localization performance of the current system, it is better to tune it in the location where it will actually be used. Since THRESH is the most important parameter here, try adjusting it alone first.

See Also

If you are new to building a sound source localization system, refer to Learning sound localization. The chapter Sound source localization includes some recipes for localization. If you want to know the detailed algorithm, see LocalizeMUSIC and SourceTracker in the HARK document.