8.2 Tuning parameters of sound source localization


How should I adjust the parameters when sound source localization is suboptimal?


Solutions are given below for each localization problem.

Q.1) Localization directions are displayed messily or are not displayed at all.


When a localization result is displayed in DisplayLocalization, the localization directions may appear messy in some cases. This happens because low-power sounds have been localized as sound sources. When no directions are displayed at all, the reason is the opposite: even the target sounds are not localized as sound sources.

  • Change THRESH of SourceTracker
    This parameter directly sets the threshold used to decide whether a localized direction is accepted as a sound source. Adjust it so that only the peaks of the sound sources are captured.

  • Make NUM_SOURCE of LocalizeMUSIC equal to the number of target sounds
    MUSIC enhances the peaks in the target sound directions by using the null (noise) space, so the number of peaks that are enhanced depends on the setting of NUM_SOURCE (the number of sound sources). When this setting is wrong, performance deteriorates: a peak may appear in the direction of noise, or no peak may appear in the direction of the target. (In practice, usually only the sharpness of the peaks is degraded, so the result can still be used for localization.) If there is only one speaker, setting NUM_SOURCE to 1 improves performance.
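
To get a feel for how a power threshold such as THRESH acts, the following sketch picks peaks from one frame of MUSIC spectrum values: a direction counts as a source only if it is a local maximum and its value exceeds the threshold. This is a simplified model for illustration, not SourceTracker's actual implementation; the helper name `pick_peaks`, the circular direction axis, and the example values are assumptions.

```python
def pick_peaks(spectrum, thresh):
    """Return indices of local maxima in `spectrum` whose value exceeds `thresh`.

    Hypothetical helper, not HARK code. The direction axis is treated as
    circular, so the first and last entries are neighbours (as in a full
    360-degree scan). Ties on a plateau are all reported.
    """
    n = len(spectrum)
    peaks = []
    for i, value in enumerate(spectrum):
        left = spectrum[(i - 1) % n]
        right = spectrum[(i + 1) % n]
        if value > thresh and value >= left and value >= right:
            peaks.append(i)
    return peaks


# Made-up spectrum values for one frame (one value per direction)
frame = [26.1, 26.3, 27.7, 26.4, 26.2, 26.9, 26.5, 26.0]
print(pick_peaks(frame, 27.0))  # [2]: only the strong peak survives
print(pick_peaks(frame, 26.5))  # [2, 5]: a lower threshold admits the weaker peak
```

Raising the threshold removes weak peaks (the messy case above); lowering it too far lets noise peaks through, while setting it too high removes everything.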

Q.2) Only one peak appears even though there are multiple sound sources


  • Change MIN_SRC_INTERVAL of SourceTracker when sound sources are close to each other (e.g., two sound sources only 10 degrees apart). MIN_SRC_INTERVAL may need to be set sufficiently small (less than 10 degrees in this example): when it is greater than the angle between the sources, the two sources are localized as one sound source.

  • Make NUM_SOURCE of LocalizeMUSIC equal to the number of target sounds
    See the corresponding item in Q.1. Note that if the sounds are loud enough and far enough apart (more than 40 degrees), localization usually works well even if this parameter is misconfigured.
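
The effect of MIN_SRC_INTERVAL can be pictured with a simplified merging rule: candidate directions are visited from strongest to weakest, and a candidate is dropped if it lies closer than the interval to a stronger direction already kept. This is a hypothetical sketch (`merge_close_sources` is not a HARK function) and does not reproduce SourceTracker's tracking across frames; the powers are made-up values.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuths in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def merge_close_sources(candidates, min_interval):
    """Keep only the strongest candidate among directions closer than `min_interval`.

    `candidates` is a list of (azimuth_deg, power) pairs.
    Returns the kept azimuths, strongest first.
    """
    kept = []
    for azimuth, power in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(angular_distance(azimuth, k) >= min_interval for k in kept):
            kept.append(azimuth)
    return kept


# Two speakers 10 degrees apart, as in the example above
speakers = [(30.0, 27.8), (40.0, 27.5)]
print(merge_close_sources(speakers, 20.0))  # [30.0]: merged into one source
print(merge_close_sources(speakers, 5.0))   # [30.0, 40.0]: both kept
```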

Q.3) The target sound is not speech

Sound source localization is computed only for the frequency bins between the lower and upper bound frequencies designated in LocalizeMUSIC. Therefore, if this range is very different from the frequency band of the target sound, the peaks become wider. Set the range to the frequencies of the target sound sources.
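
Because only the bins inside the designated range are used, it can help to check which FFT bins a frequency band maps to. The sketch below uses the relation that bin k has centre frequency k × fs / N. The sampling rate of 16 kHz, the 512-point FFT, and the example band are assumptions for illustration; substitute the values of your own network.

```python
def band_to_bins(lower_hz, upper_hz, fs=16000, fft_length=512):
    """Return the FFT bin indices whose centre frequency lies in [lower_hz, upper_hz].

    Only bins up to the Nyquist bin (fft_length // 2) are considered.
    fs and fft_length are assumed example values.
    """
    resolution = fs / fft_length  # Hz per bin (31.25 Hz with the defaults here)
    return [k for k in range(fft_length // 2 + 1)
            if lower_hz <= k * resolution <= upper_hz]


bins = band_to_bins(500, 2800)
print(bins[0], bins[-1], len(bins))  # 16 89 74
```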

Q.4) The sound is known NOT to come from a certain range of directions

Localization is performed only for the range of directions between the minimum and maximum angles designated in LocalizeMUSIC, so directions the sound cannot come from can be excluded. To localize over the full 360 degrees, designate -180 degrees and 180 degrees as the range.
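
The range restriction can be pictured as selecting a subset of the directions stored in the transfer function. In the sketch below, the 5-degree grid from -180 to 175 degrees is an assumption for illustration (the actual grid depends on your transfer function), and `directions_in_range` is a hypothetical helper.

```python
def directions_in_range(min_deg, max_deg, step=5):
    """Return the grid azimuths (degrees) between min_deg and max_deg inclusive.

    The assumed grid runs from -180 to 180 - step, because -180 and 180
    denote the same direction; a range of -180..180 covers the circle once.
    """
    grid = range(-180, 180, step)
    return [d for d in grid if min_deg <= d <= max_deg]


print(len(directions_in_range(-180, 180)))  # 72: the full circle
print(directions_in_range(-30, 30))         # only a 60-degree sector around 0 degrees
```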


The solutions above are all parameter tuning of sound source localization. However, if the reverberation of your room differs significantly from that of the room where the transfer function was recorded, you need to re-measure the transfer function. See the HARK web page for an instruction video on transfer function measurement.

For tuning the sound source localization parameters, especially those of the SourceTracker node, visualizing the MUSIC spectrum is very helpful. Here we describe an example of visualizing the MUSIC spectrum with matplotlib, a Python plotting library.

Step 1: Output the MUSIC spectrum

If you run a network file that includes a LocalizeMUSIC node with its DEBUG property set to true, the console output (stdout) includes lines starting with "MUSIC spectrum:". These values, called the MUSIC spectrum, contain the information used by SourceTracker.

These values become higher when sound comes from the corresponding direction in the corresponding time frame, so by checking the MUSIC spectrum you can see when, and from which direction, sound is detected.

An example of the console output follows:

reading A matrix
0:   17.68 -130.00    0.95   27.71
MUSIC spectrum: 26.409233 26.342979 26.311218 26.389189 26.684574 26.641804 26.473591 26.429607 26.394157 26.390869 26.390436 26.384390 26.370937 26.349419 26.316311 26.370241 26.433401 26.504730 26.580458 26.637592 26.661583 26.718044 26.733950 26.621361 26.512089 26.413467 26.282846 26.145596 26.159269 26.168520 26.170301 26.167116 26.173601 26.252003 26.451721 26.356871 26.372644 26.414040 26.297329 26.550825 26.993820 26.333363 26.182224 26.307812 26.486645 26.660778 26.807833 26.925554 27.017572 27.087088 27.137403 27.152445 27.161753 27.167290 27.172033 27.181273 27.207413 27.254276 27.323790 27.464039 27.712482 27.138771 26.808626 26.621407 26.489784 26.495117 26.487270 26.436701 26.437685 26.331017 26.377909 26.405613
0:   17.68 -130.00    0.95   28.03
MUSIC spectrum: 26.710100 26.621223 26.543722 26.562099 26.749601 26.577915 26.392643 26.393244 26.414570 26.474314 26.538136 26.589392 26.624735 26.642746 26.643448 26.634501 26.615681 26.590319 26.574671 26.529377 26.471210 26.485714 26.483780 26.366249 26.233849 26.087498 25.920916 25.750576 25.754738 25.751417 25.738773 25.714027 25.678146 25.666941 25.741886 25.754898 25.834913 25.895306 25.755405 26.226192 26.879301 26.296440 26.137798 26.307751 26.503771 26.708920 26.896206 27.056938 27.192793 27.308073 27.400372 27.416704 27.418081 27.407339 27.390402 27.377577 27.391756 27.456150 27.579704 27.842340 28.030870 27.664623 27.385185 27.154327 26.954231 26.943329 26.903679 26.797655 26.923735 26.714609 26.734642 26.730547
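
Before plotting, the console output can also be summarized as text. The sketch below extracts the values from each "MUSIC spectrum:" line (one line per frame) and reports, for each frame, the direction index with the highest value and that value; comparing these values with THRESH shows which frames would pass the threshold. It assumes only the line format shown above; the mapping from index to angle depends on your transfer function and is not computed here.

```python
import sys


def parse_music_spectrum(lines):
    """Return one list of floats per line containing "MUSIC spectrum:"."""
    frames = []
    for line in lines:
        if "MUSIC spectrum:" in line:
            values = line.split("MUSIC spectrum:", 1)[1].split()
            frames.append([float(v) for v in values])
    return frames


def strongest_direction(frame):
    """Return (index, value) of the largest entry in one spectrum frame."""
    index = max(range(len(frame)), key=frame.__getitem__)
    return index, frame[index]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarizeMusic.py log.txt  (hypothetical script name)
    with open(sys.argv[1]) as log:
        for t, frame in enumerate(parse_music_spectrum(log)):
            index, value = strongest_direction(frame)
            print(f"frame {t}: direction index {index}, value {value:.2f}")
```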

Step 2: Visualize the MUSIC spectrum

With the imshow function of matplotlib, you can easily display the MUSIC spectrum as an image.

Save the console output from Step 1 as log.txt and the script below as showMusic.py.

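#!/usr/bin/env python
# showMusic.py: display the MUSIC spectrum printed by LocalizeMUSIC (DEBUG = true)
import sys

import matplotlib.pyplot as plt
import numpy as np

# Each "MUSIC spectrum:" line holds one time frame; skip the two label words
with open(sys.argv[1]) as log:
    musicspec = [[float(v) for v in line.split()[2:]]
                 for line in log if "MUSIC spectrum" in line]

# Transpose so that rows are directions and columns are time frames
musicspec = np.array(musicspec).transpose()

plt.imshow(musicspec, interpolation="nearest", aspect="auto")
plt.ylabel("Direction of Arrival")
plt.xlabel("Time [frame]")
plt.colorbar()  # the color scale helps when choosing THRESH
plt.show()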

Then you can view the MUSIC spectrum image with the following command:

$ python showMusic.py log.txt

See Also

See LocalizeMUSIC in the HARK document for a detailed description of the MUSIC algorithm and its parameters.