2.3 Learning sound separation

Problem

First sound source separation with HARK.

Solution

A network file and either an HGTF binary format transfer function file or a microphone position file are needed to perform sound source separation with HARK.

A network for sound source separation requires four items:

Audio signal acquisition:
The AudioStreamFromMic and AudioStreamFromWave nodes can be used.

Source location:
The ConstantLocalization, LoadSourceLocation, and LocalizeMUSIC nodes can be used; the ConstantLocalization node is the easiest to use. For online processing, use the LocalizeMUSIC node.

Sound source separation:
The GHDSS node can be used for sound source separation. Its inputs are the source locations and the audio signal; its output is the separated signals. The GHDSS node requires a transfer function, which is either stored in an HGTF binary format file or calculated from a microphone position file.

Separated signal preservation:
Since the separated signals are in the frequency domain, a Synthesize node must be used before a SaveRawPCM or SaveWavePCM node.

In addition, HARK supports HRLE-based post-processing.

Post-processing (optional):
Noise suppression can be applied to the separated signals using the PowerCalcForMap, HRLE, CalcSpecSubGain, EstimateLeak, CalcSpecAddPower, and SpectralGainFilter nodes. A minimal wiring sketch of the four required stages follows.
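The fragment below is a minimal sketch of how the four stages are typically wired in a network (.n) file. The node types and port names follow the HARK node reference, but the node names, parameter values, and the transfer function file name are illustrative placeholders; in practice the network is built graphically in FlowDesigner rather than written by hand.

<!-- Sketch: separation chain inside MAIN_LOOP (names and values are placeholders;
     the input-stream wiring of AudioStreamFromWave is omitted for brevity). -->
<Node name="node_AudioStreamFromWave_1" type="AudioStreamFromWave"/>
<Node name="node_MultiFFT_1" type="MultiFFT"/>
<Node name="node_ConstantLocalization_1" type="ConstantLocalization"/>
<Node name="node_GHDSS_1" type="GHDSS">
  <Parameter name="TF_CONJ" type="string" value="DATABASE"/>
  <Parameter name="TF_CONJ_FILENAME" type="string" value="tf.dat"/>  <!-- placeholder -->
</Node>
<Node name="node_Synthesize_1" type="Synthesize"/>
<Node name="node_SaveWavePCM_1" type="SaveWavePCM">
  <Parameter name="BASENAME" type="string" value="sep_"/>
</Node>

<!-- Time-domain audio -> spectra -> separation -> re-synthesis -> files. -->
<Link from="node_AudioStreamFromWave_1" output="AUDIO" to="node_MultiFFT_1" input="INPUT"/>
<Link from="node_MultiFFT_1" output="OUTPUT" to="node_GHDSS_1" input="INPUT_FRAMES"/>
<Link from="node_ConstantLocalization_1" output="SOURCES" to="node_GHDSS_1" input="INPUT_SOURCES"/>
<Link from="node_GHDSS_1" output="OUTPUT" to="node_Synthesize_1" input="INPUT"/>
<Link from="node_Synthesize_1" output="OUTPUT" to="node_SaveWavePCM_1" input="INPUT"/>

With BASENAME set to sep_, each separated source is written to its own file (sep_0.wav, sep_1.wav, ...).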

Figures 2.9, 2.10, and 2.11 are screenshots of sample networks. Figures 2.10 and 2.11 show sound source separation without and with post-processing, respectively. These sample networks extend those in Learning sound localization by adding a GHDSS node. Either an HGTF file or a microphone position file should be specified. To listen to the separated signal, use the Synthesize node before the SaveRawPCM or SaveWavePCM node. A node-level sketch of the post-processing chain follows the figures.

Figure 2.9: MAIN (fig/recipes/LearningHARK-separation-main.png)
Figure 2.10: MAIN_LOOP, without post-processing (fig/recipes/LearningHARK-separation-ghdss.png)
Figure 2.11: MAIN_LOOP, with post-processing (fig/recipes/LearningHARK-separation-hrle.png)
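For orientation, the nodes added in Figure 2.11 play the following roles (roles per the HARK node reference; the wiring between their ports is omitted here, so treat Figure 2.11, not this sketch, as authoritative):

<!-- HRLE-based post-processing, inserted between GHDSS and Synthesize. -->
<Node name="node_PowerCalcForMap_1" type="PowerCalcForMap"/>       <!-- power spectra of the separated signals -->
<Node name="node_HRLE_1" type="HRLE"/>                             <!-- noise level estimation -->
<Node name="node_CalcSpecSubGain_1" type="CalcSpecSubGain"/>       <!-- spectral subtraction gain -->
<Node name="node_EstimateLeak_1" type="EstimateLeak"/>             <!-- inter-channel leak estimation -->
<Node name="node_CalcSpecAddPower_1" type="CalcSpecAddPower"/>     <!-- sum of noise and leak power -->
<Node name="node_SpectralGainFilter_1" type="SpectralGainFilter"/> <!-- apply the gain to the spectra -->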

When these networks are executed, simultaneous speech from two speakers is separated and saved as sep_0.wav, sep_1.wav, ...

Discussion

Offline / Online:
For online separation, replace the AudioStreamFromWave node with the AudioStreamFromMic node, as in the sketch below.
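As a sketch, the swap amounts to changing the node type and setting the capture parameters; the channel count and device name below are placeholders that must match your microphone array and sound driver:

<Node name="node_AudioStreamFromMic_1" type="AudioStreamFromMic">
  <Parameter name="LENGTH" type="int" value="512"/>
  <Parameter name="ADVANCE" type="int" value="160"/>
  <Parameter name="CHANNEL_COUNT" type="int" value="8"/>       <!-- placeholder: match your array -->
  <Parameter name="SAMPLING_RATE" type="int" value="16000"/>
  <Parameter name="DEVICETYPE" type="string" value="ALSA"/>
  <Parameter name="DEVICE" type="string" value="plughw:1,0"/>  <!-- placeholder device name -->
</Node>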

Specific direction / Estimated direction:
To separate sound coming from an estimated direction, use the LocalizeMUSIC node. To separate sound coming from a specific, fixed direction, use the ConstantLocalization node, as sketched below. If estimated sound source locations have been saved with the SaveSourceLocation node, they can be loaded with the LoadSourceLocation node.
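For example, a ConstantLocalization node for two fixed sources might be configured as follows (the angles and the file name are illustrative; the Vector values use FlowDesigner's object syntax, XML-escaped):

<!-- Two fixed sources at +30 and -30 degrees, elevation 0. -->
<Node name="node_ConstantLocalization_1" type="ConstantLocalization">
  <Parameter name="ANGLES" type="object" value="&lt;Vector&lt;float&gt; 30 -30&gt;"/>
  <Parameter name="ELEVATIONS" type="object" value="&lt;Vector&lt;float&gt; 0 0&gt;"/>
</Node>

<!-- Reusing previously saved localization results (placeholder file name). -->
<Node name="node_LoadSourceLocation_1" type="LoadSourceLocation">
  <Parameter name="FILENAME" type="string" value="loc_result.dat"/>
</Node>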

Measurement-based / Calculation-based transfer function:
To use a measured transfer function, set the GHDSS parameter TF_CONJ to "DATABASE" and TF_CONJ_FILENAME to the corresponding file name. To use a calculated transfer function, set TF_CONJ to "CALC" and MIC_FILENAME to the corresponding microphone position file; the two settings are contrasted below. To use a new microphone array, either a new transfer function file or a new microphone position file is needed.
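The two configurations differ only in the GHDSS parameters; the file names here are placeholders:

<!-- Measurement-based: transfer function read from an HGTF binary file. -->
<Parameter name="TF_CONJ" type="string" value="DATABASE"/>
<Parameter name="TF_CONJ_FILENAME" type="string" value="tf.dat"/>   <!-- placeholder -->

<!-- Calculation-based: transfer function computed from microphone positions. -->
<Parameter name="TF_CONJ" type="string" value="CALC"/>
<Parameter name="MIC_FILENAME" type="string" value="mic_pos.txt"/>  <!-- placeholder -->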

Online processing may degrade the performance of sound source separation. This can be addressed by tuning the parameters to the corresponding conditions. For parameter tuning, see the chapter Sound Source Separation.

See Also

If separation is inadequate, see “Sound source separation fails.” Since sound source separation is performed after source localization, it is important to confirm that the earlier stages, sound recording and source localization, work properly. For these stages, the recipes entitled “Learning sound recording”, “Learning sound localization”, “Sound recording fails”, and “Sound source localization fails” may be helpful.