SourceSeparation Node

Outline of the node

This node conducts blind sound source separation based on independent vector analysis.

Typical connection

Both the input and the output of the SourceSeparation node are multi-channel (2-ch) audio spectra. A typical connection of this node is depicted as follows:

Input-output and properties of the node

Input

INPUT_AUDIO_SPECTRUM Matrix<complex<float> >: Windowed spectrum data. Rows index channels and columns index frequency.

Output

OUTPUT_AUDIO_SPECTRUM Matrix<complex<float> >: Windowed and speech-enhanced spectrum data. Rows index channels and columns index frequency.

Parameters

The parameters of this node are listed below:

Parameter name	Type	Default value	Unit	Description
FFT_LENGTH	int	512	sample	Analysis frame length.
ITERATION_METHOD	string	FastIVA		Iteration method.
MAX_ITERATION	int	700		Processing limitation: maximum number of iterations.
NUMBER_OF_SOURCE_TO_BE_SEPARATED	int	2		Number of sound sources to be separated.
SEPARATION_TIME_LENGTH	float	5.0	second	Separation window length.
ADVANCE	int	160	sample	Shift length in samples between consecutive frames.
SAMPLING_RATE	int	16000	Hz	Sampling rate.
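
To make the units in the table concrete, the following minimal sketch converts the default values into frame counts and durations. The helper names are hypothetical, and the sketch assumes that the separation window is simply a block of consecutive frames spaced ADVANCE samples apart; the node's internal bookkeeping may differ.

```python
# Default values copied from the parameter table above.
FFT_LENGTH = 512               # samples per analysis frame
ADVANCE = 160                  # samples between consecutive frames
SAMPLING_RATE = 16000          # Hz
SEPARATION_TIME_LENGTH = 5.0   # seconds


def frames_per_window(seconds, sampling_rate, advance):
    """Number of frame shifts that fit in a separation window (assumed layout)."""
    return int(seconds * sampling_rate) // advance


def duration_ms(samples, sampling_rate):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sampling_rate


if __name__ == "__main__":
    print("frames per separation window:",
          frames_per_window(SEPARATION_TIME_LENGTH, SAMPLING_RATE, ADVANCE))
    print("frame length [ms]:", duration_ms(FFT_LENGTH, SAMPLING_RATE))
    print("frame shift [ms]:", duration_ms(ADVANCE, SAMPLING_RATE))
```

Under these assumptions, the default settings correspond to 32 ms frames shifted by 10 ms, with 500 frames in each 5-second separation window.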

Details of the node

This node recovers the original sound signals from the mixed sound signal by using independent vector analysis (IVA) [1] or fast independent vector analysis (Fast-IVA) [2]. In the case of IVA, the objective function is the Kullback-Leibler (KL) divergence:

$C={\rm constant}- \sum^F_f {\rm log}\left|{\rm det} W_{mkf}\right| - \sum^M_m E\left[{\rm log}P \left( \hat{S}_1, \cdots ,\hat{S}_M \right)\right]$

where $\hat{S}_m (m = 1, \cdots, M)$ and $W_{mkf}$ represent the input signal of the m-th microphone and the separation matrix of IVA, respectively. The learning algorithm of IVA is based on the natural gradient descent method:

$W^{new}_{mkf}=W^{old}_{mkf} + \eta \sum^K_k \left( I_{mk} - E \left[ \frac{\hat{S}_{kf}}{\sqrt{\sum^F_f \left| \hat{S}_{kf} \right|^2}} \hat{S}_{kf}^{\ast} \right] \right) W^{old}_{mkf}$

where $\eta$ is the learning rate (set to 0.1).
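
The natural-gradient update can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the node's implementation: the function name and the nested-list layout are hypothetical, $\hat{S}$ is treated as the estimate obtained by applying $W$ to the observed spectra $X$ (the usual IVA reading), the expectation is replaced by an average over frames, and a small constant is added to the norm to avoid division by zero.

```python
import math


def iva_natural_gradient_step(W, X, eta=0.1):
    """Apply one natural-gradient update to every frequency bin.

    W[f][k][m]: separation matrix for bin f (K sources x M microphones).
    X[f][m][t]: observed spectrum for bin f, microphone m, frame t.
    Returns a new list of matrices; W itself is not modified.
    """
    F, M, T, K = len(X), len(X[0]), len(X[0][0]), len(W[0])
    # Separated estimates: S[f][k][t] = sum_m W[f][k][m] * X[f][m][t]
    S = [[[sum(W[f][k][m] * X[f][m][t] for m in range(M)) for t in range(T)]
          for k in range(K)] for f in range(F)]
    # Per-source, per-frame norm taken across all frequency bins
    norm = [[math.sqrt(sum(abs(S[f][k][t]) ** 2 for f in range(F))) + 1e-12
             for t in range(T)] for k in range(K)]
    W_new = []
    for f in range(F):
        # G = I - E[(S_kf / norm_k) * conj(S_jf)], expectation = frame average
        G = [[(1.0 if k == j else 0.0)
              - sum(S[f][k][t] / norm[k][t] * S[f][j][t].conjugate()
                    for t in range(T)) / T
              for j in range(K)] for k in range(K)]
        # W_new = W_old + eta * G * W_old
        W_new.append([[W[f][k][m]
                       + eta * sum(G[k][j] * W[f][j][m] for j in range(K))
                       for m in range(M)] for k in range(K)])
    return W_new
```

In practice the step would be repeated until the matrices stop changing or an iteration limit such as MAX_ITERATION is reached. An array library would be the natural choice for realistic sizes; nested lists are used here only to keep the sketch dependency-free.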

In the case of Fast-IVA, the following modified objective function based on KL divergence is used:

$C=-\sum^M_m E\left[{\rm log}P \left( \hat{S}_1, \cdots ,\hat{S}_M \right)\right] - \sum^M_m \beta\left[W^T_{mkf}W^{new}_{mkf}-1\right]$ ,

where $\beta$ is the Lagrange multiplier. The learning algorithm, on the other hand, is based on Newton's method with fixed-point iteration:

$W^{new}_{mkf}= E\left[\frac{1}{\sqrt{\sum^F_f \left|\hat{S}_{kf}\right|^2}} - \frac{\hat{S}^2_{kf}}{\left( \sqrt{\sum^F_f \left|\hat{S}_{kf}\right|^2}\right) ^3}\right] W^{old}_{mkf} -E\left[\frac{\hat{S}_{kf}}{\sqrt{\sum^F_f \left|\hat{S}_{kf}\right|^2}} X_{kf}\right]$
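
As a rough illustration of the fixed-point update, the sketch below applies it to the separation row of a single source and then rescales each row to unit norm, in line with the constraint in the objective. Several points are assumptions rather than details taken from the node: the function name and data layout are hypothetical, $\hat{S}^2_{kf}$ is read as $|\hat{S}_{kf}|^2$, $X_{kf}$ enters as its complex conjugate because the separation vector is stored as a row, the input is assumed to be whitened beforehand, and the decorrelation between different sources that a complete implementation needs is omitted.

```python
import math


def fastiva_row_update(w, X):
    """One fixed-point update of a single source's separation row.

    w[f][m]: separation row for bin f.
    X[f][m][t]: observed spectra, assumed whitened per bin beforehand.
    Returns the updated rows, each rescaled to unit norm.
    """
    F, M, T = len(X), len(X[0]), len(X[0][0])
    # Current estimate of the source: y[f][t] = sum_m w[f][m] * X[f][m][t]
    y = [[sum(w[f][m] * X[f][m][t] for m in range(M)) for t in range(T)]
         for f in range(F)]
    # r[t] = sqrt(sum_f |y[f][t]|^2), the norm across frequency bins
    r = [math.sqrt(sum(abs(y[f][t]) ** 2 for f in range(F))) + 1e-12
         for t in range(T)]
    w_new = []
    for f in range(F):
        # Scalar factor E[1/r - |y_f|^2 / r^3], expectation = frame average
        a = sum(1.0 / r[t] - abs(y[f][t]) ** 2 / r[t] ** 3
                for t in range(T)) / T
        # a * w_old - E[(y_f / r) * conj(x_f)]
        row = [a * w[f][m]
               - sum(y[f][t] / r[t] * X[f][m][t].conjugate()
                     for t in range(T)) / T
               for m in range(M)]
        # Rescale to unit norm, matching the constraint in the objective
        scale = max(math.sqrt(sum(abs(v) ** 2 for v in row)), 1e-12)
        w_new.append([v / scale for v in row])
    return w_new
```

Unlike the natural-gradient rule, this update has no learning rate to tune, which is the usual motivation for fixed-point schemes of this kind.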

References

[1]	T. Kim, H. T. Attias, S. Lee, and T. Lee, “Blind Source Separation Exploiting Higher-Order Frequency Dependencies,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 1, pp. 70–79, January 2007.

[2]	I. Lee, T. Kim, and T. Lee, “Fast Fixed-Point Independent Vector Analysis Algorithms for Convolutive Blind Source Separation,” Signal Processing, vol. 87, no. 8, pp. 1859–1871, August 2007.