SourceSeparation Node

Outline of the node

This node conducts blind sound source separation based on independent vector analysis.

Typical connection

Both the input and the output of the SourceSeparation node are multi-channel (2-ch) audio spectra. A typical connection of this node is shown below:

_images/ss_connection.png

Input-output and property of the node

Input

INPUT_AUDIO_SPECTRUM Matrix<complex<float> >
Windowed spectrum data. Rows index channels and columns index frequency bins.

Output

OUTPUT_AUDIO_SPECTRUM Matrix<complex<float> >
Windowed and speech-enhanced spectrum data. Rows index channels and columns index frequency bins.

Parameters

Parameters of this node are listed as follows:

Parameter name                     Type     Default value   Unit     Description
FFT_LENGTH                         int      512             sample   Analysis frame length.
ITERATION_METHOD                   string   FastIVA         -        Iteration method.
MAX_ITERATION                      int      700             -        Maximum number of iterations (upper limit on processing).
NUMBER_OF_SOURCE_TO_BE_SEPARATED   int      2               -        Number of sound sources to be separated.
SEPARATION_TIME_LENGTH             float    5.0             second   Separation window length.
ADVANCE                            int      160             sample   Shift length, in samples, between consecutive frames.
SAMPLING_RATE                      int      16000           Hz       Sampling rate.
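
To give a feel for what the default values imply, the following minimal sketch converts them into frame counts and durations. The function names are hypothetical, and two points are assumptions rather than documented behavior of the node: that the separation window is counted in whole frame shifts, and that the spectrum is one-sided (FFT_LENGTH / 2 + 1 bins).

```python
def separation_window_frames(separation_time_length=5.0,
                             sampling_rate=16000, advance=160):
    """Number of whole frame shifts that fit in the separation window
    (assumption: the window is measured in frame shifts)."""
    return int(separation_time_length * sampling_rate) // advance


def frequency_bins(fft_length=512):
    """Bins of a one-sided spectrum of a real signal
    (assumption: the node uses a one-sided spectrum)."""
    return fft_length // 2 + 1


def to_milliseconds(samples, sampling_rate=16000):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sampling_rate


print(separation_window_frames())   # 500 frames per 5.0 s window
print(frequency_bins())             # 257 bins
print(to_milliseconds(512))         # 32.0 ms analysis frame
print(to_milliseconds(160))         # 10.0 ms frame shift
```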

Details of the node

This node recovers the original sound signals from the mixed sound signal using independent vector analysis (IVA) [1] or fast independent vector analysis (Fast-IVA) [2]. For IVA, the objective function is the Kullback-Leibler (KL) divergence:

\(C={\rm constant}- \sum^F_f {\rm log}\left|{\rm det} W_{mkf}\right| - \sum^M_m E\left[{\rm log}P \left( \hat{S}_1, \cdots ,\hat{S}_M \right)\right]\)

where \(\hat{S}_m (m = 1, \cdots, M)\) and \(W_{mkf}\) represent the input signal of the m-th microphone and the separation matrix of IVA, respectively. The learning algorithm of IVA is based on the natural gradient descent method:

\(W^{new}_{mkf}=W^{old}_{mkf} + \eta \sum^K_k \left( I_{mk} - E \left[ \frac{\hat{S}_{kf}}{\sqrt{\sum^F_f \left| \hat{S}_{kf} \right|^2}} \hat{S}_{kf}^{\ast} \right] \right) W^{old}_{mkf}\)

where \(\eta\) is the learning rate (set to 0.1).
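
The natural-gradient update can be illustrated with a short NumPy sketch. It is not the node's actual implementation. The function name, the array layout (frequency bin, channel, frame), the small constant `eps`, and the use of a frame average in place of the expectation are all assumptions made for illustration.

```python
import numpy as np


def iva_natural_gradient_step(W, X, eta=0.1, eps=1e-12):
    """One natural-gradient IVA update (illustrative sketch).

    W: (F, M, M) complex separation matrices, one per frequency bin.
    X: (F, M, T) complex spectra laid out as (bin, channel, frame).
    """
    F, M, T = X.shape
    S = W @ X                                        # (F, M, T) current estimates
    # Norm over all frequency bins, per output channel and frame.
    norm = np.sqrt(np.sum(np.abs(S) ** 2, axis=0)) + eps   # (M, T)
    phi = S / norm                                   # S_kf / sqrt(sum_f |S_kf|^2)
    # Frame average standing in for E[phi * S^H], one (M, M) matrix per bin.
    R = phi @ S.conj().transpose(0, 2, 1) / T
    return W + eta * (np.eye(M) - R) @ W


# Usage on synthetic data: 8 bins, 2 channels, 100 frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2, 100)) + 1j * rng.standard_normal((8, 2, 100))
W = np.tile(np.eye(2, dtype=complex), (8, 1, 1))
for _ in range(10):
    W = iva_natural_gradient_step(W, X)
```

The norm is taken across all frequency bins of one output channel, which is what couples the bins together and distinguishes IVA from running an independent update in each bin.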

In the case of Fast-IVA, the following modified objective function based on KL divergence is used:

\(C=-\sum^M_m E\left[{\rm log}P \left( \hat{S}_1, \cdots ,\hat{S}_M \right)\right] - \sum^M_m \beta\left[W^T_{mkf}W^{new}_{mkf}-1\right]\),

where \(\beta\) is a Lagrange multiplier. The learning algorithm, in contrast, is based on Newton's method with fixed-point iteration:

\(W^{new}_{mkf}= E\left[\frac{1}{\sqrt{\sum^F_f \left|\hat{S}_{kf}\right|^2}} - \frac{\hat{S}^2_{kf}}{\left( \sqrt{\sum^F_f \left|\hat{S}_{kf}\right|^2}\right) ^3}\right] W^{old}_{mkf} -E\left[\frac{\hat{S}_{kf}}{\sqrt{\sum^F_f \left|\hat{S}_{kf}\right|^2}} X_{kf}\right]\)
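
A fixed-point step of this form can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the node's implementation: the function names and the (frequency bin, channel, frame) layout are hypothetical, the input `X` is assumed to be whitened beforehand, the expectation is replaced by a frame average, and \(|\hat{S}_{kf}|^2\) is used where the formula writes \(\hat{S}^2_{kf}\). The text does not say how the constraint attached to the Lagrange multiplier is enforced; symmetric decorrelation is used here as one common choice.

```python
import numpy as np


def symmetric_decorrelation(W):
    """Return (W W^H)^(-1/2) W for each frequency bin (assumed constraint step)."""
    d, E = np.linalg.eigh(W @ W.conj().transpose(0, 2, 1))
    inv_sqrt = (E / np.sqrt(d)[:, None, :]) @ E.conj().transpose(0, 2, 1)
    return inv_sqrt @ W


def fast_iva_step(W, X, eps=1e-12):
    """One fixed-point update following the form of the formula above.

    W: (F, M, M) complex separation matrices, one per frequency bin.
    X: (F, M, T) complex spectra (bin, channel, frame), assumed whitened.
    """
    F, M, T = X.shape
    S = W @ X                                            # (F, M, T) estimates
    r = np.sqrt(np.sum(np.abs(S) ** 2, axis=0)) + eps    # (M, T) norm over bins
    # First expectation: scalar weight per bin and output channel.
    a = np.mean(1.0 / r - np.abs(S) ** 2 / r ** 3, axis=2)   # (F, M)
    # Second expectation: frame average of (S / r) x^H, one matrix per bin.
    b = (S / r) @ X.conj().transpose(0, 2, 1) / T            # (F, M, M)
    W_new = a[:, :, None] * W - b
    return symmetric_decorrelation(W_new)


# Usage on synthetic data: 4 bins, 2 channels, 200 frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2, 200)) + 1j * rng.standard_normal((4, 2, 200))
W = np.tile(np.eye(2, dtype=complex), (4, 1, 1))
W = fast_iva_step(W, X)
```

Unlike the natural-gradient update, this step has no learning rate to tune, which is the usual motivation for fixed-point variants.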

References

[1] T. Kim, H. T. Attias, S. Lee, and T. Lee, “Blind Source Separation Exploiting Higher-Order Frequency Dependencies,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 1, pp. 70–79, January 2007.
[2] I. Lee, T. Kim, and T. Lee, “Fast fixed-point independent vector analysis algorithms for convolutive blind source separation,” Signal Processing, vol. 87, no. 8, pp. 1859–1871, August 2007.