8.8 The beginning of the separated sound is ignored

Problem

Read this section if the beginning of a separated sound is cut off.

Solution

Adjust the PREROLL_LENGTH parameter of the SourceIntervalExtender module as follows.

  1. Create a network file that saves the separation result (see How should I save separated sounds in files?). In this network, insert the SourceTracker and SourceIntervalExtender modules between the localization module (such as LocalizeMUSIC) and the separation module (such as GHDSS).

  2. Run the network to separate a sound, then display or listen to the result.

  3. If the beginning of the separated sound is cut off, increase PREROLL_LENGTH.

  4. If the silent section at the beginning of the separated sound is too long, reduce PREROLL_LENGTH.
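Steps 3 and 4 rely on judging the leading silence by ear or by eye. As a rough aid, the hypothetical Python helpers below measure how long the separated signal stays quiet at its start and map that duration to a direction for PREROLL_LENGTH. They assume the separated sound has already been loaded as a list of mono float samples scaled to [-1, 1]; the RMS threshold and the 20/300 msec margins are arbitrary assumptions, not HARK defaults, and listening to the result remains the real check.

```python
import math


def leading_silence_ms(samples, sampling_rate=16000, advance=160, threshold=0.01):
    """Duration (msec) of the quiet run at the start of `samples`.

    Hypothetical helper: the signal is cut into non-overlapping blocks of
    `advance` samples, and a block counts as silent when its RMS is below
    `threshold` (an assumed value for samples scaled to [-1, 1]).
    """
    silent_frames = 0
    for start in range(0, len(samples), advance):
        frame = samples[start:start + advance]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            break  # first non-silent block: the sound has started
        silent_frames += 1
    return silent_frames * advance * 1000.0 / sampling_rate


def suggest_preroll_change(silence_ms, min_ms=20.0, max_ms=300.0):
    """Map a measured leading silence to a PREROLL_LENGTH adjustment.

    The min_ms/max_ms margins are illustrative assumptions, not HARK values.
    Almost no leading silence hints that the onset may be cut off (step 3);
    a very long silence matches step 4.
    """
    if silence_ms < min_ms:
        return "increase"
    if silence_ms > max_ms:
        return "reduce"
    return "keep"


# Example: 30 msec of silence followed by a loud section.
signal = [0.0] * 480 + [0.5] * 1600
print(leading_silence_ms(signal))                          # 30.0
print(suggest_preroll_change(leading_silence_ms(signal)))  # keep
```

A near-zero leading silence does not prove that the onset was lost, since some recordings start abruptly, so treat "increase" only as a prompt to listen more carefully.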

Discussion

By the time sound source localization first reports a source, about 500 msec has already elapsed since the start of the utterance. The beginning of the separated sound is therefore lost, and speech recognition fails. The SourceIntervalExtender module solves this problem by extending each source interval backward in time, so that separation starts before the point at which localization first reported the source. PREROLL_LENGTH sets how far back to extend. If PREROLL_LENGTH is too small, the beginning of the separated sound is still lost and speech recognition suffers. If it is too large, an utterance may be joined to the one before or after it, which causes recognition errors with some of the language models used for speech recognition.
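The trade-off can be pictured with a simplified model: treat the localization output as a per-frame flag saying whether a source is active, and extend every active interval backward by PREROLL_LENGTH frames. This is an illustrative sketch of the idea only; the function name is hypothetical, and the actual SourceIntervalExtender module, which works on HARK source objects, is not claimed to work this way internally.

```python
def extend_intervals_backward(active, preroll_length):
    """Extend each run of active frames backward by `preroll_length` frames.

    `active` is a per-frame list of booleans (True = a source is reported).
    Simplified model for illustration, not HARK's implementation.
    """
    extended = list(active)
    for t, is_active in enumerate(active):
        # An onset is an active frame whose predecessor is inactive.
        is_onset = is_active and (t == 0 or not active[t - 1])
        if is_onset:
            # Mark up to `preroll_length` frames before the onset as active.
            for k in range(max(0, t - preroll_length), t):
                extended[k] = True
    return extended


F, T = False, True
reported = [F, F, F, F, T, T, F, F, T]
print(extend_intervals_backward(reported, 1))
# [False, False, False, True, True, True, False, True, True]
print(extend_intervals_backward(reported, 2))
# [False, False, True, True, True, True, True, True, True]
```

With a preroll of 1 frame the two reported sources stay separate. With 2 frames the two-frame gap between them disappears and they merge into one interval, which is the "utterance joined to the next one" failure described above.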

Unit of PREROLL_LENGTH

PREROLL_LENGTH is measured in frames of the short-time Fourier transform. Its correspondence to actual time therefore depends on the sampling frequency (SAMPLING_RATE) set for the AudioStreamFromMic or AudioStreamFromWave module and on the FFT step size (ADVANCE). With the default settings (sampling frequency 16000 Hz, step size 160 samples), one frame lasts 160 / 16000 s = 10 msec, so changing PREROLL_LENGTH by 1 changes the extension by 10 msec.
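The frame-to-time relationship is simple arithmetic: one frame lasts ADVANCE / SAMPLING_RATE seconds. The helper names below are hypothetical; their defaults mirror the 16000 Hz and 160-sample settings mentioned above.

```python
import math


def preroll_to_ms(preroll_length, sampling_rate=16000, advance=160):
    """Time span (msec) covered by `preroll_length` STFT frames."""
    return preroll_length * advance * 1000.0 / sampling_rate


def ms_to_preroll(ms, sampling_rate=16000, advance=160):
    """Smallest number of frames that covers at least `ms` milliseconds."""
    return math.ceil(ms * sampling_rate / (1000.0 * advance))


print(preroll_to_ms(1))               # 10.0
print(ms_to_preroll(500))             # 50
print(preroll_to_ms(1, advance=320))  # 20.0
```

At the default settings, covering the roughly 500 msec delay described in the Discussion would take about 50 frames; in practice the value is still tuned by listening, as in steps 3 and 4. Note that a different ADVANCE or sampling frequency changes the duration of each frame, so a PREROLL_LENGTH tuned for one configuration should be rescaled for another.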

See Also