6.1.3 SaveRawPCM

6.1.3.1 Outline of the node

This node saves time-domain speech waveform data to files. The output binary files contain raw PCM sound data, in which each sample point is recorded as a 16 [bit] or 24 [bit] integer. Depending on the input data type, the node writes either one multichannel audio file or multiple monaural audio files (one for each separated sound).

6.1.3.2 Necessary file

No files are required.

6.1.3.3 Usage

When to use

Use this node to check a separated sound by ear after converting it into a waveform with the Synthesize node, or to record sound from a microphone array by connecting this node to the AudioStreamFromMic node.

Typical connection

Figures 6.8 and 6.9 show usage examples of SaveRawPCM. Figure 6.8 shows an example of saving multichannel acoustic signals from AudioStreamFromMic to a file using the SaveRawPCM node. As shown in this example, the ChannelSelector node is used to select the channels to be saved. Because SaveRawPCM accepts Map<int, ObjectRef> type inputs, the MatrixToMap node is used to convert the Matrix<float> type into the Map<int, ObjectRef> type.

Figure 6.9 shows an example of saving a separated sound using the SaveRawPCM node. The separated sound output from the GHDSS node, or from the PostFilter node that suppresses noise after separation, is in the frequency domain. It is therefore converted into a time-domain waveform by the Synthesize node before being input to the SaveRawPCM node. The WhiteNoiseAdder node is usually used to improve the speech recognition rate of the separated sound and is not essential for using SaveRawPCM.

Figure 6.8: Connection example of SaveRawPCM 1

Figure 6.9: Connection example of SaveRawPCM 2

6.1.3.4 Input-output and property of the node

Table 6.4: Parameter list of SaveRawPCM 

| Parameter name | Type | Default value | Unit | Description |
|---|---|---|---|---|
| BASENAME | string | sep_ | | Prefix of the name of the file to be saved. |
| ADVANCE | int | 160 | [pt] | Shift length of the analysis frame of the speech waveform to be saved in a file. |
| BITS | int | 16 | [bit] | Quantization bit depth of the speech waveform to be saved in a file. Choose 16 or 24. |

Input

INPUT

: Map<int, ObjectRef> or Matrix<float>. In the case of Map<int, ObjectRef>, each object should be a Vector<float> that holds a time-domain waveform, such as the output of the Synthesize node. Matrix<float> data contains a time-domain waveform in which each row corresponds to a channel.

Output

OUTPUT

: Map<int, ObjectRef> .

Parameter

BASENAME

: string type. Designates the prefix of the output filename; the default is sep_. When a sound source ID is attached, the output filename is BASENAME followed by the ID and the extension .sw. For example, when BASENAME is sep_ and a mixture of three sounds is separated, the separated sounds are saved as sep_0.sw, sep_1.sw, and sep_2.sw.

ADVANCE

: int type. This value must match the ADVANCE value of the other nodes in use.

BITS

: int type. Quantization bit depth of the speech waveform to be saved in a file. Select 16 or 24.
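The naming rule for BASENAME can be illustrated with a short Python sketch. The helper name output_filenames is hypothetical and not part of HARK. It only assumes, as in the sep_ example above, that the ID is appended directly to BASENAME and followed by the .sw extension.

```python
def output_filenames(basename, source_ids):
    """Return the file name expected for each sound source ID.

    Hypothetical helper for illustration only; it mirrors the
    BASENAME + ID + ".sw" naming described for SaveRawPCM.
    """
    return [f"{basename}{source_id}.sw" for source_id in source_ids]


# Three separated sounds with the default BASENAME "sep_".
print(output_filenames("sep_", [0, 1, 2]))
# → ['sep_0.sw', 'sep_1.sw', 'sep_2.sw']
```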

6.1.3.5 Details of the node

Format of the files saved: The files are recorded as raw PCM sound data without header information. Therefore, when reading a file, users must specify the quantization bit depth (16 [bit] or 24 [bit]), the sampling frequency, and the number of channels. The files written also vary depending on the input type, as follows.

Matrix<float>  type

The file written is a multichannel audio file with a number of channels equivalent to the number of rows of the input.

Map<int, ObjectRef>  type

One monaural audio file is written for each ID. Each filename consists of BASENAME followed by the ID number.
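Because the files carry no header, a reader has to supply the bit depth and the number of channels itself. The following Python sketch shows one way to decode such data. The function name decode_raw_pcm is hypothetical, and the sketch assumes signed, little-endian samples with the channels interleaved sample by sample. The description above does not state the byte order or channel layout, so these assumptions should be checked against actual output.

```python
def decode_raw_pcm(raw, bits, channels):
    """Decode headerless PCM bytes into one list of integers per channel.

    Assumptions (not stated in the SaveRawPCM description): samples are
    signed, little-endian, and interleaved sample by sample.
    """
    if bits not in (16, 24):
        raise ValueError("bits must be 16 or 24")
    width = bits // 8                  # bytes per sample
    frame_size = width * channels      # bytes per multichannel sample point
    if len(raw) % frame_size != 0:
        raise ValueError("data length does not match bits/channels")

    out = [[] for _ in range(channels)]
    for i in range(0, len(raw), width):
        value = int.from_bytes(raw[i:i + width], "little", signed=True)
        out[(i // width) % channels].append(value)
    return out


# Two-channel, 16-bit example built in memory instead of read from a file.
samples = [1, -1, 300, -300]           # ch0, ch1, ch0, ch1
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
print(decode_raw_pcm(raw, bits=16, channels=2))
# → [[1, 300], [-1, -300]]
```

For a monaural file written from a Map<int, ObjectRef> input, the same function would be called with channels=1 on the bytes read from the file in binary mode.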