HARK Document Version 3.4.0. (Revision: 9509) : SaveRawPCM

6.1.4 SaveRawPCM

6.1.4.1 Outline of the node

This node saves time-domain speech waveform data to files. The output files are binary files containing Raw PCM sound data, in which each sample point is recorded as a 16 [bit] or 24 [bit] integer. Depending on the input data type, the node writes either one multichannel audio file or multiple monaural audio files (one for each separated sound).

6.1.4.2 Necessary file

No files are required.

6.1.4.3 Usage

When to use

Use this node to check a separated sound by ear after converting it into a waveform with the Synthesize node, or to record sound from a microphone array by connecting this node to the AudioStreamFromMic node.

Typical connection

Figures 6.11 and 6.12 show usage examples of SaveRawPCM. Figure 6.11 shows an example of saving multichannel acoustic signals from AudioStreamFromMic into a file using the SaveRawPCM node. In this example, the ChannelSelector node selects the channels to be saved. Note that since SaveRawPCM accepts Map<int, ObjectRef> type inputs, the MatrixToMap node is used to convert the Matrix<float> type into the Map<int, ObjectRef> type.

Figure 6.12 shows an example of saving a separated sound using the SaveRawPCM node. The separated sound output from the GHDSS node, or from the PostFilter node that suppresses noise after separation, is in the frequency domain, so it is converted into a time-domain waveform by the Synthesize node before it is input into the SaveRawPCM node. The WhiteNoiseAdder node is usually used to improve the speech recognition rate of the separated sound and is not essential for the use of SaveRawPCM.

$\includegraphics[width=.9\textwidth ]{fig/modules/SaveRawPCM-1}$

Figure 6.11: Connection example of SaveRawPCM 1

$\includegraphics[width=.9\textwidth ]{fig/modules/SaveRawPCM-2}$

Figure 6.12: Connection example of SaveRawPCM 2

6.1.4.4 Input-output and property of the node

Table 6.5: Parameter list of SaveRawPCM

Parameter name	Type	Default value	Unit	Description
BASENAME	`string`	sep_		Prefix or the format of the file name. See SaveWavePCM for details.
ADVANCE	`int`	160	[pt]	Shift length of the analysis frame of the speech waveform to be saved in a file.
BITS	`int`	16	[bit]	Quantization bit rate of speech waveform to be saved in a file. Choose 16 or 24.
INPUT_BITS	`string`	as_BITS	[bit]	Quantization bit rate of input speech waveform.
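
The relationship between these parameters and the size of a saved file can be sketched with a little arithmetic. The following is a minimal sketch, not part of HARK: the function names and the 16000 Hz sampling rate are illustrative assumptions, and the frame count further assumes that each processed frame appends ADVANCE samples per channel to the file.

```python
def samples_per_channel(file_size_bytes, bits=16, channels=1):
    """Number of sample points per channel in a headerless raw PCM file."""
    if bits not in (16, 24):
        raise ValueError("BITS must be 16 or 24")
    # One sample point occupies BITS/8 bytes for every channel.
    bytes_per_point = (bits // 8) * channels
    if file_size_bytes % bytes_per_point != 0:
        raise ValueError("file size is not a whole number of sample points")
    return file_size_bytes // bytes_per_point


def duration_seconds(file_size_bytes, bits=16, channels=1, sampling_rate=16000):
    """Duration of the recording; sampling_rate is an assumed value."""
    return samples_per_channel(file_size_bytes, bits, channels) / sampling_rate


def frame_count(file_size_bytes, bits=16, channels=1, advance=160):
    """Frames written, assuming ADVANCE samples per channel per frame."""
    return samples_per_channel(file_size_bytes, bits, channels) // advance
```

For example, a 320000-byte monaural 16 [bit] file holds 160000 sample points, which corresponds to 10 seconds at 16000 Hz, or 1000 frames with ADVANCE set to 160.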

Input

INPUT: Map<int, ObjectRef> or Matrix<float>. In the case of Map<int, ObjectRef>, each object should be a Vector<float> holding a time-domain waveform, such as a separated sound output from the Synthesize node, keyed by its sound source ID. Matrix<float> data contains a waveform in the time domain where each row corresponds to a channel.

Output

OUTPUT: Map<int, ObjectRef>.

Parameter

BASENAME: string type. By default, the filename prefix is sep_. When a sound source ID is attached, the output filename is BASENAME followed by the ID and the extension .sw. For example, when BASENAME is sep_ and a mixture of three sounds is separated, the separated sounds are saved as sep_0.sw, sep_1.sw, and sep_2.sw.
ADVANCE: int type. This must match the ADVANCE value of the other nodes used.
BITS: int type. Quantization bit rate of the speech waveform to be saved in a file. Select 16 or 24.
INPUT_BITS: string type. Quantization bit rate of the input speech waveform. Select 16, 24, or as_BITS; as_BITS means the same value as BITS.
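
The filename rule for BASENAME can be illustrated with a short sketch. The helper name separated_filename is hypothetical and only reproduces the sep_0.sw pattern from the example above; it does not cover the format-string usage of BASENAME that is described under SaveWavePCM.

```python
def separated_filename(basename, source_id, extension=".sw"):
    """Build the filename for one separated sound: BASENAME, then ID, then .sw."""
    return f"{basename}{source_id}{extension}"


# Three separated sounds with the default BASENAME.
names = [separated_filename("sep_", source_id) for source_id in range(3)]
print(names)  # ['sep_0.sw', 'sep_1.sw', 'sep_2.sw']
```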

6.1.4.5 Details of the node

Format of the files saved: The files are saved as Raw PCM sound data without header information. Therefore, when reading the files, users must specify the quantization bit rate (16 [bit] or 24 [bit]), the sampling frequency, and the number of channels. The files written also vary depending on the type of input, as follows.

Matrix<float> type: A single multichannel audio file is written, with as many channels as there are rows in the input.
Map<int, ObjectRef> type: One monaural audio file is written for each ID, and each filename consists of BASENAME followed by the ID number.
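
Because the files carry no header, a reader has to supply the format itself. The following is a minimal sketch of such a reader using only the Python standard library. It assumes signed little-endian integers with the channels interleaved sample by sample; neither the byte order nor the interleaving is stated in this section, so both should be verified against actual output. The function name read_raw_pcm and the test data are illustrative only.

```python
import os
import tempfile


def read_raw_pcm(path, bits=16, channels=1, byteorder="little"):
    """Decode a headerless raw PCM file into one list of ints per channel."""
    if bits not in (16, 24):
        raise ValueError("bits must be 16 or 24")
    width = bits // 8
    with open(path, "rb") as f:
        data = f.read()
    step = width * channels
    usable = len(data) - len(data) % step  # ignore a trailing partial sample point
    samples = [
        int.from_bytes(data[i:i + width], byteorder, signed=True)
        for i in range(0, usable, width)
    ]
    # De-interleave: sample k of channel c sits at index k * channels + c.
    return [samples[c::channels] for c in range(channels)]


# Round-trip check on synthetic two-channel 24-bit data (not real node output).
left, right = [0, 1000, -1000], [8388607, -8388608, 42]
raw = b"".join(
    value.to_bytes(3, "little", signed=True)
    for pair in zip(left, right)
    for value in pair
)
with tempfile.NamedTemporaryFile(suffix=".sw", delete=False) as tmp:
    tmp.write(raw)
decoded = read_raw_pcm(tmp.name, bits=24, channels=2)
os.remove(tmp.name)
```

For a file written from Map<int, ObjectRef> input, channels would be 1; for Matrix<float> input, it would be the number of rows of the input. The sampling frequency is not needed for decoding, but it is needed to play the samples back at the correct speed.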