6.1.6 SaveWavePCM2

6.1.6.1 Outline of the node

This node saves time-domain speech waveform data to files. It differs from the SaveRawPCM node only in the format of the output files: the wave file format has a header, so tools such as Audacity or WaveSurfer can easily read the output files of this node. If you want to read a waveform using the AudioStreamFromWave node, use this node instead of the SaveRawPCM node. It differs from the SaveWavePCM node in that the ENABLE input terminal controls, at any time, whether the file is saved. In addition, a time stamp is inserted into the file name, so that the uniqueness of the file name is ensured. Note that if the ENABLE input terminal is not connected, it is treated as false and no file is saved. If you always want to save the file, as a replacement for SaveWavePCM, connect a Constant node set to true to the ENABLE terminal.

6.1.6.2 Necessary files

No files are required.

6.1.6.3 Usage

When to use

The same as the SaveRawPCM node. This node is used when wishing to convert separated sound into waveforms in the Synthesize node to confirm the sound or when wishing to record sound from a microphone array by connecting it to the AudioStreamFromMic node.

Typical connection

The usage is almost the same as for the SaveRawPCM node. The only difference is the SAMPLING_RATE parameter. You can use this node by replacing the SaveRawPCM node with the SaveWavePCM2 node in Fig. 6.11 and 6.12. Note that, for files to be saved, the ENABLE terminal must be connected to a Constant node set to true. If you want to control dynamically whether files are saved, the true/false value must be supplied externally, for example using HARK-Python.

6.1.6.4 Input-output and property of the node

Table 6.8: Parameter list of SaveWavePCM2 

Parameter name  Type    Default value  Unit   Description
--------------  ------  -------------  -----  -----------------------------------------------------------------------------------
BASENAME        string  sep_                  Prefix or format string of the file name to save.
ADVANCE         int     160            [pt]   Shift length of the analysis frame of the speech waveform to be saved in a file.
SAMPLING_RATE   int     16000          [Hz]   Sampling rate. This parameter is set in the header.
BITS            string  int16          [bit]  Quantization bit rate of speech waveform to be saved in a file. Choose int16 or int24.
INPUT_BITS      string  as_BITS        [bit]  Quantization bit rate of input speech waveform.

Input

INPUT

: Map<int, ObjectRef> or Matrix<float> type. The former is a structure containing a sound source ID and waveform data (such as a separated sound) and the latter is a waveform data matrix of multiple channels.

SOURCES

: Vector<ObjectRef> type. Sound source localization results (Vector of Source type objects) are acceptable. This input is optional.

ENABLE

: bool type. The file is saved only when this input terminal is 1 or true. This is a difference from SaveWavePCM.

Output

OUTPUT

: Map<int, ObjectRef> or Matrix<float> type. The output data is the same as the input.

OUTPUT_FILE_NAME

: string type. If the input is Matrix<float> type, the name of the currently saved file is output. If the input is Map<int, ObjectRef> type, it is an empty string. This is a difference from SaveWavePCM.

OUTPUT_FRAME_COUNT

: int type. If the input is Matrix<float> type, the number of frames included in the saved file is output. If the input is Map<int, ObjectRef> type, the value is undefined (usually 0, but not guaranteed because it is uninitialized). This is a difference from SaveWavePCM.

Parameter

BASENAME

: string type. It is used for formatting the file name. The default value is sep_. See the next section for details.

ADVANCE

: int type. This must match the ADVANCE value of the other nodes.

SAMPLING_RATE

: int type. This must match the SAMPLING_RATE value of the other nodes. The value is used only for the header; it cannot change the sampling rate of the A/D converter.

BITS

: string type. Quantization bit rate of the speech waveform to be saved in a file. Select int16 or int24.

INPUT_BITS

: string type. Quantization bit rate of input speech waveform. Select int16 or int24. The default, as_BITS, means the same value as BITS.
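The relationship between ADVANCE, SAMPLING_RATE and BITS can be checked with a little arithmetic. The following is a minimal sketch, assuming that each processed frame contributes ADVANCE new samples per channel to the file; the function names are hypothetical and are not part of HARK.

```python
# Hypothetical helpers (not part of HARK). They assume that each frame
# adds ADVANCE new samples per channel to the saved file.

BYTES_PER_SAMPLE = {"int16": 2, "int24": 3}


def frame_shift_seconds(advance=160, sampling_rate=16000):
    """Time covered by one frame shift, in seconds."""
    return advance / sampling_rate


def recorded_seconds(frame_count, advance=160, sampling_rate=16000):
    """Approximate duration of a file that contains frame_count frames."""
    return frame_count * advance / sampling_rate


def payload_bytes(frame_count, channels, bits="int16", advance=160):
    """Size of the sample data only; the wave header is not counted."""
    return frame_count * advance * channels * BYTES_PER_SAMPLE[bits]


print(frame_shift_seconds())   # default parameters: 160 / 16000
print(recorded_seconds(100))   # 100 frames at the default settings
print(payload_bytes(100, 8))   # 8 channels, int16
```

With the default values, one frame shift corresponds to 10 ms, so a value of 100 reported on OUTPUT_FRAME_COUNT would correspond to about one second of audio under this assumption.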

6.1.6.5 Details of the node

Format of the files saved: The files are saved in the Wave PCM sound data format with header information. Therefore, when reading the files, users do not need to specify the sampling frequency, the number of channels, or the quantization bit rate. The files written depend on the input type as follows.

Matrix<float> type

The file written is a multichannel audio file with the same number of channels as the number of rows in the input.

Map<int, ObjectRef> type

The written files have filenames with an ID number after BASENAME, and monaural audio files are written for each ID.
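Because the header carries the sampling frequency, the number of channels and the quantization bit rate, a reader can recover them from the file alone. The sketch below uses Python's standard wave module to illustrate this. The file it inspects is a locally generated stand-in that contains silence, not real output of this node, and the helper names are hypothetical.

```python
import os
import tempfile
import wave


def describe_wave(path):
    """Read the format information stored in the wave header."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sampling_rate": wf.getframerate(),
            "bits": wf.getsampwidth() * 8,
            "samples_per_channel": wf.getnframes(),
        }


def write_silence(path, channels=4, sampling_rate=16000, bits=16, samples=160):
    """Create a stand-in multichannel file so the example is self-contained."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(bits // 8)
        wf.setframerate(sampling_rate)
        wf.writeframes(bytes(samples * channels * (bits // 8)))


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "sep_0.20210101.120001.987654.wav")
write_silence(demo_path)
info = describe_wave(demo_path)
print(info)
```

For a file saved from a Matrix<float> input, the channel count read this way should equal the number of rows of the input; for a Map<int, ObjectRef> input, each file should report one channel.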

File name determination: This node provides two ways to determine the file name.

  1. Prefix [default]
    The value of the BASENAME parameter is used as a prefix. The sound source ID is appended to it, followed by the time stamp in the form .YYYYMMDD.hhmmss.uuuuuu. Finally, the extension .wav is attached. For example, if BASENAME is sep_ and the input source IDs are 0, 1, 2, the file names become sep_0.20210101.120001.987654.wav, sep_1.20210101.120003.123456.wav, sep_2.20210101.120005.246890.wav.

  2. Format string [after HARK 2.3.1]
    If BASENAME contains the special pattern {tag:format}, the value is interpreted as a format string. This allows users to set more flexible file names.

    Four values are acceptable for tag (Table 6.9). The srcid and date tags, meaning the source ID and the time, respectively, can be used without any restriction. However, to use azimuth and elevation in the format string, the SOURCES input of the node must be connected so that the sound source localization results are available.

    The format part is optional. It can be used to specify the number of digits, for example 03d. The syntax is the same as in printf.

    Table 6.9: Tag List

    Tag        Description                                                Unit
    ---------  ---------------------------------------------------------  ------------------------
    srcid      Source ID                                                  integer
    date       Time when the file is saved                                String (yyyyMMDD-HHmmss)
    azimuth    Azimuth of the localized sound (SOURCES input required)    degree (integer rounded)
    elevation  Elevation of the localized sound (SOURCES input required)  degree (integer rounded)

    Example format strings

    Insert source ID in the middle of the file name

     

    • FORMAT: wav_id_{srcid}_output

    • OUTPUT: wav_id_0_output.20210101.120001.987654.wav, wav_id_1_output.20210101.120003.123456.wav, ...

    Insert source ID with zero padding

     

    • FORMAT: wav_id_{srcid:03d}

    • OUTPUT: wav_id_000.20210101.120001.987654.wav, wav_id_001.20210101.120003.123456.wav ...

    Insert azimuth into the file name

     

    • FORMAT: wav_az_{azimuth}

    • OUTPUT: wav_az_30.20210101.120001.987654.wav, wav_az_-10.20210101.120003.123456.wav ...

    Note that, in this case, the file is overwritten if two localization results have the same azimuth. To avoid this, it is recommended to add srcid to the format string so that the file name is unique.
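The two naming modes described above can be emulated as follows. This is only a sketch of the documented behavior, not the node's actual implementation: the function expand_basename and its arguments are hypothetical, the time stamp suffix is assumed to be appended in both modes (as in the examples above), and azimuth and elevation are simply rounded to integers.

```python
import re
from datetime import datetime

# Hypothetical emulation of the documented naming rules (not HARK code).
TAG_PATTERN = re.compile(r"\{(srcid|date|azimuth|elevation)(?::([^{}]*))?\}")


def expand_basename(basename, srcid, when, azimuth=None, elevation=None):
    """Build a file name from BASENAME in prefix or format-string mode."""
    # Time stamp suffix in the form .YYYYMMDD.hhmmss.uuuuuu
    stamp = when.strftime(".%Y%m%d.%H%M%S.%f")
    values = {
        "srcid": srcid,
        "date": when.strftime("%Y%m%d-%H%M%S"),
        "azimuth": None if azimuth is None else int(round(azimuth)),
        "elevation": None if elevation is None else int(round(elevation)),
    }

    if not TAG_PATTERN.search(basename):
        # Prefix mode: BASENAME, source ID, time stamp, extension.
        return f"{basename}{srcid}{stamp}.wav"

    def substitute(match):
        tag, fmt = match.group(1), match.group(2)
        value = values[tag]
        if value is None:
            # azimuth and elevation are available only with SOURCES connected
            raise ValueError(f"{tag} requires the SOURCES input")
        # The optional format part uses printf-style syntax, e.g. 03d.
        return ("%" + fmt) % value if fmt else str(value)

    return TAG_PATTERN.sub(substitute, basename) + stamp + ".wav"


when = datetime(2021, 1, 1, 12, 0, 1, 987654)
print(expand_basename("sep_", 0, when))
print(expand_basename("wav_id_{srcid:03d}", 1, when))
print(expand_basename("wav_az_{azimuth}_{srcid}", 2, when, azimuth=-10.2))
```

The last call combines azimuth with srcid, which is the combination recommended above for keeping file names distinct when two localization results share an azimuth.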