HARK Document Version 3.0.0. (Revision: 9272) : AudioStreamFromWave

6.1.3 AudioStreamFromWave

6.1.3.1 Outline of the node

This node reads speech waveform data from a WAVE file. The data is output as a Matrix<float> type: indexed, multichannel audio waveform data in which rows are channels and columns are samples.

6.1.3.2 Necessary file

Audio files in RIFF WAVE format. There are no limits on the number of channels or the sampling frequency. Samples are assumed to be linear PCM stored as 16-bit or 24-bit signed integers.
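Before feeding a file to the node, it can help to confirm that its header matches these assumptions. The following is a minimal sketch using Python's standard `wave` module; it is not part of HARK, and the helper names `describe_wave` and `is_supported_depth` are hypothetical.

```python
import wave


def describe_wave(path):
    """Return (channels, bits_per_sample, sampling_rate_hz) from a WAVE header."""
    with wave.open(path, "rb") as wf:
        # getsampwidth() reports bytes per sample, so multiply by 8 for bits.
        return wf.getnchannels(), wf.getsampwidth() * 8, wf.getframerate()


def is_supported_depth(bits_per_sample):
    """True for the sample sizes this section says the node assumes."""
    return bits_per_sample in (16, 24)
```

For example, `is_supported_depth(describe_wave("input.wav")[1])` would flag an 8-bit or 32-bit file before it reaches the network; `input.wav` is a placeholder filename.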

6.1.3.3 Usage

When to use

Use this node when WAVE files serve as input to the HARK system.

Typical connection

Figures 6.9 and 6.10 show a usage example of the AudioStreamFromWave node. In Figure 6.9, the Matrix<float> type multichannel waveforms that AudioStreamFromWave reads from a file are converted into the frequency domain by the MultiFFT node. To read a file with AudioStreamFromWave, designate a filename in a Constant node (a standard FlowDesigner node) and generate a file descriptor in the InputStream node, as shown in Figure 6.10. Then connect the output of the InputStream node to the iterator subnetwork (LOAD_WAVE in Figure 6.10), which contains the network of HARK nodes, including AudioStreamFromWave.

Figure 6.9: Connection example of AudioStreamFromWave 1

Figure 6.10: Connection example of AudioStreamFromWave 2

6.1.3.4 Input-output and property of node

Table 6.4: Parameter list of AudioStreamFromWave

Parameter name	Type	Default value	Unit	Description
LENGTH	`int`	512	[pt]	Frame length as a fundamental unit for processing.
ADVANCE	`int`	160	[pt]	Frame shift length.
USE_WAIT	`bool`	`false`		Designates whether processing is paced to real time.

Input

INPUT: : Stream type. Receives input from the InputStream node, which is in the IO category of the standard FlowDesigner nodes.

Output

AUDIO: : Matrix<float> type. Indexed, multichannel audio waveform data with rows as channels and columns as samples. The number of columns is equal to the parameter LENGTH.
NOT_EOF: : bool type. Indicates whether the file can still be read. Used as the termination flag for loop processing of files. It outputs false when the end of the file is reached and true otherwise.
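The relationship between the two outputs can be illustrated with a small framing sketch. It is not the node's implementation: the function name `iter_frames` is hypothetical, and the handling of the last frame (zero-padding the tail and setting the flag to false on that frame) is an assumption made for illustration, since this section does not specify it.

```python
def iter_frames(samples, length=512, advance=160):
    """Yield (audio, not_eof) pairs from per-channel sample lists.

    samples: list of channels, each a list of floats of equal size.
    audio:   list of channels (rows), each holding `length` samples (columns).
    ASSUMPTION: the final frame is zero-padded and carries not_eof == False.
    """
    total = len(samples[0])
    start = 0
    while True:
        end = start + length
        audio = []
        for channel in samples:
            chunk = channel[start:end]
            # Pad with zeros when the window runs past the end of the data.
            audio.append(chunk + [0.0] * (length - len(chunk)))
        not_eof = end < total
        yield audio, not_eof
        if not not_eof:
            break
        start += advance
```

A consumer loop would keep processing frames while the flag is true, which mirrors how NOT_EOF is used as the loop condition of the iterator subnetwork.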

Parameter

LENGTH: : int type. The default value is 512. Designates the frame length, the base unit of processing, in samples. A larger value gives higher frequency resolution but lower temporal resolution. A length corresponding to $20 \sim 40$ [ms] is known to be appropriate for the analysis of audio waveforms. At a sampling frequency of 16,000 [Hz], the default value corresponds to 32 [ms].
ADVANCE: : int type. The default value is 160. Designates the frame shift length in samples. At a sampling frequency of 16,000 [Hz], the default value corresponds to 10 [ms].
USE_WAIT: : bool type. The default value is false. Usually, acoustic processing in the HARK system proceeds faster than real time. This option adds wait time to the processing. To process input files at real-time speed, set it to true. It has no effect when processing is already slower than real time.
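The idea behind USE_WAIT can be sketched as pacing a frame loop so that each frame is released no earlier than its real-time deadline. This is only an illustration under stated assumptions, not the node's actual code: the function `paced`, its parameters, and the choice of one ADVANCE period per frame as the deadline spacing are all hypothetical.

```python
import time


def paced(frames, advance=160, sampling_rate_hz=16000, use_wait=False,
          sleep=time.sleep, clock=time.monotonic):
    """Yield frames, optionally waiting so output does not outrun real time."""
    period = advance / sampling_rate_hz  # seconds of audio per frame shift
    start = clock()
    for index, frame in enumerate(frames):
        if use_wait:
            # ASSUMPTION: frame `index` is due `index` periods after the start.
            delay = start + index * period - clock()
            if delay > 0:
                sleep(delay)
            # If delay <= 0 the loop is already behind real time, so waiting
            # changes nothing, matching the note that the option has no effect.
        yield frame
```

With the defaults, the period is 160 / 16000 = 0.01 seconds, so a loop that finishes each frame in less than 10 [ms] would be slowed down to one frame per 10 [ms].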

6.1.3.5 Details of the node

Applicable file format: RIFF WAVE files can be read. The number of channels and the quantization bit rate are read from the file header. The format IDs that indicate the sampling frequency and the quantization method are ignored. Any number of channels and any sampling frequency are accepted. When the sampling frequency is required for processing, set it as a parameter of the nodes that need it (e.g. GHDSS, MelFilterBank). Samples are assumed to be linear PCM stored as 16- or 24-bit signed integers.

Rough indication of parameters: When the goal of processing is speech analysis (speech recognition), about $20 \sim 40$ [ms] is appropriate for LENGTH, and about $1/3 \sim 1/2$ of LENGTH is appropriate for ADVANCE. At a sampling frequency of 16,000 [Hz], the default values of LENGTH (512 samples) and ADVANCE (160 samples) correspond to 32 and 10 [ms], respectively.
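Because LENGTH and ADVANCE are given in samples while the guidance is given in milliseconds, a small conversion helper can be useful when the sampling frequency is not 16,000 [Hz]. The sketch below is plain arithmetic; the function names are hypothetical, and the 48,000 [Hz] case is only an illustrative example.

```python
def samples_to_ms(n_samples, sampling_rate_hz):
    """Duration in milliseconds of n_samples at the given sampling rate."""
    return 1000.0 * n_samples / sampling_rate_hz


def ms_to_samples(duration_ms, sampling_rate_hz):
    """Number of samples (rounded) covering duration_ms at the given rate."""
    return round(duration_ms * sampling_rate_hz / 1000.0)


print(samples_to_ms(512, 16000))  # 32.0 (default LENGTH)
print(samples_to_ms(160, 16000))  # 10.0 (default ADVANCE)
# The 20 to 40 ms guideline expressed in samples at 48,000 Hz:
print(ms_to_samples(20, 48000), ms_to_samples(40, 48000))  # 960 1920
```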