6.1.2 AudioStreamFromWave

Outline of the node

This node reads speech waveform data from a WAVE file. The data is output as indexed, multichannel audio waveform data of Matrix<float> type, with rows corresponding to channels and columns to samples.

Necessary file

Audio files in RIFF WAVE format. There are no restrictions on the number of channels or the sampling frequency. For quantization, 16-bit or 24-bit signed-integer linear PCM is assumed.

Usage

When to use

This node is used to feed WAVE files as input to the HARK system.

Typical connection

Figures 6.5 and 6.6 show a usage example of the AudioStreamFromWave node. In Figure 6.5, the Matrix<float> type multichannel waveforms that AudioStreamFromWave reads from a file are converted into the frequency domain by the MultiFFT node. To read a file with AudioStreamFromWave, specify the filename in a Constant node (a standard FlowDesigner node) and generate a file descriptor with the InputStream node, as shown in Figure 6.6. Then connect the output of the InputStream node to the iterator subnetwork (LOAD_WAVE in Figure 6.6), which contains the network of HARK nodes such as AudioStreamFromWave.

\includegraphics[width=.8\textwidth ]{fig/modules/AudioStreamFromWave1}
Figure 6.5: Connection example of AudioStreamFromWave  1

\includegraphics[width=.8\textwidth ]{fig/modules/AudioStreamFromWave2}
Figure 6.6: Connection example of AudioStreamFromWave  2

Input-output and property of node

Table 6.3: Parameter list of AudioStreamFromWave 

Parameter name   Type   Default value   Unit   Description
LENGTH           int    512             [pt]   Frame length as a fundamental unit for processing.
ADVANCE          int    160             [pt]   Frame shift length.
USE_WAIT         bool   false           -      Designate if processing is performed in real time.

Input

INPUT

Stream type. Receives input from the InputStream node, a standard FlowDesigner node in the IO category.

Output

AUDIO

Matrix<float>  type. Indexed, multichannel audio waveform data with rows as channels and columns as samples. The number of columns is equal to the parameter LENGTH.

NOT_EOF

bool type. Indicates whether the file can still be read. Used as the termination flag for loop processing over a file. It outputs false when the end of the file is reached and true otherwise.
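The relationship between the two outputs can be illustrated with a minimal Python sketch. It is not the node's implementation: the function name `iter_frames` is hypothetical, and dropping a trailing partial frame is an assumption made only to keep the example short. It shows each AUDIO frame as a list of rows (one per channel) with LENGTH columns, advancing by ADVANCE samples, together with a flag that plays the role of NOT_EOF.

```python
def iter_frames(channels, length=512, advance=160):
    """Yield (frame, not_eof) pairs from per-channel sample lists.

    frame[c][n] is sample n of channel c inside the current frame.
    Assumption: a trailing partial frame is dropped; the real node may
    handle the end of the file differently.
    """
    total = len(channels[0])
    start = 0
    while start + length <= total:
        # One row per channel, `length` columns per row.
        frame = [[float(s) for s in ch[start:start + length]] for ch in channels]
        start += advance
        # True while another full frame can still be produced.
        not_eof = start + length <= total
        yield frame, not_eof


if __name__ == "__main__":
    two_channels = [list(range(1000)), list(range(1000))]
    for frame, not_eof in iter_frames(two_channels):
        print(len(frame), len(frame[0]), not_eof)
```

A loop that consumes these pairs would stop once the flag becomes false, which is how NOT_EOF is described above.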

Parameter

LENGTH

int type. The default value is 512. Designates the frame length, the base unit of processing, as a number of samples. The higher the value, the higher the frequency resolution but the lower the temporal resolution. A length corresponding to $20 \sim 40$ [ms] is known to be appropriate for the analysis of audio waveforms. At a sampling frequency of 16,000 [Hz], the default value corresponds to 32 [ms].

ADVANCE

int type. The default value is 160. Designates the frame shift length in samples. At a sampling frequency of 16,000 [Hz], the default value corresponds to 10 [ms].

USE_WAIT

bool type. The default value is false. Acoustic processing in the HARK system usually proceeds faster than real time. This option adds "wait time" to the processing. To process an input file at real-time speed, set it to true. It has no effect when the processing speed is already slower than real time.
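The idea of adding "wait time" can be sketched as follows. This is only an illustration of pacing under stated assumptions, not the node's actual mechanism: the names `wait_seconds` and `run_paced` are hypothetical, and `sampling_rate` is passed in explicitly because how the node derives the real-time budget is not described here. Each frame is given a budget of ADVANCE / sampling frequency seconds, and the loop sleeps for whatever part of the budget the processing did not use.

```python
import time


def wait_seconds(advance, sampling_rate, elapsed):
    """Remaining time so that one frame takes advance / sampling_rate seconds.

    Returns 0.0 when processing already took longer than the budget, which
    mirrors the note that waiting cannot help a slower-than-real-time run.
    """
    budget = advance / sampling_rate
    return max(0.0, budget - elapsed)


def run_paced(process_frame, n_frames, advance=160, sampling_rate=16000):
    """Call process_frame(i) for each frame index, paced to real time."""
    for i in range(n_frames):
        t0 = time.monotonic()
        process_frame(i)
        time.sleep(wait_seconds(advance, sampling_rate, time.monotonic() - t0))


if __name__ == "__main__":
    run_paced(lambda i: None, n_frames=5)  # takes roughly 5 x 10 ms
```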

Details of the node

Applicable file format: RIFF WAVE files can be read. The number of channels and the quantization bit depth are read from the file header. The format IDs that indicate the sampling frequency and the quantization method are ignored. Any number of channels and any sampling frequency are accepted. When the sampling frequency is required for processing, it should be set as a parameter of the node that requires it (e.g. GHDSS, MelFilterBank). Linear PCM with 16- or 24-bit signed integers is assumed for the quantization method and bit depth.
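As a rough illustration of the file layout described above, the following Python sketch reads a 16- or 24-bit linear PCM WAVE file with the standard `wave` module and splits the interleaved samples into one row per channel. It is not HARK code: the function name `read_wave_rows` is hypothetical, and keeping the raw integer values as floats (without scaling) is an assumption of this sketch.

```python
import wave


def read_wave_rows(path):
    """Return (rows, bits): rows[c][n] is sample n of channel c as a float."""
    with wave.open(path, "rb") as wf:
        n_channels = wf.getnchannels()   # taken from the header
        width = wf.getsampwidth()        # bytes per sample: 2 -> 16-bit, 3 -> 24-bit
        raw = wf.readframes(wf.getnframes())
    if width not in (2, 3):
        raise ValueError("this sketch handles only 16- or 24-bit linear PCM")
    rows = [[] for _ in range(n_channels)]
    # Samples are interleaved: ch0, ch1, ..., ch0, ch1, ...
    for i in range(0, len(raw), width):
        value = int.from_bytes(raw[i:i + width], "little", signed=True)
        rows[(i // width) % n_channels].append(float(value))
    return rows, 8 * width
```

The sampling frequency is deliberately not returned, in line with the text above: a node that needs it would receive it as its own parameter.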

Rough indication of parameters: When the goal of processing is speech analysis (e.g. speech recognition), about $20 \sim 40$ [ms] is appropriate for LENGTH and $1/3 \sim 1/2$ of LENGTH is appropriate for ADVANCE. At a sampling frequency of 16,000 [Hz], the default values of LENGTH and ADVANCE correspond to 32 and 10 [ms], respectively.
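The arithmetic behind these figures can be checked with a short Python sketch. The helper names `ms_to_samples` and `samples_to_ms` are hypothetical and exist only for this example. The last line uses the general DFT property that the spacing between frequency bins is the sampling frequency divided by the frame length.

```python
def ms_to_samples(duration_ms, sampling_rate):
    """Number of samples covering duration_ms at sampling_rate [Hz]."""
    return round(duration_ms * sampling_rate / 1000)


def samples_to_ms(n_samples, sampling_rate):
    """Duration in milliseconds of n_samples at sampling_rate [Hz]."""
    return 1000.0 * n_samples / sampling_rate


if __name__ == "__main__":
    fs = 16000
    print(samples_to_ms(512, fs))   # 32.0 (default LENGTH)
    print(samples_to_ms(160, fs))   # 10.0 (default ADVANCE)
    print(ms_to_samples(20, fs), ms_to_samples(40, fs))  # 320 640
    print(fs / 512)                 # 31.25 [Hz] between frequency bins
```

For a file recorded at a different sampling frequency, the same helpers give values that keep the durations unchanged; for example, 32 [ms] at 48,000 [Hz] is 1536 samples.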