6.1.1 AudioStreamFromMic

Outline of the node

This node captures multichannel speech waveform data from a microphone array. The supported audio interface devices are the RASP series from System In Frontier, Inc. (previously JEOL SYSTEM TECHNOLOGY CO., LTD.), the Tokyo Electron Device TD-BD-16ADUSB, and ALSA-based devices (e.g. the RME Hammerfall DSP Multiface series). For an introduction to the various devices, see Section 8.

Necessary file

No files are required.

How to use

When to use

Use this node when speech waveform data from a microphone array is to be used as input to the HARK system.

Typical connection

Figure 6.1 shows an example usage of the AudioStreamFromMic  node.

\includegraphics[width=.8\textwidth ]{fig/modules/AudioStreamFromMic}
Figure 6.1: Connection example of AudioStreamFromMic 

Overview of device

Among the devices that the AudioStreamFromMic  node supports, the following are introduced with photos.

  1. Radio RASP

  2. RME Hammerfall DSP series Multiface (an ALSA-based device).

1. Radio RASP

Figure 6.2 shows the appearance of the radio RASP. It connects to the HARK system over Ethernet via a wireless LAN. Power is supplied to the radio RASP by the attached AC adapter. Since the radio RASP supports plug-in power, a plug-in-power microphone can be connected to its terminal directly. An advantage is that sound can be recorded easily without a microphone preamplifier.

\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-WL-RASP}
Figure 6.2: Radio RASP

2. RME Hammerfall DSP Multiface series

Figures 6.3 and 6.4 show the appearance of the RME Hammerfall DSP series Multiface. The device communicates with a host PC through a 32-bit CardBus. Although a microphone can be connected to the device through a 6.3 mm TRS terminal, a microphone amplifier is used to ensure a sufficient input level (Figure 6.4). For example, the user may connect a microphone to an RME OctaMic II and connect the OctaMic II to the Multiface. The OctaMic II supports phantom power, so a condenser microphone that requires phantom power (e.g. DPA 4060-BM) can be connected directly. However, since it does not supply plug-in power, a battery box for plug-in power is required to connect plug-in-power microphones. For example, such battery boxes are supplied with the Sony ECM-C115 and the audio-technica AT9903.

\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-RME}
Figure 6.3: Front view of RME Hammerfall DSP Multiface

\includegraphics[width=.5\textwidth ]{fig/modules/AD/AD-RME-back}
Figure 6.4: Back view of RME Hammerfall DSP Multiface

Input-output and property of node

Table 6.2: Parameters of AudioStreamFromMic

| Parameter name | Type | Default value | Unit | Description |
| --- | --- | --- | --- | --- |
| LENGTH | int | 512 | [pt] | Frame length as a fundamental unit for processing. |
| ADVANCE | int | 160 | [pt] | Frame shift length. |
| CHANNEL_COUNT | int | 8 | [ch] | Number of microphone input channels of the device to be used. |
| SAMPLING_RATE | int | 16000 | [Hz] | Sampling frequency of the loaded audio waveform data. |
| DEVICETYPE | string | WS | | Type of device to be used. |
| GAIN | string | 0dB | | Gain value used with a RASP device. |
| DEVICE | string | 127.0.0.1 | | Character string necessary to access the device: a device name such as "plughw:0,1", or an IP address when RASP is used. |
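The defaults in Table 6.2 can be collected in one place and sanity-checked before a network is run. The following is only a sketch: the class `MicStreamParams` and its `problems` method are hypothetical helpers written for illustration and are not part of HARK. The list of allowed DEVICETYPE values is taken from the DEVICETYPE description in this section.

```python
from dataclasses import dataclass

# Allowed values as listed in the DEVICETYPE description of this section.
VALID_DEVICE_TYPES = ("ALSA", "SINICH", "RASP", "WS")


@dataclass
class MicStreamParams:
    """Hypothetical container mirroring Table 6.2 (not a HARK class)."""
    length: int = 512           # [pt] frame length
    advance: int = 160          # [pt] frame shift length
    channel_count: int = 8      # [ch]
    sampling_rate: int = 16000  # [Hz]
    devicetype: str = "WS"
    gain: str = "0dB"
    device: str = "127.0.0.1"

    def problems(self):
        """Return a list of human-readable problems (empty if none found)."""
        found = []
        for name in ("length", "advance", "channel_count", "sampling_rate"):
            if getattr(self, name) <= 0:
                found.append(f"{name.upper()} must be positive")
        if self.devicetype not in VALID_DEVICE_TYPES:
            found.append(f"DEVICETYPE must be one of {VALID_DEVICE_TYPES}")
        return found


defaults = MicStreamParams()
alsa = MicStreamParams(devicetype="ALSA", device="plughw:0,1")
```

The checks are deliberately shallow: they catch only non-positive numbers and an unknown DEVICETYPE, and say nothing about whether the device itself is reachable.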

Input

Not required.

Output

AUDIO

Matrix<float>  type. Indexed, multichannel audio waveform data with rows as channels and columns as samples. The number of columns is equal to the parameter LENGTH.

NOT_EOF

bool  type. Indicates whether there is still waveform input to be processed. It is used as the termination flag when waveforms are processed in a loop: true means that waveforms are still being loaded, and false means that reading is complete. For this node, true is output continuously.
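The shape of the two outputs can be illustrated with a small simulation. The sketch below is not the node's implementation: it assumes that consecutive frames start ADVANCE samples apart, and it reads from a finite in-memory buffer instead of a device. The function name `frames` and the nested-list representation of the matrix are assumptions made for illustration.

```python
def frames(stream, length=512, advance=160):
    """Yield (audio, not_eof) pairs from a finite multichannel buffer.

    stream: list of channels, each a list of float samples of equal size.
    audio:  list of rows (one per channel), each holding `length` samples.
    """
    n_samples = len(stream[0])
    start = 0
    while start + length <= n_samples:
        # Rows are channels, columns are samples, as in the AUDIO output.
        audio = [channel[start:start + length] for channel in stream]
        yield audio, True  # a live microphone never reaches end of input
        start += advance


# Two channels of 1000 samples each, filled with the sample index.
buffer = [[float(i) for i in range(1000)] for _ in range(2)]
produced = list(frames(buffer))
```

With this buffer, four frames fit (starting at samples 0, 160, 320 and 480), and each frame has 2 rows and 512 columns.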

Parameter

LENGTH

int  type. The default value is 512. Designates the frame length, the base unit of processing, in number of samples. The higher the value, the higher the frequency resolution but the lower the temporal resolution. A length corresponding to $20 \sim 40$ [ms] is known to be appropriate for the analysis of audio waveforms. At a sampling frequency of 16,000 [Hz], the default value corresponds to 32 [ms].

ADVANCE

int  type. The default value is 160. Designates the frame shift length in samples. At a sampling frequency of 16,000 [Hz], the default value corresponds to 10 [ms].
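The relation between LENGTH, ADVANCE and time can be checked with a few lines of arithmetic. The helper names below are hypothetical. The frequency resolution is computed as the bin spacing of a discrete Fourier transform whose size equals the frame length, which is an assumption about the later analysis stage rather than something this node does.

```python
def samples_to_ms(n_samples, sampling_rate):
    """Duration in milliseconds of n_samples at sampling_rate [Hz]."""
    return 1000.0 * n_samples / sampling_rate


def frequency_resolution_hz(length, sampling_rate):
    """DFT bin spacing [Hz], assuming a transform size equal to `length`."""
    return sampling_rate / length


frame_ms = samples_to_ms(512, 16000)            # default LENGTH
shift_ms = samples_to_ms(160, 16000)            # default ADVANCE
bin_hz = frequency_resolution_hz(512, 16000)
print(frame_ms, shift_ms, bin_hz)               # 32.0 10.0 31.25
```

Doubling LENGTH to 1024 halves the bin spacing but doubles the frame duration to 64 [ms], which is the resolution trade-off described for LENGTH.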

CHANNEL_COUNT

int  type. The number of channels of the device to be used.

SAMPLING_RATE

int  type. The default value is 16000. Designates the sampling frequency (the number of samples taken per second) of the loaded waveforms. When frequencies up to $\omega $ [Hz] are needed for processing, set the sampling frequency higher than $2\omega $ [Hz]. A higher sampling frequency generally increases the amount of data, which makes real-time processing more difficult.
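Both points in the SAMPLING_RATE description are simple to quantify. The sketch below uses hypothetical helper names: one checks whether a sampling frequency exceeds twice the highest frequency of interest, and the other counts how many samples per second must be handled across all channels.

```python
def is_rate_sufficient(sampling_rate, max_freq_hz):
    """True if sampling_rate is higher than twice the needed frequency."""
    return sampling_rate > 2 * max_freq_hz


def samples_per_second(sampling_rate, channel_count):
    """Total number of samples arriving per second over all channels."""
    return sampling_rate * channel_count


ok_default = is_rate_sufficient(16000, 7000)   # 16000 > 14000
too_low = is_rate_sufficient(16000, 8000)      # 16000 is not over 16000
load_16k = samples_per_second(16000, 8)        # default rate, 8 channels
load_48k = samples_per_second(48000, 8)        # three times the data
```

At the defaults this is 128,000 samples per second. Tripling the sampling frequency triples that figure, which illustrates why higher rates make real-time processing harder.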

DEVICETYPE

string  type. Select from ALSA, SINICH, RASP and WS. When a device supporting ALSA-based drivers is used, select ALSA. When TD-BD-16ADUSB is used, select SINICH. When RASP2 is used, select RASP. When radio RASP is used, select WS.

DEVICE

string  type. The value to enter depends on DEVICETYPE; see the per-device settings in the following description.

Details of the node

HARK supports the following three kinds of audio devices:

  1. The RASP series from System In Frontier, Inc. (previously JEOL System Technology Co., Ltd.),

  2. Tokyo Electron Device LTD. TD-BD-16ADUSB,

  3. ALSA-based devices (e.g. RME Hammerfall DSP series Multiface).

The following are settings and instructions for each device.

RASP series

The parameter settings for RASP-2 and Radio RASP are described here.

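RASP-2

| Parameter | Value |
| --- | --- |
| CHANNEL_COUNT | 8 |
| DEVICETYPE | WS |
| DEVICE | IP address of RASP-2 |

Radio RASP

| Parameter | Value |
| --- | --- |
| CHANNEL_COUNT | 16 |
| DEVICETYPE | WS |
| DEVICE | IP address of Radio RASP |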

Remarks

Some models of the RASP series have both microphone inputs and line inputs among the 16 channels. When such a model is used, connect a ChannelSelector  node to the AUDIO output of the AudioStreamFromMic  node and select only the microphone input channels.
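The effect of selecting only the microphone channels can be sketched on a single frame. This illustrates the idea only and is not the ChannelSelector node itself. The function `select_channels` is hypothetical, and treating channels 0 to 7 as the microphone inputs is an assumption, since the actual assignment depends on the RASP model.

```python
def select_channels(audio, selected):
    """Keep only the rows (channels) listed in `selected`, in that order."""
    return [audio[ch] for ch in selected]


# One 16-channel frame; each row is filled with its own channel index.
frame = [[float(ch)] * 4 for ch in range(16)]

# Assumed for illustration: channels 0-7 are microphone inputs.
mic_channels = list(range(8))
mic_only = select_channels(frame, mic_channels)
```

The result has 8 rows instead of 16, and the number of columns (samples per frame) is unchanged.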

TD-BD-16ADUSB

| Parameter | Value |
| --- | --- |
| CHANNEL_COUNT | 16 |
| DEVICETYPE | SINICH |
| DEVICE | SINICH |

ALSA-based devices

| Parameter | Value |
| --- | --- |
| CHANNEL_COUNT | 8 |
| DEVICETYPE | ALSA |
| DEVICE | plughw:0,1 |

Remarks

Designate plughw:a,b, where a and b are non-negative integers. For a, enter the card number shown by arecord -l. When multiple audio input devices are connected, multiple card numbers are shown; enter the number of the card to be used. For b, enter the device number shown by arecord -l on the same line as the card number. When a card exposes multiple devices, enter the number of the device to be used. A card that has both analog and digital inputs is one example of a card with multiple devices.
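Candidate DEVICE strings can be derived from the two numbers printed on each `card N: ..., device M: ...` line of the arecord -l listing. The sketch below parses text in that format. The function name `plughw_names` is hypothetical, and the sample listing is illustrative rather than captured from a real machine.

```python
import re

# Matches lines such as "card 1: DSP [Hammerfall DSP], device 0: ...".
_CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+): ", re.MULTILINE)


def plughw_names(arecord_output):
    """Return one 'plughw:a,b' string per capture device in the listing."""
    return [
        f"plughw:{card},{device}"
        for card, device in _CARD_LINE.findall(arecord_output)
    ]


# Illustrative text in the format printed by `arecord -l`.
SAMPLE_LISTING = """\
**** List of CAPTURE Hardware Devices ****
card 0: Intel [HDA Intel], device 0: Analog [Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: DSP [Hammerfall DSP], device 0: RME Hammerfall DSP [RME Hammerfall DSP]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

names = plughw_names(SAMPLE_LISTING)
```

On a real system the listing could be obtained with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout` and passed to the same function.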