6.1.5 HarkDataStreamSender

6.1.5.1 Outline of the node

This node sends the following acoustic signal processing results over a socket connection.

• Acoustic signal

• Frequency spectrum after STFT

• Source information of source localization result

• Acoustic feature

6.1.5.2 Necessary file

No files are required.

6.1.5.3 Usage

When to use

This node is used to send the above data to a system external to HARK using TCP/IP communication.

Typical connection

In the example in Figure 6.10, all input terminals are connected. It is also possible to leave input terminals open depending on the transmitted data. To learn about the relation between the connection of the input terminals and transmitted data, see “Details of the node”.

6.1.5.4 Input-output and property of the node

Table 6.6: Parameter list of HarkDataStreamSender
| Parameter name | Type | Default value | Unit | Description |
|---|---|---|---|---|
| HOST | string | localhost | | Host name or IP address of the server to which data is sent |
| PORT | int | 8890 | | Port number for outbound network communication |
| ADVANCE | int | 160 | [pt] | Shift length of a frame |
| BUFFER_SIZE | int | 512 | | Size of the float-sized memory allocated for socket communication |
| FRAMES_PER_SEND | int | 1 | [frm] | Number of frames per socket transmission |
| DEBUG_PRINT | bool | false | | ON/OFF for outputting debugging information |
| SOCKET_ENABLE | bool | true | | Flag that determines whether socket output is performed |

Input

MIC_WAVE

: Matrix<float>  type. Acoustic signal (number of channels $\times$ samples of one STFT window length in each channel)

MIC_FFT

: Matrix<complex<float> >  type. Frequency spectrum (The number of channels $\times$ spectrum of each channel)

SRC_INFO

: Vector<ObjectRef>  type. Source information on the source localization results of several sound sources

SRC_WAVE

: Map<int, ObjectRef>  type. A sound source ID and acoustic signal (Vector<float>  type) data pair.

SRC_FFT

: Map<int, ObjectRef>  type. A sound source ID and frequency spectrum (Vector<complex<float> >  type) data pair.

SRC_FEATURE

: Map<int, ObjectRef>  type. A sound source ID and acoustic feature (Vector<float>  type) data pair.

SRC_RELIABILITY

: Map<int, ObjectRef>  type. A sound source ID and mask vector (Vector<float>  type) data pair.

Output

OUTPUT

: ObjectRef  type. Same output as the input.

Parameter

HOST

: string  type. Host name or IP address of the host to which data is transmitted. It is ignored when SOCKET_ENABLE is set to false.

PORT

: int  type. Network port number. It is ignored when SOCKET_ENABLE is set to false.

ADVANCE

: int  type. Shift length of a frame. It must be equal to the value set in the preceding processing.

BUFFER_SIZE

: int  type. Buffer size allocated for socket communication.

FRAMES_PER_SEND

: int  type. Number of frames per socket transmission.

DEBUG_PRINT

: bool  type. Enables or disables debug output to standard output.

SOCKET_ENABLE

: bool  type. Data is transferred to the socket when true and not transferred when false.

6.1.5.5 Details of the node

• Description of the parameters

For HOST, specify the host name or IP address of the host running the external program that receives the data. For PORT, specify the network port number used for data transmission. ADVANCE is the shift length of a frame and must be equal to the value set in the preceding processing.

BUFFER_SIZE is the size of the buffer allocated for socket communication. A float type array of BUFFER_SIZE * 1024 elements is allocated at initialization. It must be larger than the transmitted data.

FRAMES_PER_SEND is the number of frames per socket transmission. The default value of 1, which sends data every frame, is sufficient in most cases. To reduce the amount of socket communication, increase this value.

DEBUG_PRINT specifies whether debug information is printed to standard output. The output contains part of the transmitted data. For more information, see the “Debug” column in Table 6.13.

When SOCKET_ENABLE is set to false, data is not sent to external systems. This is useful for checking the operation of a HARK network without running an external program.
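As a rough illustration of the receiving side, the following Python sketch listens on the address given by HOST and PORT and reads the first 12 bytes of the stream. It is not part of HARK: the names `recv_exact` and `serve_once` are hypothetical, and the 12-byte figure assumes 4-byte ints for the 3 * sizeof(int) header described in (B-1). Because TCP delivers a byte stream, a single `recv` call may return fewer bytes than requested, so the helper loops until the requested size has arrived.

```python
import socket


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed before all bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve_once(host: str = "localhost", port: int = 8890) -> None:
    """Accept one connection and print the first header as hex.

    The defaults mirror the HOST and PORT defaults of the node.
    """
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            # 3 * sizeof(int), assuming 4-byte ints on the sending machine.
            header = recv_exact(conn, 12)
            print(header.hex())
```

Calling `serve_once()` before starting the HARK network would block until the node connects; the same `recv_exact` helper can then be reused for every structure and payload that follows the header.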

• Details of data transmission

(B-1) Structure for data transmission

Data is transmitted frame by frame, with each frame divided into several parts. The structures defined for data transmission are listed below.

• HD_Header
Description: A header that contains basic information, placed at the top of the transmitted data
Data size: 3 * sizeof(int)

Table 6.7: Member of HD_Header

| Variable name | Type | Description |
|---|---|---|
| type | int | Bit flag that indicates the structure of the transmitted data. For the relation between each bit and the data to be transmitted, see Table 6.8. |
| advance | int | Shift length of a frame |
| count | int | Frame number of HARK |

Table 6.8: Bits of the type member of HD_Header and the corresponding transmitted data

| Bit (counted from the least significant) | Related input terminal | Transmitted data |
|---|---|---|
| 1st | MIC_WAVE | Acoustic signal |
| 2nd | MIC_FFT | Frequency spectrum |
| 3rd | SRC_INFO | Source information of the source localization result |
| 4th | SRC_INFO, SRC_WAVE | Source information of the source localization result + acoustic signal for each sound source ID |
| 5th | SRC_INFO, SRC_FFT | Source information of the source localization result + frequency spectrum for each sound source ID |
| 6th | SRC_INFO, SRC_FEATURE | Source information of the source localization result + acoustic feature for each sound source ID |
| 7th | SRC_INFO, SRC_RELIABILITY | Source information of the source localization result + missing feature mask for each sound source ID |

In HarkDataStreamSender, the transmitted data differs depending on which input terminals are connected. The receiving end can interpret the transmitted data according to its type. Examples are given below. Further details on the transmitted data are given in (B-2).

• In the case that only the MIC_FFT input terminal is connected, the type is 0000010 in binary, and the only data transmitted is the frequency spectrum for each microphone.

• In the case that the three input terminals of MIC_WAVE, SRC_INFO and SRC_FEATURE are connected, the type is 0100101 in binary. The data to be transmitted are acoustic signals for each microphone, source information of a source localization result and acoustic features for each sound source ID.

• For the four input terminals of SRC_WAVE, SRC_FFT, SRC_FEATURE and SRC_RELIABILITY, the data to be transmitted are information for each sound source ID and therefore information of SRC_INFO is required. Even if the above four input terminals are connected without connecting SRC_INFO, no data is transmitted. In such a case, the type is 0000000 in binary.
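The bit assignments of the type field can be checked on the receiving side with a small sketch such as the following. The class name `HDType` and the function `describe_type` are hypothetical; the bit positions follow Table 6.8 and the binary examples above, counting from the least significant bit.

```python
from enum import IntFlag


class HDType(IntFlag):
    """Hypothetical names for the bits of the HD_Header type field."""
    MIC_WAVE = 1 << 0
    MIC_FFT = 1 << 1
    SRC_INFO = 1 << 2
    SRC_WAVE = 1 << 3
    SRC_FFT = 1 << 4
    SRC_FEATURE = 1 << 5
    SRC_RELIABILITY = 1 << 6


def describe_type(value: int) -> list[str]:
    """Return the names of the bits set in `value`, lowest bit first."""
    return [flag.name for flag in HDType if value & flag]


print(describe_type(0b0000010))  # only MIC_FFT connected
print(describe_type(0b0100101))  # MIC_WAVE, SRC_INFO and SRC_FEATURE connected
```

A value of 0 yields an empty list, which matches the case where the per-source terminals are connected without SRC_INFO.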

• HDH_MicData
Description: Structural information on the array size for sending two-dimensional arrays
Data size: 3 * sizeof(int)

Table 6.9: Member of HDH_MicData

| Variable name | Type | Description |
|---|---|---|
| nch | int | Number of microphone channels |
| length | int | Data length (number of columns of the two-dimensional array to be transmitted) |
| data_bytes | int | Number of bytes of data to be transmitted. In the case of a float type matrix, nch * length * sizeof(float). |

• HDH_SrcInfo
Description: Source information of a source localization result
Data size: 1 * sizeof(int) + 4 * sizeof(float)

Table 6.10: Member of HDH_SrcInfo

| Variable name | Type | Description |
|---|---|---|
| src_id | int | Sound source ID |
| x[3] | float | Three-dimensional position of the sound source |
| power | float | Power of the MUSIC spectrum calculated in LocalizeMUSIC |

• HDH_SrcData
Description: Structural information on the array size for sending one-dimensional arrays
Data size: 2 * sizeof(int)

Table 6.11: Member of HDH_SrcData

| Variable name | Type | Description |
|---|---|---|
| length | int | Data length (number of one-dimensional array elements to be transmitted) |
| data_bytes | int | Number of bytes of transmitted data. In the case of a float type vector, length * sizeof(float). |
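The structures above map naturally onto fixed-size binary layouts. The sketch below expresses them with Python's `struct` module and decodes a float matrix payload. Several points are assumptions not stated in this section: 4-byte int and float, little-endian byte order, no padding between members, and a payload ordered channel by channel. The names `HD_HEADER`, `HDH_MIC_DATA`, `HDH_SRC_INFO`, `HDH_SRC_DATA` and `unpack_float_matrix` are illustrative only.

```python
import struct

# Assumed layouts: little-endian, 4-byte int/float, no padding.
HD_HEADER = struct.Struct("<3i")      # type, advance, count
HDH_MIC_DATA = struct.Struct("<3i")   # nch, length, data_bytes
HDH_SRC_INFO = struct.Struct("<i4f")  # src_id, x[3], power
HDH_SRC_DATA = struct.Struct("<2i")   # length, data_bytes


def unpack_float_matrix(nch: int, length: int, payload: bytes) -> list[list[float]]:
    """Split a float payload into nch rows of `length` values each.

    Assumes the payload is ordered channel by channel.
    """
    expected = nch * length * 4  # nch * length * sizeof(float)
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    flat = struct.unpack(f"<{nch * length}f", payload)
    return [list(flat[ch * length:(ch + 1) * length]) for ch in range(nch)]


# The sizes agree with the "Data size" entries of each structure.
print(HD_HEADER.size, HDH_MIC_DATA.size, HDH_SRC_INFO.size, HDH_SRC_DATA.size)
```

If the sending machine uses a different byte order, only the `<` prefix of each format string would need to change.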

(B-2) Transmitted data

Table 6.12: Transmitted data in order of sending, and the connected input terminals ($\circ$ marks data that is transmitted. $\circ ^*$ marks data that is not transmitted when the SRC_INFO terminal is not connected)

| | Type | Size | MIC_WAVE | MIC_FFT | SRC_INFO | SRC_WAVE | SRC_FFT | SRC_FEATURE | SRC_RELIABILITY |
|---|---|---|---|---|---|---|---|---|---|
| (a) | HD_Header | sizeof(HD_Header) | $\circ$ | $\circ$ | $\circ$ | $\circ$ | $\circ$ | $\circ$ | $\circ$ |
| (b) | HDH_MicData | sizeof(HDH_MicData) | $\circ$ | | | | | | |
| (c) | float[] | HDH_MicData.data_bytes | $\circ$ | | | | | | |
| (d) | HDH_MicData | sizeof(HDH_MicData) | | $\circ$ | | | | | |
| (e) | float[] | HDH_MicData.data_bytes | | $\circ$ | | | | | |
| (f) | float[] | HDH_MicData.data_bytes | | $\circ$ | | | | | |
| (g) | int | 1 * sizeof(int) | | | $\circ$ | $\circ ^*$ | $\circ ^*$ | $\circ ^*$ | $\circ ^*$ |
| (h) | HDH_SrcInfo | sizeof(HDH_SrcInfo) | | | $\circ$ | $\circ ^*$ | $\circ ^*$ | $\circ ^*$ | $\circ ^*$ |
| (i) | HDH_SrcData | sizeof(HDH_SrcData) | | | | $\circ ^*$ | | | |
| (j) | short int[] | HDH_SrcData.data_bytes | | | | $\circ ^*$ | | | |
| (k) | HDH_SrcData | sizeof(HDH_SrcData) | | | | | $\circ ^*$ | | |
| (l) | float[] | HDH_SrcData.data_bytes | | | | | $\circ ^*$ | | |
| (m) | float[] | HDH_SrcData.data_bytes | | | | | $\circ ^*$ | | |
| (n) | HDH_SrcData | sizeof(HDH_SrcData) | | | | | | $\circ ^*$ | |
| (o) | float[] | HDH_SrcData.data_bytes | | | | | | $\circ ^*$ | |
| (p) | HDH_SrcData | sizeof(HDH_SrcData) | | | | | | | $\circ ^*$ |
| (q) | float[] | HDH_SrcData.data_bytes | | | | | | | $\circ ^*$ |

Table 6.13: Details of the transmitted data

| | Description | Debug |
|---|---|---|
| (a) | Transmitted data header. See Table 6.7. | $\circ$ |
| (b) | Structure of acoustic signals (number of microphones, frame length, byte count for transmission). See Table 6.9. | $\circ$ |
| (c) | Acoustic signal (number of microphones $\times$ float type matrix of frame length) | |
| (d) | Structure of frequency spectra (number of microphones, number of frequency bins, byte count for transmission). See Table 6.9. | $\circ$ |
| (e) | Real part of frequency spectrum (number of microphones $\times$ float type matrix of number of frequency bins) | |
| (f) | Imaginary part of frequency spectrum (number of microphones $\times$ float type matrix of number of frequency bins) | |
| (g) | Number of sound sources detected | $\circ$ |
| (h) | Source information of a source localization result. See Table 6.10. | $\circ$ |
| (i) | Structure describing the acoustic signal for each sound source ID (frame length, byte count for transmission). See Table 6.11. | $\circ$ |
| (j) | Acoustic signal for each sound source ID (float type linear array of frame length) | |
| (k) | Structure describing the frequency spectrum for each sound source ID (number of frequency bins, byte count for transmission). See Table 6.11. | $\circ$ |
| (l) | Real part of a frequency spectrum for each sound source ID (float type linear array of number of frequency bins) | |
| (m) | Imaginary part of a frequency spectrum for each sound source ID (float type linear array of number of frequency bins) | |
| (n) | Structure describing the acoustic feature for each sound source ID (dimension number of features, byte count for transmission). See Table 6.11. | $\circ$ |
| (o) | Acoustic feature for each sound source ID (float type linear array of dimension number of features) | |
| (p) | Structure describing the missing feature mask (MFM) for each sound source ID (dimension number of features, byte count for transmission). See Table 6.11. | $\circ$ |
| (q) | MFM for each sound source ID (float type linear array of dimension number of features) | |

Transmitted data is divided for each frame as shown in (a)-(q) of Tables 6.12 and 6.13. Table 6.12 shows the relation between the transmitted data (a)-(q) and the input terminal connected, and Table 6.13 describes the transmitted data.

    calculate {
        Send (a)
        IF MIC_WAVE is connected
            Send (b)
            Send (c)
        ENDIF
        IF MIC_FFT is connected
            Send (d)
            Send (e)
            Send (f)
        ENDIF
        IF SRC_INFO is connected
            Send (g)
            (Let 'src_num' be the number of sound sources.)
            FOR i = 1 to src_num (This loop runs once per sound source ID.)
                Send (h)
                IF SRC_WAVE is connected
                    Send (i)
                    Send (j)
                ENDIF
                IF SRC_FFT is connected
                    Send (k)
                    Send (l)
                    Send (m)
                ENDIF
                IF SRC_FEATURE is connected
                    Send (n)
                    Send (o)
                ENDIF
                IF SRC_RELIABILITY is connected
                    Send (p)
                    Send (q)
                ENDIF
            ENDFOR
        ENDIF
    }

(B-3) Transmission algorithm

Part of the algorithm that runs in a loop while the HARK network file is executed is shown above. Here, (a)-(q) in the code correspond to (a)-(q) in Tables 6.12 and 6.13.
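A receiver can mirror this sending order to decode one frame. The following Python sketch is one possible reading of the algorithm, not the implementation shipped with HARK. It assumes little-endian 4-byte int and float values with no padding, and it assumes the whole frame is already in memory as a `bytes` object. Payloads are returned as raw bytes, since their element type is given per item in Table 6.12. `Cursor` and `parse_frame` are hypothetical names.

```python
import struct

# Assumed wire layout (not stated in the text): little-endian, no padding.
INT32 = struct.Struct("<i")
HD_HEADER = struct.Struct("<3i")      # type, advance, count
HDH_MIC_DATA = struct.Struct("<3i")   # nch, length, data_bytes
HDH_SRC_INFO = struct.Struct("<i4f")  # src_id, x[3], power
HDH_SRC_DATA = struct.Struct("<2i")   # length, data_bytes


class Cursor:
    """Hypothetical helper that walks forward through one frame's bytes."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf
        self.pos = 0

    def unpack(self, layout: struct.Struct) -> tuple:
        values = layout.unpack_from(self.buf, self.pos)
        self.pos += layout.size
        return values

    def raw(self, size: int) -> bytes:
        chunk = self.buf[self.pos:self.pos + size]
        if len(chunk) != size:
            raise ValueError("frame is shorter than its headers announce")
        self.pos += size
        return chunk


def parse_frame(buf: bytes) -> dict:
    """Decode one frame, following the order (a)-(q)."""
    cur = Cursor(buf)
    ftype, advance, count = cur.unpack(HD_HEADER)              # (a)
    frame = {"type": ftype, "advance": advance, "count": count}
    if ftype & 0b0000001:                                      # MIC_WAVE
        nch, length, nbytes = cur.unpack(HDH_MIC_DATA)         # (b)
        frame["mic_wave"] = (nch, length, cur.raw(nbytes))     # (c)
    if ftype & 0b0000010:                                      # MIC_FFT
        nch, length, nbytes = cur.unpack(HDH_MIC_DATA)         # (d)
        real = cur.raw(nbytes)                                 # (e)
        imag = cur.raw(nbytes)                                 # (f)
        frame["mic_fft"] = (nch, length, real, imag)
    if ftype & 0b0000100:                                      # SRC_INFO
        (src_num,) = cur.unpack(INT32)                         # (g)
        frame["sources"] = []
        for _ in range(src_num):
            src_id, x, y, z, power = cur.unpack(HDH_SRC_INFO)  # (h)
            src = {"src_id": src_id, "x": (x, y, z), "power": power}
            if ftype & 0b0001000:                              # SRC_WAVE
                _, nbytes = cur.unpack(HDH_SRC_DATA)           # (i)
                src["wave"] = cur.raw(nbytes)                  # (j)
            if ftype & 0b0010000:                              # SRC_FFT
                _, nbytes = cur.unpack(HDH_SRC_DATA)           # (k)
                real = cur.raw(nbytes)                         # (l)
                imag = cur.raw(nbytes)                         # (m)
                src["fft"] = (real, imag)
            if ftype & 0b0100000:                              # SRC_FEATURE
                _, nbytes = cur.unpack(HDH_SRC_DATA)           # (n)
                src["feature"] = cur.raw(nbytes)               # (o)
            if ftype & 0b1000000:                              # SRC_RELIABILITY
                _, nbytes = cur.unpack(HDH_SRC_DATA)           # (p)
                src["reliability"] = cur.raw(nbytes)           # (q)
            frame["sources"].append(src)
    frame["consumed"] = cur.pos  # bytes used, so a caller can find the next frame
    return frame
```

When reading from a live socket, the same order applies, but each structure and payload would be read incrementally instead of sliced from a complete buffer.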