4.4.3 TransferFunction

TransferFunction is a data type that expresses transfer function information. In HARK, a sequence of nodes such as EstimateTF (output), LocalizeMUSIC (input), and GHDSS (input) applies the estimated transfer function to localization and separation.

TransferFunction has the following information:

  1. Information type: string type. The kind of the file. "transfer function" means that the file includes all information; "partial transfer function" means that the file includes only the updated information.

  2. Source position: Vector$<$Position$>$ type. Source position information for the transfer function. When the Information type is "partial transfer function", this parameter is empty.

  3. Neighbor information for source positions: Vector$<$Neighbor$>$ type. Neighbor information for the sources. When the Information type is "partial transfer function", this parameter is empty.

  4. Microphone position: Vector$<$Position$>$ type. Microphone position information for the transfer function. When the Information type is "partial transfer function", this parameter is empty.

  5. Transfer function for localization: Map$<$ID,Matrix$<$complex$<$float$>$$>$$>$ type. The transfer function for localization.

  6. Transfer function for separation: Map$<$ID,Matrix$<$complex$<$float$>$$>$$>$ type. The transfer function for separation.

4.4.3.1 Position type

Position type, which expresses the microphone and source positions of the transfer function, has the following information:

  1. ID: int  type. ID of the position.

  2. Coordinate type: Coordinate type. The coordinate system used to express the position.

  3. Coordinate: float  type. Coordinate of the position.

  4. Path: string type. Path of the wave file.

  5. Matrix data: Matrix$<$complex$<$float$>$$>$ type. Matrix data. This parameter is not used.

  6. Enable the channel set information: int type. Enables the channel set information. This parameter is not used.

  7. Channel set information: Vector<int>  type. Channel set information. This parameter is not used.

4.4.3.2 Neighbor type

Neighbor type, which expresses the neighbor information of the sources, has the following information:

  1. ID: Vector<int> type. IDs that have neighbors.

  2. Neighbor (ID): Vector$<$Vector<int>$>$ type. Neighbor information by ID.

  3. Neighbor (Position): Vector$<$Vector$<$Position$>$$>$ type. Neighbor information by Position type.

  4. Algorithm: NeighborAlgorithm type. The search algorithm for the neighbors.

4.4.3.3 Config type

Config type, which expresses the configuration information, has the following information:

  1. Comment: string type. The description of the file. Any string is acceptable.

  2. Synchronous Average: int type. The number of repetitions of the signal used for transfer function measurement (TSP signal).

  3. Path: string type. The path of the audio file for transfer function measurement (TSP signal).

  4. Offset: int type. The offset during transfer function calculation.

  5. Length: int type. The length, in samples, of the signal used for transfer function measurement (TSP signal).

  6. Begin index for peak search: int type. The begin index used when searching for the peak of the direct sound during transfer function calculation.

  7. End index for peak search: int type. The end index used when searching for the peak of the direct sound during transfer function calculation.

  8. FFT length: int type. The length of the Fourier transform used during transfer function calculation.

  9. Sampling rate: int type. The sampling rate.

  10. Signal max: int type. The maximum amplitude of the recorded signal for transfer function measurement.

——–

Problem

Read this to learn more about nodes that use Map ($<\cdot$,$\cdot>$) for input/output, such as MFCCExtraction and SpeechRecognitionClient .

Solution

The Map data type is a pair consisting of a key and the data that corresponds to that key. For example, in 3-speaker simultaneous speech recognition, the features are separated for each speaker. Each feature is then assigned a key based on the speaker’s speech index ID, and the key and data are handled as a set. In this way, each speaker and utterance can be distinguished.