HARK version 1.2.0 Document : FeatureRemover

6.4.2 FeatureRemover

6.4.2.1 Outline of the node

This node deletes the specified dimension elements from the input vector and outputs a vector of reduced dimension.

6.4.2.2 Necessary file

No files are required.

6.4.2.3 Usage

When to use

This node is used to delete unnecessary elements from vector-type data, such as acoustic features and Missing Feature Masks, in order to reduce the number of dimensions. Feature extraction usually extracts static features first and then dynamic features. Some of the static features are no longer needed once the dynamic features have been calculated, and this node deletes them. In particular, logarithmic power terms are often deleted.

Typical connection

The delta logarithmic power term is obtained by calculating the logarithmic power term with MSLSExtraction or MFCCExtraction and then applying Delta . Because the delta logarithmic power term cannot be calculated without the logarithmic power term, the acoustic features are first calculated with the logarithmic power term included, and the logarithmic power term is removed afterward. This node is therefore usually connected after Delta and used to remove the logarithmic power term.

[Figure: fig/modules/FeatureRemover]

Figure 6.56: Typical connection example of FeatureRemover
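
The connection order described above can be illustrated with a minimal Python sketch. It is not HARK code: the feature layout (two hypothetical static coefficients followed by the logarithmic power term), the helper names `append_delta` and `drop_indices`, and the use of a simple first difference in place of the actual Delta calculation are all assumptions made for illustration.

```python
from typing import List, Sequence


def append_delta(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append a first-difference delta to every frame.

    Assumption: a plain frame-to-frame difference stands in for the real
    Delta node. The first frame has no predecessor, so its delta is zero.
    """
    out = []
    prev = frames[0]
    for cur in frames:
        delta = [c - p for c, p in zip(cur, prev)]
        out.append(list(cur) + delta)
        prev = cur
    return out


def drop_indices(vector: Sequence[float], selector: Sequence[int]) -> List[float]:
    """Return the vector without the elements at the 0-based selector indices."""
    drop = set(selector)
    return [v for i, v in enumerate(vector) if i not in drop]


# Hypothetical static features per frame: [coef0, coef1, log_power]
static_frames = [
    [0.1, 0.2, -3.0],
    [0.3, 0.1, -1.0],
    [0.2, 0.4, -0.5],
]

# Step 1: the delta terms need the static log power, so it is kept here.
with_delta = append_delta(static_frames)  # 6 dimensions per frame

# Step 2: only after the delta is computed is the static log power removed.
LOG_POWER_INDEX = 2
final_frames = [drop_indices(f, [LOG_POWER_INDEX]) for f in with_delta]  # 5 dimensions
```

Each final frame keeps the two static coefficients, their deltas, and the delta logarithmic power, while the static logarithmic power itself is gone.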

6.4.2.4 Input-output and property of the node

Table 6.52: Parameter list of FeatureRemover

Parameter name	Type	Default value	Unit	Description
SELECTOR	`Object`	`<Vector<int> >`		Vector of dimension indices to delete (multiple indices can be specified)

Input

INPUT: Map<int, ObjectRef> type. Pairs of a sound source ID and a feature vector stored as Vector<float> type data.

Output

OUTPUT: Map<int, ObjectRef> type. Pairs of a sound source ID and a feature vector stored as Vector<float> type data, with the selected dimensions removed.

Parameter

SELECTOR: Vector<int> type. Indices of the dimensions to delete. Indices start at 0, so each index must be at least 0 and smaller than the dimension of the input feature. Any number of indices may be specified. For example, to delete the elements of the first and third dimensions, reducing the dimension of the input vector by two, specify <Vector<int> 0 2> .
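
The effect of SELECTOR on the input and output maps can be sketched in Python as follows. This is an illustration only, not the node's implementation: the function names are hypothetical, a plain dict stands in for Map<int, ObjectRef>, and raising an error for an out-of-range index is an assumption, since the behavior of the node in that case is not described here.

```python
from typing import Dict, List, Sequence


def remove_features(vector: Sequence[float], selector: Sequence[int]) -> List[float]:
    """Return a copy of `vector` without the elements at the 0-based `selector` indices."""
    drop = set(selector)
    # Assumption: out-of-range indices are treated as an error in this sketch.
    bad = [i for i in drop if not 0 <= i < len(vector)]
    if bad:
        raise IndexError(f"selector indices out of range: {sorted(bad)}")
    return [v for i, v in enumerate(vector) if i not in drop]


def remove_features_per_source(
    features: Dict[int, Sequence[float]], selector: Sequence[int]
) -> Dict[int, List[float]]:
    """Apply the same selector to the feature vector of every sound source ID."""
    return {source_id: remove_features(vec, selector) for source_id, vec in features.items()}


# Two sound sources (IDs 0 and 1), each with a 4-dimensional feature vector.
features = {0: [1.0, 2.0, 3.0, 4.0], 1: [5.0, 6.0, 7.0, 8.0]}

# Equivalent of <Vector<int> 0 2>: delete the first and third dimensions.
reduced = remove_features_per_source(features, [0, 2])
```

With the selector `[0, 2]`, every 4-dimensional vector becomes 2-dimensional, and the sound source IDs are passed through unchanged.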

6.4.2.5 Details of the node

This node deletes unnecessary dimension elements from the input vectors to reduce the dimension of the vectors.

Analysis of audio signals shows that the logarithmic power of an analysis frame tends to be large in speech sections, particularly in vowels. Including the logarithmic power term in the acoustic features can therefore be expected to improve speech recognition accuracy. However, when the logarithmic power term is used directly as a feature, differences in the sound pickup level are reflected directly in the acoustic features. When the logarithmic power level of the data used to create the acoustic model differs from the level at sound pickup, speech recognition accuracy falls. Even when the equipment settings are fixed, speakers do not necessarily speak at the same level. For this reason, the delta logarithmic power term, which is the dynamic feature of the logarithmic power term, is used instead. This yields a feature that is robust to differences in sound pickup level while still indicating speech sections and vowels.
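
The level argument above can be checked numerically with a short Python sketch. It assumes that the logarithmic power of a frame is the natural logarithm of the sum of squared samples and that the delta is a simple frame-to-frame difference; the actual calculations in HARK may differ, and the sample values and gain are made up for illustration.

```python
import math
from typing import List, Sequence


def log_power(frame: Sequence[float]) -> float:
    """Assumed definition: natural log of the sum of squared samples."""
    return math.log(sum(x * x for x in frame))


def delta(values: Sequence[float]) -> List[float]:
    """Assumed definition: difference between consecutive frames."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical analysis frames whose energy rises over time.
frames = [[0.1, -0.2, 0.1], [0.5, -0.6, 0.4], [0.9, -0.8, 0.7]]

# The same signal picked up at a higher level (gain of 4).
gain = 4.0
loud_frames = [[gain * x for x in f] for f in frames]

lp_quiet = [log_power(f) for f in frames]
lp_loud = [log_power(f) for f in loud_frames]

# The static log power shifts by the constant 2 * log(gain) in every frame...
offsets = [b - a for a, b in zip(lp_quiet, lp_loud)]

# ...while the delta log power is unaffected, because the constant cancels.
d_quiet = delta(lp_quiet)
d_loud = delta(lp_loud)
```

Under these assumptions, a change in pickup level moves the static logarithmic power by a constant offset but leaves its delta unchanged, which is why the static term is removed and the delta term kept.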