12.1 Selecting window length and shift length

Problem

Read this section to determine appropriate window and shift lengths for speech analysis.

Solution

Length is the window length of speech used for analysis, generally 20-40 ms. If the sampling frequency is fs Hz, Length in samples is fs/1000 * x, where x is the window length in ms (20-40). Advance is the analysis frame shift length; it is generally set so that each frame overlaps the preceding and following frames, with a shift of about 1/3-1/2 of Length. When performing speech recognition, use the same Length and Advance that were used to create the acoustic model.
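The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name window_and_shift, its parameter names, and the default values (32 ms, shift ratio 0.5) are assumptions made for this example and are not taken from any particular tool.

```python
def window_and_shift(fs_hz, window_ms=32.0, shift_ratio=0.5):
    """Return (length, advance) in samples.

    fs_hz       : sampling frequency in Hz
    window_ms   : window length in ms (the text suggests 20-40 ms)
    shift_ratio : frame shift as a fraction of the window length
                  (assumed here to lie between 1/3 and 1/2)
    """
    # length = fs/1000 * x, rounded to a whole number of samples
    length = int(round(fs_hz / 1000.0 * window_ms))
    # the shift is expressed relative to the window length
    advance = int(round(length * shift_ratio))
    return length, advance


if __name__ == "__main__":
    # Print the settings for a few common sampling frequencies.
    for fs in (8000, 16000, 48000):
        print(fs, window_and_shift(fs))
```

Whatever values are chosen, the same pair should be used both when creating the acoustic model and when running recognition.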

Discussion

When analyzing speech, a signal can be assumed to be weakly stationary over a range of about 20-40 ms; therefore, the window length is set within this range. The shift length is determined from the effective width of the window. Concretely, find the width of a rectangular window whose energy is equivalent to that of the window function. Shifting by this width means that, across consecutive frames, no sample is processed redundantly and no sample is discarded.

For the window functions used in speech analysis, this equivalent width is about 1/3-1/2 of the window length, so the frame shift should be within this range. A shift of 1/3 is the conservative setting: some samples may be processed redundantly, but few are discarded. At a shift of 1/2, redundant processing does not occur, but samples may be discarded, depending on the window function. However, when using a rectangular window for analysis, the shift length must be equal to the analysis frame length. For triangular windows, the frame shift is 1/2 of the window length.
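As a rough check of the 1/3-1/2 range, the following sketch computes the width of an equivalent rectangular window, as a fraction of the window length, for a few common window functions. The text does not state exactly how the equivalence is measured, so two measures are computed here as assumptions: one based on the sum of the window values (power=1) and one based on the sum of their squares (power=2). All function names are hypothetical, and the window definitions are the standard symmetric textbook forms.

```python
import math


def rectangular(n):
    return [1.0] * n


def triangular(n):
    # Bartlett (triangular) window with zero-valued end points
    return [1.0 - abs(2.0 * i / (n - 1) - 1.0) for i in range(n)]


def hann(n):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def equivalent_width_ratio(window, power=2):
    """Width of a rectangular window with the same peak and the same
    sum of |w|**power, as a fraction of the window length.
    power=1 compares sums of values; power=2 compares energy."""
    peak = max(abs(w) for w in window)
    total = sum(abs(w) ** power for w in window)
    return total / (peak ** power * len(window))


if __name__ == "__main__":
    n = 512  # e.g. 32 ms at 16 kHz
    for name, make in (("rectangular", rectangular), ("triangular", triangular),
                       ("hann", hann), ("hamming", hamming)):
        w = make(n)
        print(name,
              round(equivalent_width_ratio(w, power=1), 3),
              round(equivalent_width_ratio(w, power=2), 3))
```

For long windows, the rectangular window gives a ratio of 1 under both measures, which matches the requirement that its shift equal the frame length. The tapered windows fall at roughly 1/2 under the sum-based measure and between roughly 1/3 and 0.4 under the energy-based measure, which is one way to read the 1/3-1/2 range given above.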