3.5 Speech recognition fails


I am making a speech recognition system with HARK but it does not recognize any speech.


If you have not yet tried the recipes in Chapter 2, it may be easier to start there. That chapter walks through a speech recognition system with HARK in the order of sound recording, sound source localization, sound source separation, and speech recognition.

The rest of this recipe describes how to troubleshoot a system that you have developed yourself when it does not work. Since a speech recognition system includes a large number of elements, it is important to check possible causes one by one. First, using the corresponding recipes, confirm that HARK and FlowDesigner were installed properly (Recipe Installation fails), that sound is being recorded properly (Recipe Sound recording fails), that sound source localization is working properly (Recipe Sound source localization fails), and that sound source separation is working properly (Recipe Sound source separation fails).
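The one-by-one approach can be sketched as a small Python helper that runs the checks in the order given above and reports the first stage that fails. This is only an illustration: the function name `first_failing_stage` and the placeholder checks are assumptions made for this example, not part of HARK, and each placeholder should be replaced with a real test for your own setup.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, check) pair. The check functions are hypothetical
# placeholders; HARK does not provide them.
Stage = Tuple[str, Callable[[], bool]]


def first_failing_stage(stages: List[Stage]) -> Optional[str]:
    """Run the checks in order and return the name of the first one that fails.

    Returns None when every check passes. A check that raises an exception
    is treated as a failure.
    """
    for name, check in stages:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            return name
    return None


if __name__ == "__main__":
    # Placeholder results: pretend everything works except recognition.
    stages = [
        ("installation", lambda: True),
        ("sound recording", lambda: True),
        ("sound source localization", lambda: True),
        ("sound source separation", lambda: True),
        ("speech recognition", lambda: False),
    ]
    print(first_failing_stage(stages))  # prints "speech recognition"
```

Stopping at the first failure matters because a later stage cannot be judged reliably while an earlier one is still broken.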

If the system works properly up to this stage, verify speech recognition next. We presume that the system uses Julius (http://julius.sourceforge.jp/), a large vocabulary continuous speech recognition engine that has been modified for HARK. Three files are important for using Julius.

  1. Acoustic model: A model indicating the relationships between features of acoustic signals and phonemes

  2. Language model: A model of the language spoken by the user of the system.

  3. Configuration file: A file that specifies the file names of these two models

If the system does not work at all or Julius does not start, a wrong path may have been specified for one of these files, so check the configuration file first. For details, see Recipe Making a Julius configuration file (jconf). To verify the acoustic model, see Creating an acoustic model; to verify the language model, see Creating a language model.
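A wrong path in the configuration file can be spotted before Julius is started. The following Python sketch reads a jconf file and lists the model files that do not exist. It rests on several assumptions: the helper name `missing_jconf_files` is made up for this example, the option set (`-h`, `-hlist`, `-v`, `-d`, `-dfa`) covers only some common Julius options that take a file path, relative paths are assumed to be resolved against the directory of the jconf file, and `#` is assumed to start a comment. Adjust these to match your own jconf.

```python
import os
import shlex

# Julius options whose argument is a file path. This set is an assumption
# for illustration and is not exhaustive; extend it as needed.
FILE_OPTIONS = {"-h", "-hlist", "-v", "-d", "-dfa"}


def missing_jconf_files(jconf_path):
    """Return a list of (option, resolved_path) pairs whose files are missing."""
    # Assumption: relative paths are relative to the jconf file's directory.
    base_dir = os.path.dirname(os.path.abspath(jconf_path))
    missing = []
    with open(jconf_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            # comments=True drops everything after an unquoted '#'.
            tokens = shlex.split(line, comments=True)
            # Look at each adjacent (option, value) pair on the line.
            for opt, value in zip(tokens, tokens[1:]):
                if opt in FILE_OPTIONS:
                    path = value if os.path.isabs(value) else os.path.join(base_dir, value)
                    if not os.path.isfile(path):
                        missing.append((opt, path))
    return missing
```

Calling `missing_jconf_files("your.jconf")` (the file name here is a placeholder) returns an empty list when every referenced file is found. A non-empty result points to the option and path to correct. The check only confirms that the files exist; it does not validate their contents.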


This solution applies to a system that does not recognize speech at all. If the system works properly but its recognition rate is low, the system must be tuned. Tuning a speech recognition system involves complicated problems of localization, separation, and recognition, among others.

For example, see Recipe Parameter tuning of sound source separation for tuning GHDSS (separation) and Recipe Reduce the leak noise by post processing for tuning PostFilter. Also see Chapter 10, which discusses features used for recognition, and the recipes for improving performance (Tuning parameters of sound source localization and Parameter tuning of sound source separation).

See Also

The recipes listed here apply only when the system recognizes nothing at all.

  1. Each recipe in the chapter Something is wrong

  2. Acoustic model and Language model.

  3. Making a Julius configuration file (.jconf)

  4. Julius Book