14.6.2 Evaluating the speech recognition

The next step is to evaluate the speech recognition success rate with the evaluation script score.py.

> 3_Evaluation.sh

The arguments of the Python script specify a speech recognition log, the reference data, a sound direction, and a tolerance.

After you run the script, you will see a result like Fig. 14.34. Starting from the left, the columns show the reference (ground truth), the recognition result, and whether the recognition succeeded. The last line gives the overall success rate. In this case, 33 of the 40 utterances are recognized successfully, so the success rate is 82.5%.

ground truth          recognition result    status
"pork-cutlet-bowl"    ""                    Deletion
"curry-and-rice"      ""                    Deletion
"beef-bowl"           "beef-bowl"           Correct
"seafood-salad"       "seafood-salad"       Correct
"scrambled-eggs"      "scrambled-eggs"      Correct
(skipped)
"beef-bowl"           "beef-bowl"           Correct

33 / 40 (82.5 %)

   
Figure 14.34: Recognition result.
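
The success rate on the last line of the figure is the number of rows marked Correct divided by the total number of utterances. The following is a minimal sketch of that tally, not the actual score.py: the function name score is hypothetical, and the rule that an empty result counts as a Deletion is an assumption read off the figure. The Substitution label for a non-empty mismatch is also an assumption, since the figure shows only Correct and Deletion.

```python
from collections import Counter


def score(pairs):
    """Tally (reference, recognized) pairs into status counts.

    Assumptions (not taken from score.py): an empty result is a "Deletion",
    an exact match is "Correct", any other mismatch is a "Substitution".
    """
    counts = Counter()
    for reference, recognized in pairs:
        if recognized == "":
            counts["Deletion"] += 1
        elif recognized == reference:
            counts["Correct"] += 1
        else:
            counts["Substitution"] += 1
    total = sum(counts.values())
    # Guard against an empty log to avoid dividing by zero.
    rate = 100.0 * counts["Correct"] / total if total else 0.0
    return counts, total, rate


# Only the six rows that are visible in the figure (the rest are skipped there).
rows = [
    ("pork-cutlet-bowl", ""),
    ("curry-and-rice", ""),
    ("beef-bowl", "beef-bowl"),
    ("seafood-salad", "seafood-salad"),
    ("scrambled-eggs", "scrambled-eggs"),
    ("beef-bowl", "beef-bowl"),
]

counts, total, rate = score(rows)
print(f"{counts['Correct']} / {total} ({rate:.1f} %)")
```

Applied to all 40 utterances, the same formula gives 100 * 33 / 40 = 82.5 %, which matches the last line of the figure.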

For any direction, the success rate should be around 80%. If the rate is extremely low, check that you specified the correct pair of direction and reference data. If the rate is still low, the separation or the recognition may have failed. Listen to the files in sep_files/ to check whether the separation succeeded, or refer to the recipes in Chapter 3.
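
Before listening to every file, it can help to list the separated files with their durations, since a missing or very short file may point to a failed separation. The sketch below assumes that sep_files/ contains WAV files with a .wav extension; the helper name list_wav_durations is hypothetical and not part of the recipe.

```python
import glob
import os
import wave


def list_wav_durations(directory):
    """Return (file name, duration in seconds) for each .wav file in directory."""
    results = []
    for path in sorted(glob.glob(os.path.join(directory, "*.wav"))):
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
        results.append((os.path.basename(path), frames / rate if rate else 0.0))
    return results


if __name__ == "__main__":
    # Assumption: the separated sounds were saved under sep_files/ as .wav files.
    for name, seconds in list_wav_durations("sep_files"):
        print(f"{name}: {seconds:.2f} s")
```

If the directory does not exist or holds no .wav files, the function returns an empty list and nothing is printed.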