Ticket #463 (new defect)

Opened 14 years ago

Last modified 14 years ago

Rethinking Forced Alignment for prompt to speech audio mismatches

Reported by: kmaclean Owned by: kmaclean
Priority: critical Milestone: Acoustic Model 0.1.2
Component: Acoustic Model Version: Acoustic Model 0.1.1
Keywords: Cc:


From tpvelka's post:

I understand what you mean, here is a more detailed analysis:

The cause of the problem:

When doing forced Viterbi alignment, HTK may not always generate transcriptions because of overpruning. Without tracing enabled, HTK does not report any errors, and those transcriptions are simply left out of the MLF. If you then run HERest training, it crashes because it cannot find transcriptions for some of the sound files.

Possible solutions:

  1. Let HERest crash, find the problematic file, and either fix it or remove it. Run HERest again and repeat. I do not recommend this, since I have spent an entire day doing just that. One iteration of HERest can take quite a long time, and you can have 10-20 such files (or more, depending on the number of errors in the corpus).
  2. Run HVite without pruning and hope that the bad files do not seriously affect your final system performance. I agree that this is not an optimal solution, so a better one would be:
  3. Write a script that either checks the trace log for warnings (I have not actually tried this, so I am not 100% sure they are there) or checks the transcriptions against the sound files and deals with those for which the transcriptions are missing.
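The second half of option 3 could be sketched roughly as follows. This is only an illustration, not something from tpvelka's post: the function names are made up, and it assumes the aligned MLF uses the usual layout (a `#!MLF!#` header, one quoted pattern line per utterance, and a lone `.` closing each entry) and that audio files and MLF entries share the same base name.

```python
def mlf_basenames(mlf_text):
    """Collect the base names of all utterances that have an entry in the MLF."""
    names = set()
    expect_header = True  # an entry header may follow the file header or a "." line
    for raw in mlf_text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line == ".":
            expect_header = True
            continue
        if expect_header and len(line) > 2 and line[0] == '"' and line[-1] == '"':
            # Entry headers look like "*/sample1.lab"; keep only "sample1".
            path = line[1:-1].replace("\\", "/")
            filename = path.rsplit("/", 1)[-1]
            names.add(filename.rsplit(".", 1)[0])
            expect_header = False
    return names


def missing_transcriptions(audio_paths, mlf_text):
    """Return the audio paths (in input order) that have no entry in the MLF."""
    have = mlf_basenames(mlf_text)
    missing = []
    for path in audio_paths:
        filename = path.replace("\\", "/").rsplit("/", 1)[-1]
        stem = filename.rsplit(".", 1)[0]
        if stem not in have:
            missing.append(path)
    return missing
```

The returned list can then be used to drop those files from the training script file, or to queue them for manual review, before HERest is started.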

Change History

comment:1 Changed 14 years ago by kmaclean

More from tpvelka's post:

Hi Ken,

I don't know if I understand you correctly. Why is there a need for retraining? You already have a trained acoustic model that can be used for forced Viterbi alignment, because in the standard scenario, as described in, e.g., the HTKBook:

  • you start with a phoneme MLF that is created using a pronunciation dictionary and a word MLF
  • with this you run several iterations of HERest, so you have a trained SI acoustic model
  • only after you have this model do you run the forced Viterbi alignment, which can cause the problem with overpruning.

Then, all you need to do to avoid the train-crash-repeat problem is to check the aligned MLF for missing parts.

Of course, you can do forced Viterbi on new submissions using the fully trained model and see whether it fails to return a result due to overpruning, or you can watch the resulting score, but I am afraid you will only catch the worst errors, like a totally silent recording.
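Watching the resulting score might look something like the sketch below. It is an assumption-laden illustration rather than a tested recipe: it assumes each label line in the aligned MLF has the form `start end label score ...` with times in 100 ns units, that the score column holds the total log likelihood of the segment (not a per-frame value), and that frames are 10 ms apart. The function names and the threshold are hypothetical, and a usable threshold would have to be chosen by looking at the corpus.

```python
def utterance_scores(mlf_text, frame_units=100000):
    """Map each MLF entry name to its average log likelihood per frame.

    frame_units is the assumed frame shift in 100 ns units (100000 = 10 ms).
    Entries without any scored, timed label lines are left out of the result.
    """
    scores = {}
    current = None
    total = 0.0
    frames = 0.0
    for raw in mlf_text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if current is None:
            # Outside an entry: only a quoted pattern line starts a new one.
            if len(line) > 2 and line[0] == '"' and line[-1] == '"':
                current = line[1:-1]
                total, frames = 0.0, 0.0
            continue
        if line == ".":
            if frames > 0:
                scores[current] = total / frames
            current = None
            continue
        fields = line.split()
        if len(fields) >= 4:
            try:
                start, end = int(fields[0]), int(fields[1])
                score = float(fields[3])
            except ValueError:
                continue  # not a "start end label score" line
            total += score
            frames += (end - start) / frame_units
    return scores


def low_scoring(mlf_text, threshold):
    """Return entry names whose average per-frame score is below threshold."""
    scores = utterance_scores(mlf_text)
    return [name for name, value in sorted(scores.items()) if value < threshold]
```

As the post says, a check like this is likely to catch only the worst mismatches, so it complements the missing-transcription check rather than replacing it.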
