Ticket #271 (new enhancement)
periodically tuning acoustic model parameters as corpus gets larger
Reported by: | kmaclean | Owned by: | kmaclean |
---|---|---|---|
Priority: | critical | Milestone: | Acoustic Model 1.0 |
Component: | Acoustic Model | Version: | 0.1-alpha |
Keywords: | Cc: |
Description
email from David Gelbart
adding more acoustic model parameters
The general rule I have seen with ASR systems is that, as the amount of training data increases, it eventually becomes necessary to add more acoustic model parameters in order to get the full benefit of the additional data. On the other hand, using too many acoustic model parameters may cause overfitting (in other words, the system starts modeling quirks of the training data to the point where the system's performance on non-training data is worsened).
Thus, you may need to periodically tune the number of acoustic model parameters you are using. I suppose the easiest way to do this is to create a test set which does not overlap with the training set, and measure word recognition accuracy on the test set for various acoustic model sizes.
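The held-out comparison described above could be sketched roughly as follows. This is a minimal illustration, not the scoring used by any particular toolkit: the function names are hypothetical, accuracy is taken as (N - S - D - I) / N over the reference words, and ties between model sizes are broken in favour of the smaller model on the assumption that fewer parameters are preferable when accuracy is equal.

```python
from typing import Dict, List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimum edit-distance alignment."""
    # Each cell holds (total_errors, subs, dels, ins) for ref[:i] against hyp[:j].
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            e, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)          # words match, no new error
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = prev[j]
            dele = (e + 1, s, d + 1, n)      # reference word missing from hypothesis
            e, s, d, n = cur[j - 1]
            ins = (e + 1, s, d, n + 1)       # extra word in hypothesis
            cur.append(min(diag, dele, ins))
        prev = cur
    _, subs, dels, inss = prev[-1]
    return subs, dels, inss


def word_accuracy(pairs: List[Tuple[List[str], List[str]]]) -> float:
    """Percent word accuracy over (reference, hypothesis) pairs from the test set."""
    total_words = sum(len(ref) for ref, _ in pairs)
    if total_words == 0:
        raise ValueError("test set contains no reference words")
    errors = sum(sum(align_counts(ref, hyp)) for ref, hyp in pairs)
    return 100.0 * (total_words - errors) / total_words


def pick_best_size(accuracy_by_size: Dict[int, float]) -> int:
    """Pick the model size with the highest test-set accuracy; prefer smaller on ties."""
    best = max(accuracy_by_size.values())
    return min(size for size, acc in accuracy_by_size.items() if acc == best)
```

The test utterances passed to `word_accuracy` must not appear in the training set; otherwise the comparison rewards overfitting instead of detecting it.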
use more Gaussians in the Gaussian mixtures
One way to increase the number of parameters is to use more Gaussians in the Gaussian mixtures. (One way to do this in HTK is to add one or more additional mixup stages. This has the advantage that you can use your test set to compare recognition accuracy before and after the mixup, so that you can obtain your recognition accuracy numbers without having to retrain a system from scratch each time.)
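As a rough sketch of what a sequence of mixup stages might look like, the snippet below generates the text of HHEd edit scripts using the `MU` (mix up) command. The doubling schedule, the function names, and the state range `2-4` (which assumes three emitting states per model) are assumptions made for illustration; they are not taken from any existing training recipe.

```python
from typing import List


def mixup_schedule(start: int, target: int) -> List[int]:
    """Mixture counts for successive stages, doubling until the target is reached."""
    if start < 1 or target < start:
        raise ValueError("need 1 <= start <= target")
    stages = []
    n = start
    while n < target:
        n = min(n * 2, target)
        stages.append(n)
    return stages


def hhed_mixup_script(num_mixes: int, states: str = "2-4") -> str:
    """Text of an HHEd edit script raising the listed states to num_mixes components.

    The state range is an assumption (three emitting states per model).
    """
    return f"MU {num_mixes} {{*.state[{states}].mix}}\n"


if __name__ == "__main__":
    for mixes in mixup_schedule(1, 8):
        print(hhed_mixup_script(mixes), end="")
```

After each stage the models would be re-estimated and then scored on the held-out test set, so the accuracy at every mixture count comes from one incremental run.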
move from monophones to triphones
Another way to increase the number of parameters is to move from monophones to triphones (unless you are using triphones already).
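To make the monophone-to-triphone step concrete, here is a small sketch that expands a phone sequence into context-dependent labels in the `left-centre+right` notation used by HTK. It is a simplification: the function name is hypothetical, and the boundary symbols `sil` and `sp` are assumed to stay context-independent and to block context, which gives word-internal style labels.

```python
from typing import List, Sequence


def to_triphones(phones: Sequence[str],
                 boundaries: Sequence[str] = ("sil", "sp")) -> List[str]:
    """Expand a phone sequence into left-centre+right labels.

    Boundary symbols are copied unchanged and are never used as context.
    """
    labels = []
    for i, phone in enumerate(phones):
        if phone in boundaries:
            labels.append(phone)
            continue
        label = phone
        # Add left context when the previous symbol is a real phone.
        if i > 0 and phones[i - 1] not in boundaries:
            label = f"{phones[i - 1]}-{label}"
        # Add right context when the next symbol is a real phone.
        if i + 1 < len(phones) and phones[i + 1] not in boundaries:
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


if __name__ == "__main__":
    print(to_triphones(["sil", "k", "ae", "t", "sil"]))
    # prints ['sil', 'k+ae', 'k-ae+t', 'ae-t', 'sil']
```

Each distinct label becomes its own model, which is why the parameter count grows so quickly compared with one model per monophone.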
reduce the amount of state-tying
Another way is to reduce the amount of state-tying.
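The effect of state-tying on model size can be estimated with simple arithmetic. The sketch below assumes diagonal-covariance Gaussians, counts one mean vector, one variance vector, and one mixture weight per component, and ignores transition probabilities; the function names are hypothetical.

```python
def params_per_gaussian(feat_dim: int) -> int:
    """Stored values for one diagonal-covariance Gaussian: means, variances, one weight."""
    return 2 * feat_dim + 1


def acoustic_model_params(num_distinct_states: int, mixes_per_state: int, feat_dim: int) -> int:
    """Approximate parameter count; tied states are counted once.

    num_distinct_states is the number of states left after tying, so less
    tying means a larger value here and a larger model.
    """
    if min(num_distinct_states, mixes_per_state, feat_dim) < 1:
        raise ValueError("all arguments must be positive")
    return num_distinct_states * mixes_per_state * params_per_gaussian(feat_dim)
```

With 39-dimensional features, for example, each Gaussian holds 2 * 39 + 1 = 79 values, so every additional untied state with 8 mixture components adds 632 parameters that the training data has to support.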
Change History
comment:2 Changed 13 years ago by kmaclean
I am not even sure I understand what a Gaussian is (more reading is required on my part...)
I have some tutorial material linked at http://www.icsi.berkeley.edu/~gelbart/edu.html that may be useful. Among the online material, I especially recommend the Columbia/IBM slides. Week 3 talks about Gaussians. A Gaussian in speech recognition is the same as a Gaussian probability density function in probability & statistics. Along with the slides for Week 3, you can find a list of textbook readings that go along with it. These books may be hard to find in public libraries but you could try inter-library loan or a university library (or buy them).
(One way to do this in HTK is to add one or more additional mixup stages. This has the advantage that you can use your test set to compare recognition accuracy before and after the mixup, so that you can obtain your recognition accuracy numbers without having to retrain a system from scratch each time.)
I am not sure what you mean by additional "mixup stages". In Keith's training recipe, his train_mixup.sh script seems to be doing what you are talking about.
Yes. I think the section in the HTK manual that describes this is titled 'Mixture Incrementing'.
One thing I noticed from Keith's scripts is that it seems like you need to chunk the process in order to avoid errors with large speech corpora. From his train_iter.sh script:
...
Do you do something similar in your AM training?
I have only used HTK with small corpora and whole-word modeling (not triphones). So I cannot provide much advice regarding chunking or state-tying.
I think the htk-users mailing list is the best forum for your HTK questions. If you write to that list, I think it would be good to include a description of the VoxForge project and what you've accomplished so far. That may help motivate people to help you, and it will spread awareness of your project.
Can I include this as a thread on the VoxForge site?
Please do.
Regards, David
My reply:
Hi David,
Thanks for keeping an eye on the VoxForge project!
Do you do something similar in your AM training?
Basically, it seems like I've got to study Keith's scripts to ensure that the VoxForge Acoustic Models are as accurate as possible as the corpus increases in size.
Can I include this as a thread on the VoxForge site?
thanks,
Ken