Version 4 (modified by anonymous, 15 years ago)
Resources to aid in the segmentation of speech:
- SFS
- make_labs (festvox)
- SphinxTrain
- HTK
- MAUS
- SONIC
- aubio - library for audio labelling
Links:
- an approach:
Date: Wed, 6 Dec 2006 21:27:29 +0800
From: "Xie Zhiqing" <kramxxx@gmail.com>
To: htk-users@eng.cam.ac.uk
Subject: [HTK-Users] Lightly supervised acoustic training

Hi, my name is Mark and I am a student from Singapore. I am currently working on a speech recognition project, specifically on the training portion of the system.

As I understand it, I am supposed to train the system on a small amount of manually transcribed speech (.wav and .lab) and then use it to transcribe a larger amount of untranscribed speech (.wav only). If the confidence level of a transcription is high enough, it is added to the training data, and the process is repeated until all the untranscribed speech has been added to the training data. Is this method correct?

From what I gather from the HTK Book, HVite outputs transcriptions for the raw speech in this format: start time, end time, phoneme, and total log probability. Is there a connection between the confidence measure and the total log probability?

I am currently using Matlab to implement HTK. Are there any good sites that explain the process in more layman's terms?
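
The selection step described in the message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HTK way of doing it: it assumes label lines of the form "start end phoneme log-probability" with times in HTK's 100 ns units, a 10 ms frame shift, and it uses the log probability per frame as a crude stand-in for a confidence score. The raw log probability grows with utterance length, so some normalisation is needed before comparing utterances, and even the normalised value is not a calibrated confidence measure. The function names and the threshold are hypothetical.

```python
from dataclasses import dataclass

# Assumption: 10 ms frame shift, expressed in HTK's 100 ns time units.
FRAME_UNITS = 100_000


@dataclass
class Segment:
    start: int        # start time in 100 ns units
    end: int          # end time in 100 ns units
    label: str        # phoneme (or word) label
    log_prob: float   # log probability reported for this segment


def parse_label_lines(lines):
    """Parse 'start end label score' lines; skip anything shorter."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # e.g. blank lines or file terminators
        segments.append(
            Segment(int(fields[0]), int(fields[1]), fields[2], float(fields[3]))
        )
    return segments


def per_frame_log_prob(segments, frame_units=FRAME_UNITS):
    """Total log probability divided by the number of frames covered."""
    total = sum(s.log_prob for s in segments)
    frames = sum((s.end - s.start) // frame_units for s in segments)
    if frames <= 0:
        raise ValueError("no frames covered by the given segments")
    return total / frames


def select_confident(utterances, threshold):
    """Return names of utterances whose per-frame score meets the threshold.

    `utterances` maps an utterance name to its list of Segment objects.
    `threshold` is a hypothetical value that would need tuning on held-out data.
    """
    return [
        name
        for name, segs in utterances.items()
        if per_frame_log_prob(segs) >= threshold
    ]


if __name__ == "__main__":
    # Made-up example lines, purely for illustration.
    utt_a = parse_label_lines(["0 2000000 sil -1200.0", "2000000 5000000 ah -2100.0"])
    utt_b = parse_label_lines(["0 1000000 sil -900.0"])
    print(select_confident({"a": utt_a, "b": utt_b}, threshold=-70.0))
```

In an iterative setup, the selected utterances would be added to the training set, the models retrained, and the remaining audio decoded again; that loop is left out here because it depends on how the training tools are invoked.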