Changes between Initial Version and Version 1 of AcousticTreeQuestions

09/11/07 12:45:44 (15 years ago)

Some docs about tree building


Continuous speech is highly context-dependent and variable, so it is not sufficient to build one model per phone: the acoustics of a phone can differ significantly depending on the context in which it is used. That is why context-dependent models are often used for continuous speech. Such a model depends not only on the phone name but also on the names of the next and previous phones, and possibly on many more parameters. Of course it is not possible to build a model for every combination of parameters, especially since the number of parameters can exceed a hundred. That is why training software usually selects the set of models either automatically or with a little input from the user.
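To get a feeling for the numbers, here is a rough count of context-dependent models when only the immediate left and right neighbours are taken into account. The phone-set size of 40 is an assumed, illustrative value and not a figure for any particular language.

```python
def count_triphones(num_phones: int) -> int:
    """Upper bound on triphone models: every (left, phone, right) combination."""
    return num_phones ** 3


# 40 phones is an assumed, illustrative phone-set size.
print(count_triphones(40))  # 64000
```

Even with this small context the count is far larger than what a typical training set can cover, which is why the set of models has to be reduced.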
For example, sphinx can build the set of models automatically. HTK requires you to pass the list of properties that model selection will use, and does the rest itself. Of course, if you have hand-made questions it is better to submit them to sphinx too, since it allows that.
The important thing is that models are organized in a tree, and the parameters you pass are called questions. The decoder asks questions about the phone context and decides which model to use. So it is important to create a good list of questions. Let's describe how you can do it. In HTK the questions go into the file tree.hed; in sphinx they are specified in a config file.
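The way a tree of questions leads to a model can be sketched in a few lines. This is only an illustration of the idea, not the data structure used by HTK or sphinx; the class names `Leaf` and `Question`, the function `select_model`, the phone classes and the model names are all made up for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Leaf:
    model: str  # name of the acoustic model to use


@dataclass(frozen=True)
class Question:
    side: str               # "left" or "right" context
    phones: FrozenSet[str]  # phone class the question tests for
    yes: Union["Question", Leaf]
    no: Union["Question", Leaf]


def select_model(node, left, right):
    """Walk the tree, answering each question from the phone context."""
    while isinstance(node, Question):
        neighbour = left if node.side == "left" else right
        node = node.yes if neighbour in node.phones else node.no
    return node.model


# Hypothetical tree for the phone "a"; classes and model names are invented.
NASALS = frozenset({"m", "n"})
FRICATIVES = frozenset({"s", "z", "f", "v"})
tree_a = Question(
    "left", NASALS,
    yes=Leaf("a_1"),
    no=Question("right", FRICATIVES, yes=Leaf("a_2"), no=Leaf("a_3")),
)

print(select_model(tree_a, left="m", right="t"))  # a_1
print(select_model(tree_a, left="t", right="s"))  # a_2
```

Contexts that were never seen in training still end up in some leaf, because every question has both a yes and a no branch.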
We consider the task of tree creation for a new language. For some languages, such as English, questions already exist. So what should you do? Just list the important things that affect phone acoustics. Collect sources and look for descriptions of acoustic classes:
 * Books on phonetics
 * Festival TTS voices (they often have a precise description)
 * Questions for similar languages
 * The language page (it often has a phone set with an IPA classification, but it is not very precise)
Now read the book and let's try to build the list. Acoustically related phones are often grouped as follows:
 * List of vowels
 * List of consonants
 * List of vowels for each property: front vowels, back vowels, middle vowels, diphthongs, rounded vowels and long vowels
 * List of fricative consonants
 * List of nasals
 * List of liquids
 * Any other group of phones
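While collecting the classes it helps to keep them in one place and check them against the phone set. The sketch below does that; the phone set, the class names and the helper `check_classes` are hypothetical and should be replaced with those of your language.

```python
# Hypothetical phone set and classes; replace them with those of your language.
PHONES = {"a", "e", "i", "o", "u",
          "m", "n", "l", "r", "s", "z", "f", "v", "p", "t", "k"}

CLASSES = {
    "Vowel": {"a", "e", "i", "o", "u"},
    "Consonant": {"m", "n", "l", "r", "s", "z", "f", "v", "p", "t", "k"},
    "Front_Vowel": {"e", "i"},
    "Back_Vowel": {"o", "u"},
    "Rounded_Vowel": {"o", "u"},
    "Fricative": {"s", "z", "f", "v"},
    "Nasal": {"m", "n"},
    "Liquid": {"l", "r"},
}


def check_classes(phones, classes):
    """Return a list of problems: empty classes or phones not in the phone set."""
    problems = []
    for name, members in classes.items():
        if not members:
            problems.append(f"{name}: empty class")
        unknown = members - phones
        if unknown:
            problems.append(f"{name}: unknown phones {sorted(unknown)}")
    return problems


print(check_classes(PHONES, CLASSES))  # []
```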
I hope you get the idea. Now repeat the questions for each context: a question for the left context, one for the right context and one for the phone itself. The result should look like:
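As a rough sketch, a list of HTK-style `QS` lines can be generated from the phone classes. The `L_`/`R_` naming with `p-*` and `*+p` patterns follows the convention commonly seen in HTK tree.hed files; the `C_` questions with the `*-p+*` pattern for the phone itself are an assumption about how to express that case, and the function name `htk_questions` and the single class used here are made up for illustration.

```python
# Pattern per context: L = left neighbour, R = right neighbour, C = phone itself.
# The C pattern is an assumption, see the text above.
PATTERNS = {"L": "{p}-*", "R": "*+{p}", "C": "*-{p}+*"}


def htk_questions(classes):
    """Build one QS line per phone class and context."""
    lines = []
    for name, members in classes.items():
        for prefix, pattern in PATTERNS.items():
            items = ",".join(pattern.format(p=p) for p in sorted(members))
            lines.append(f'QS "{prefix}_{name}" {{ {items} }}')
    return lines


# Hypothetical class; use the classes collected for your language.
for line in htk_questions({"Nasal": {"m", "n"}}):
    print(line)
# QS "L_Nasal" { m-*,n-* }
# QS "R_Nasal" { *+m,*+n }
# QS "C_Nasal" { *-m+*,*-n+* }
```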
The number of questions should be small, since otherwise you have to collect too much data to train all the models. It is recommended to have 20-30 questions for the tree.