Ticket #52 (new defect)
Update How-to and Tutorial dictionary to use CMU dictionary
Reported by: | kmaclean | Owned by: | kmaclean |
---|---|---|---|
Priority: | major | Milestone: | WebSite 0.3 |
Component: | Acoustic Model | Version: | 0.1-alpha |
Keywords: | | Cc: | |
Description
The QuickStart and nightly acoustic model builds use the CMU Pronunciation Dictionary, whereas the How-to and Tutorial use the smaller Switchboard dictionary, whose pronunciations differ slightly from those in the CMU dictionary.
Change History
comment:5 Changed 13 years ago by kmaclean
The pronunciation dictionary used in the Tutorial and How-to is based on the ISIP Switchboard corpus (around 27,500 words), whereas the QuickStart and the nightly AM builds are based on version 0.6 of the CMU Pronunciation Dictionary (around 130,000 words). Unfortunately, the Switchboard and CMU pronunciation dictionaries use slightly different phoneme transcriptions, and this is enough to make them incompatible from a grammar and acoustic model (AM) testing perspective.
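To find which words are affected, the two dictionaries can be compared word by word. The following is a minimal Python sketch, assuming both dictionaries have already been put into the VoxForge lexicon layout (`WORD [WORD] ph1 ph2 ...`, one pronunciation per line). The function names `parse_lexicon` and `differing_words` are hypothetical and are not part of any VoxForge or Julius tool; the sample entries are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

# word -> list of pronunciations (a word may have more than one)
Lexicon = Dict[str, List[Tuple[str, ...]]]


def parse_lexicon(lines: Iterable[str]) -> Lexicon:
    """Parse lines of the assumed form 'WORD [WORD] ph1 ph2 ...'.

    The bracketed output symbol is optional. Blank lines and ';;;'
    comment lines are skipped. Phones are lower-cased so that the two
    sources compare consistently.
    """
    lexicon: Lexicon = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(";;;"):
            continue
        word, rest = fields[0].upper(), fields[1:]
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]  # drop the optional [OUTPUT] field
        lexicon.setdefault(word, []).append(tuple(p.lower() for p in rest))
    return lexicon


def differing_words(a: Lexicon, b: Lexicon) -> List[str]:
    """Words present in both lexicons whose sets of pronunciations differ."""
    return sorted(w for w in a.keys() & b.keys() if set(a[w]) != set(b[w]))


if __name__ == "__main__":
    # Illustrative entries: ZERO as given in this ticket, ONE hypothetical.
    cmu_based = parse_lexicon(["ZERO [ZERO] z iy r ow", "ONE [ONE] w ah n"])
    swb_based = parse_lexicon(["ZERO [ZERO] z ih r ow", "ONE [ONE] w ah n"])
    print(differing_words(cmu_based, swb_based))  # ['ZERO']
```

Comparing sets of pronunciations, rather than single strings, avoids flagging a word just because the two files list its alternative pronunciations in a different order.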
When testing an AM with the VoxForge Testing Tutorial (Step 2 - Create Test Prompts), this difference in pronunciation dictionaries may cause the following error if the user does not select the correct voca file as set out in the instructions:
    --------------------------------
    ###### check configurations
    ###### initialize input device
    ###### build up system
    Reading in HMM definition...(ascii)...limit check passed
    defined HMMs: 50
    logical names: 506 in HMMList
    base phones: 44 used in logical
    done
    Making pseudo bi/mono-phone for IW-triphone...369 added as logical...done
    Reading in dictionary...
    line 18: triphone "*-z+ih" or biphone "z+ih" not found
    line 18: triphone "z-ih+r" not found
    line 18: triphone "ih-r+ow" not found
    > 6 [ZERO] z ih r ow
    error in reading sample.dict: 1 words failed out of 18 words
    ERROR: failed to read dictionary, terminated
    Terminated
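This happens because ZERO is defined as "z iy r ow" in voxforge_lexicon but as "z ih r ow" in my original sample.voca. The acoustic model was built with the voxforge_lexicon pronunciations, so it has no HMMs for the triphones that the sample.voca pronunciation requires ("*-z+ih", "z-ih+r" and "ih-r+ow"), and Julian refuses to load the dictionary.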
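The "not found" lines in the log can be reproduced with a simplified check: expand a pronunciation into triphone names, using `*` for the unknown neighbour at a word boundary as Julian's messages do, and look each name up in the model's list of HMM names. This is a minimal sketch, not Julian's actual algorithm (Julian also builds pseudo bi/mono-phones for word-boundary triphones, as the log shows). The function names and the `available` set below are hypothetical, and loading the names from the model's HMM list file is not shown.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def split_name(name: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split an HTK-style name 'l-c+r' into (left, centre, right).

    Biphones ('c+r', 'l-c') and monophones ('c') get None for the
    missing context.
    """
    left: Optional[str] = None
    right: Optional[str] = None
    centre = name
    if "-" in centre:
        left, centre = centre.split("-", 1)
    if "+" in centre:
        centre, right = centre.split("+", 1)
    return left, centre, right


def required_contexts(phones: Sequence[str]) -> List[str]:
    """Triphone names a pronunciation needs; '*' marks a word-boundary
    context whose neighbour is only known at decoding time."""
    names = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else "*"
        right = phones[i + 1] if i < len(phones) - 1 else "*"
        names.append(f"{left}-{phone}+{right}")
    return names


def is_available(name: str, hmm_names: Iterable[str]) -> bool:
    """True if some HMM has the same centre phone and matching contexts.

    A '*' context accepts anything, including a missing context (so the
    biphone 'z+ih' satisfies '*-z+ih'); any other context must match
    exactly.
    """
    left, centre, right = split_name(name)
    for hmm in hmm_names:
        h_left, h_centre, h_right = split_name(hmm)
        if h_centre == centre and left in ("*", h_left) and right in ("*", h_right):
            return True
    return False


def missing_contexts(phones: Sequence[str], hmm_names: Iterable[str]) -> List[str]:
    """Triphone names required by `phones` that no HMM can satisfy."""
    names = list(hmm_names)  # allow iterating more than once
    return [n for n in required_contexts(phones) if not is_available(n, names)]


if __name__ == "__main__":
    # Hypothetical model trained only on the "z iy r ow" pronunciation.
    available = {"sil", "z+iy", "z-iy+r", "iy-r+ow", "r-ow+sil"}
    print(missing_contexts(["z", "ih", "r", "ow"], available))
    # ['*-z+ih', 'z-ih+r', 'ih-r+ow']
    print(missing_contexts(["z", "iy", "r", "ow"], available))
    # []
```

With this toy model, the sample.voca pronunciation fails on the same three names that Julian reports, while the voxforge_lexicon pronunciation passes.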
Link to where having different dictionaries causes problems: Adapting Acoustic Models to your voice