Ticket #57 (new defect)
Numbered words with different spellings causing interpretation problems
Reported by: | kmaclean | Owned by: | kmaclean |
---|---|---|---|
Priority: | minor | Milestone: | Acoustic Model 1.0 |
Component: | Speech Rec Engine | Version: | 0.1-alpha |
Keywords: | | Cc: | |
Description
The CMU Dictionary has words with more than one pronunciation, for example:
    ABABA [ABABA] ax b aa b ax
    ABABA(2) [ABABA(2)] aa b ax b ax
    ZERO [ZERO] z ih r ow
    ZERO'S [ZERO'S] z ih r ow z
    ZERO'S(2) [ZERO'S(2)] z iy r ow z
    ZERO(2) [ZERO(2)] z iy r ow
The word in square brackets is the output symbol that the recognizer returns. This causes problems with test scores, since recognition may return the label of an alternate pronunciation.
For example, if the speech recognition engine recognizes the second pronunciation of zero, it returns 'zero(2)', but the MLF has the word 'zero'. HTK's HResults does not know that the two are the same word, so it marks the result as a misrecognized word and lowers the recognition score accordingly.
The dictionary needs to be updated so that every pronunciation of a word returns the same output word, i.e. the dictionary should look like this:
    ABABA [ABABA] ax b aa b ax
    ABABA(2) [ABABA] aa b ax b ax
    ZERO [ZERO] z ih r ow
    ZERO'S [ZERO'S] z ih r ow z
    ZERO'S(2) [ZERO'S] z iy r ow z
    ZERO(2) [ZERO] z iy r ow
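One possible way to apply this change in bulk is sketched below in Python. It is only an illustration, not the project's actual tooling: the function name `fix_output_symbol` and both regular expressions are assumptions, and it assumes every entry has the form `WORD [OUTPUT] phones...` with alternate pronunciations marked by a trailing `(n)`.

```python
import re

# Assumed entry layout: WORD [OUTPUT_SYMBOL] phone phone ...
ENTRY = re.compile(r"^(?P<word>\S+)\s+\[(?P<out>[^\]]*)\]\s+(?P<phones>.+)$")
# Trailing variant marker on an alternate pronunciation, e.g. "(2)".
VARIANT = re.compile(r"\(\d+\)$")


def fix_output_symbol(line):
    """Strip the "(n)" suffix from the bracketed output symbol only.

    The headword keeps its variant marker; lines that do not match the
    assumed layout are returned unchanged. (Hypothetical helper.)
    """
    m = ENTRY.match(line.strip())
    if m is None:
        return line
    out = VARIANT.sub("", m.group("out"))
    return f"{m.group('word')} [{out}] {m.group('phones')}"


if __name__ == "__main__":
    for entry in ["ZERO [ZERO] z ih r ow", "ZERO(2) [ZERO(2)] z iy r ow"]:
        print(fix_output_symbol(entry))
```

Only the bracketed symbol is rewritten, so the headwords `ZERO` and `ZERO(2)` stay distinct while both return `ZERO` to the scorer.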