Ticket #366 (new task)
Opened 13 years ago
Review Processing of Prompt Files for non-English languages
Reported by: | kmaclean | Owned by: | kmaclean |
---|---|---|---|
Priority: | major | Milestone: | Acoustic Model 0.1.2 |
Component: | Scripts | Version: | 0.1-alpha |
Keywords: | | Cc: | |
Description
I strip many characters from the original prompt file. From the Prompts.pm script (note: anything with a leading "#" is a comment):
    # cleanup prompts
    $linescalar = join(" ", @line);
    $linescalar =~ tr/a-z/A-Z/; # change to uppercase
    $linescalar =~ s/,//g; # remove commas
    $linescalar =~ s/\.//g; # remove periods
    # dealing with quotes
    # $linescalar =~ s/\'//g; # remove single quotes; but need words like "don't" - need to research this more ...
    # $linescalar =~ s/\'\b(.*)\b\'/$1/g; # remove single quotes from quoted text; single quote must be at start of a word, and at end of a word - does not work if there are two words with single quotes in them in same sentence ...
    $linescalar =~ s/\'EM//g; # remove leading single quotes for contraction of them
    $linescalar =~ s/\"//g; # remove double quotes
    $linescalar =~ s/://g; # remove colon
    # $linescalar =~ s/-//g; # compound word dash
    $linescalar =~ s/--//g; # double dash
    $linescalar =~ s/ - / /g; # space dash space punctuation
    $linescalar =~ s/ -/ /g; # space dash punctuation
    $linescalar =~ s/;//g; # semi-colon
    $linescalar =~ s/!//g; # exclamation mark
    $linescalar =~ s/\?//g; # question mark
    # Other cleanup !!!!!! need to change the prompts files directly rather than doing this!!! or add to dictionary!!!
    $linescalar =~ s/&/AND/g;
    $linescalar =~ s/2000/TWO THOUSAND/g;
So I currently don't remove single quotes. I'm not sure what to do for other languages; maybe we need a separate set of rules for each language.
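One possible shape for per-language rules is sketched below. This is not code from Prompts.pm: the `%rules` table, the `clean_prompt` function, the language codes, and the rule values (including the `default` entry) are all hypothetical placeholders chosen for illustration. The sketch also uses `uc` with the `unicode_strings` feature instead of `tr/a-z/A-Z/`, on the assumption that prompts are already decoded to character strings, because `tr/a-z/A-Z/` only upper-cases ASCII letters.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # make uc() follow Unicode rules (Perl 5.12+)

# Hypothetical per-language rule table; the codes and values are
# illustrative assumptions and would need review for each language.
my %rules = (
    en      => { keep_apostrophe => 1, replace => { '&' => 'AND' } },
    fr      => { keep_apostrophe => 1, replace => { '&' => 'ET'  } },
    default => { keep_apostrophe => 0, replace => {} },
);

# Clean one prompt line using the rules for the given language,
# falling back to the "default" entry for unknown languages.
sub clean_prompt {
    my ($lang, $text) = @_;
    my $rule = $rules{$lang} // $rules{default};

    $text = uc $text;              # upper-case, including accented letters
    $text =~ s/[,.":;!?]//g;       # punctuation stripped for every language
    $text =~ s/--//g;              # double dash
    $text =~ s/ - / /g;            # space dash space
    $text =~ s/ -/ /g;             # space dash
    $text =~ s/'//g unless $rule->{keep_apostrophe};

    # Language-specific literal substitutions, applied in sorted key order.
    for my $from (sort keys %{ $rule->{replace} }) {
        my $to = $rule->{replace}{$from};
        $text =~ s/\Q$from\E/$to/g;
    }
    return $text;
}

print clean_prompt('en', "Don't stop, salt & pepper!"), "\n";
# prints: DON'T STOP SALT AND PEPPER
```

Keeping the rules in a data table rather than in the code path means a new language could be added without touching the substitution logic, and word-level replacements such as `2000` could move into the prompt files or the dictionary as the comment in the script suggests.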