Ticket #495 (new defect)
20.000km in German Prompts being replaced with Two Thousand in English
Reported by: | kmaclean | Owned by: | kmaclean |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | Acoustic Model | Version: | 0.1-alpha |
Keywords: | Cc: |
Description
I'm sorry. I forgot it isn't so obvious if you don't have it in front of your nose all the time. Every file is in the main folder/16khz_16bit.

anonymous-20080405-phz
*/de5-088 ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE
Should be: ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA 20000 KM LANGEN ATLANTIKKÜSTE
or better: ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA ZWANZIGTAUSEND KM LANGEN ATLANTIKKÜSTE
(ZWANZIGTAUSEND is the German word for the number 20000)

justmoon-20080204-hbp
*/de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER
Should be: */de5-085 IM JAHR 1998 LEBTEN DORT 2000 BÜRGER
or better: */de5-085 IM JAHR 1998 LEBTEN DORT ZWEITAUSEND BÜRGER
(ZWEITAUSEND is the German word for 2000)

The rest are each one of these two sentences and should be corrected the same way.

justmoon-20080204-hbp */de5-088 ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE
ralfherzog-20070822_de5 */de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER
ralfherzog-20070822_de5 */de5-088 ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE
ralfherzog-20070826_de9 */de9-059 AM 21 SEPTEMBER TWO THOUSAND IST DAS PATENT ABGELAUFEN
timiobaumann-20080418-ryd */de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER

That is what I meant when I said it looked like a search and replace. Every occurrence of the number 2000 seems to have been replaced by the English words for 2000 (TWO THOUSAND). Since 2000 is part of 20000, we got some strange prompts with TWO THOUSAND0.

In case anyone wondered why I said it is better to take the word than the number: I encountered some serious problems while testing training with HTK if the prompts contain numbers.

Hope it helps,
Binh
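The TWO THOUSAND0 symptom described above is consistent with a plain substring replacement. The following Python sketch is an illustration only, not the project's actual code: `naive_replace` and `guarded_replace` are hypothetical names, and the naive version is just a guess at the kind of operation that would produce this output. The guarded version only matches 2000 when it is not embedded in a longer run of digits.

```python
import re


def naive_replace(prompt: str) -> str:
    # Hypothetical reconstruction of the suspected bug:
    # a plain substring replacement also hits the "2000" inside "20000".
    return prompt.replace("2000", "TWO THOUSAND")


# Match 2000 only when no digit stands directly before or after it.
STANDALONE_2000 = re.compile(r"(?<!\d)2000(?!\d)")


def guarded_replace(prompt: str, word: str) -> str:
    # Replace standalone occurrences of 2000 with the given word.
    return STANDALONE_2000.sub(word, prompt)


prompt = "ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA 20000 KM LANGEN ATLANTIKKÜSTE"
print(naive_replace(prompt))                   # reproduces the TWO THOUSAND0 symptom
print(guarded_replace(prompt, "ZWEITAUSEND"))  # leaves 20000 untouched
```

With the guard in place, 20000 is left alone and would need its own replacement (ZWANZIGTAUSEND), while a standalone 2000 is still replaced.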
German submission with a prompt containing 20.000 gets replaced with Two Thousand0; this should only occur with English submissions.
See: Corpus::Quarantine::TestAudioExceptions & Corpus::Quarantine::Submission::Prompts & Corpus::Quarantine::Submission::Prompts::en
need to replicate error...
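One way to keep English number words out of German prompts is to key the replacement table by submission language and match whole numbers only. The Python sketch below is an assumption-laden illustration, not the code in the Corpus::Quarantine modules: `NUMBER_WORDS`, `NUMBER_PATTERNS`, and `expand_numbers` are hypothetical names, the tables contain only the numbers mentioned in this ticket, and treating "." as a thousands separator is assumed for German only.

```python
import re

# Hypothetical per-language tables; only numbers from this ticket are listed.
NUMBER_WORDS = {
    "en": {"2000": "TWO THOUSAND"},
    "de": {"2000": "ZWEITAUSEND", "20000": "ZWANZIGTAUSEND"},
}

# Whole-number patterns per language. For German, "." is assumed to be
# a thousands separator, so "20.000" is read as one number.
NUMBER_PATTERNS = {
    "en": re.compile(r"\d+"),
    "de": re.compile(r"\d{1,3}(?:\.\d{3})+|\d+"),
}


def expand_numbers(prompt: str, lang: str) -> str:
    """Replace whole numbers with words from the table for the given language."""
    pattern = NUMBER_PATTERNS.get(lang)
    if pattern is None:
        return prompt  # unknown language: leave the prompt unchanged
    words = NUMBER_WORDS[lang]

    def repl(match: re.Match) -> str:
        digits = match.group(0).replace(".", "")
        # Numbers missing from the table are kept as written.
        return words.get(digits, match.group(0))

    return pattern.sub(repl, prompt)


print(expand_numbers("ETWA 20.000 KM LANGEN ATLANTIKKÜSTE", "de"))
print(expand_numbers("IM JAHR 1998 LEBTEN DORT 2000 BÜRGER", "de"))
```

Because each match covers an entire number, 20000 can never be partially rewritten, and a German submission never sees the English table.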