Changes between Version 2 and Version 3 of DialogManagers

06/20/07 10:27:11 (15 years ago)



  • DialogManagers

    v2 v3  
    10 10  * [Synergy SCXML Web Laboratory]  
     13== From a [post from Will Walker] on kde-accessibility: == 
     17List:       kde-accessibility 
     18Subject:    Re: [Kde-accessibility] Fwd: Re: paraphlegic KDE support 
     19From:       Willie Walker <William.Walker () Sun ! COM> 
     20Date:       2006-02-23 16:57:34 
     21Message-ID: 6072A454-C87C-4612-AB8E-648FB3CA746B () sun ! com 
     24Hi All: 
     26I just want to jump in on the speech recognition stuff.  Having   
     27participated in several standards efforts (e.g., JSAPI, VoiceXML/SSML/  
     28SGML) in this area, and having developed a number of speech   
     29recognition applications, and having seen the trials and tribulations   
     30of inconsistent SAPI implementations, and having led the Sphinx-4   
     31effort, I'd like to offer my unsolicited opinion :-). 
     33In my opinion, there are enough differences in the various speech   
     34recognition systems and their APIs that I'm not sure efforts are best   
     35spent charging at the "one API for all" windmill.  IMO, one could   
     36spend years trying to come up with yet another standard but not very   
     37useful API in this space.  All we'd have in the end would be yet   
     38another standard but not very useful API with perhaps one buggy   
     39implementation on one speech engine.  Plus, it would just be   
     40repeating work and making the same mistakes that have already been   
     41done time and time again. 
     43As an alternative, I'd offer the approach of centering on an available   
     44recognition engine and designing the assistive technology first.  Get   
     45your feet wet with that and use it as a vehicle to better understand   
     46the problems you will face with any speech recognition task for the   
     47desktop.  Examples include: 
     49o how to dynamically build a grammar based upon stuff you can get   
     50from the AT-SPI 
     51o how to deal with confusable words (or discover that recognition for   
     52a particular grammar is just plain failing and you need to tweak it)   
     54o how to deal with unspeakable words 
     55o how to deal with deictic references 
     56o how to deal with compound utterances 
     57o how to handle dictation vs. command and control 
     58o how to deal with tapering/restructuring of prompts based upon   
     59recognition success/failure 
     60o how to allow the user to recover from misrecognitions 
     61o how to handle custom profiles per user 
     62o (MOST IMPORTANTLY) just what is a compelling speech interaction   
     63experience for the desktop? 
     65Once you have a better understanding of the real problems and have   
     66developed a working assistive technology, then take a look at perhaps   
     67genericizing a useful layer to multiple engines.  The end result is   
     68that you will probably end up with a useful assistive technology   
     69sooner.  In addition, you will also end up with an API that is known   
     70to work for at least one assistive technology.