Version 6 (modified by kmaclean, 15 years ago)


Dialog Managers

Dialog Manager - information

  • Speech Interface Guidelines - overview of speech interface design principles as applied to the range of applications that have been developed at Carnegie Mellon

State Chart XML (SCXML)
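
SCXML is a W3C standard for expressing Harel state charts in XML; dialog managers can use it to describe dialog flow as states, events, and transitions. A minimal sketch of a two-state confirmation dialog (the state and event names here are invented for illustration, not taken from any particular dialog manager):

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="prompt">
  <!-- Ask the user for a command. -->
  <state id="prompt">
    <transition event="user.spoke" target="confirm"/>
  </state>
  <!-- Echo the hypothesis back and wait for yes/no. -->
  <state id="confirm">
    <transition event="user.confirmed" target="done"/>
    <transition event="user.rejected" target="prompt"/>
  </state>
  <final id="done"/>
</scxml>
```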

From a post by Willie Walker on kde-accessibility:

List:       kde-accessibility
Subject:    Re: [Kde-accessibility] Fwd: Re: paraphlegic KDE support
From:       Willie Walker <William.Walker () Sun ! COM>
Date:       2006-02-23 16:57:34
Message-ID: 6072A454-C87C-4612-AB8E-648FB3CA746B () sun ! com

Hi All:

I just want to jump in on the speech recognition stuff.  Having  
participated in several standards efforts (e.g., JSAPI, VoiceXML/SSML/ 
SGML) in this area, and having developed a number of speech  
recognition applications, and having seen the trials and tribulations  
of inconsistent SAPI implementations, and having led the Sphinx-4  
effort, I'd like to offer my unsolicited opinion :-).

In my opinion, there are enough differences in the various speech  
recognition systems and their APIs that I'm not sure efforts are best  
spent charging at the "one API for all" windmill.  IMO, one could  
spend years trying to come up with yet another standard but not very  
useful API in this space.  All we'd have in the end would be yet  
another standard but not very useful API with perhaps one buggy  
implementation on one speech engine.  Plus, it would just be  
repeating work and making the same mistakes that have already been  
done time and time again.

As an alternative, I'd offer the approach of centering on an available  
recognition engine and designing the assistive technology first.  Get  
your feet wet with that and use it as a vehicle to better understand  
the problems you will face with any speech recognition task for the  
desktop.  Examples include:

o how to dynamically build a grammar based upon stuff you can get  
from the AT-SPI
o how to deal with confusable words (or discover that recognition for  
a particular grammar is just plain failing and you need to tweak it)  
o how to deal with unspeakable words
o how to deal with deictic references
o how to deal with compound utterances
o how to handle dictation vs. command and control
o how to deal with tapering/restructuring of prompts based upon  
recognition success/failure
o how to allow the user to recover from misrecognitions
o how to handle custom profiles per user
o (MOST IMPORTANTLY) just what is a compelling speech interaction  
experience for the desktop?
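
The first item on Walker's list is concrete enough to sketch. Building a command grammar dynamically from whatever actions the accessibility layer reports might look like the following (the `actions` input and the choice of JSGF output are illustrative assumptions, not part of any real AT-SPI binding):

```python
def build_jsgf_grammar(actions):
    """Build a JSGF command grammar from a list of action labels
    harvested (e.g. via AT-SPI) from the widgets currently on screen."""
    # Drop labels the recognizer cannot reasonably speak:
    # empty strings, or labels containing digits/symbols.
    speakable = [a for a in actions if a and a.replace(" ", "").isalpha()]
    alternatives = " | ".join(speakable)
    return (
        "#JSGF V1.0;\n"
        "grammar commands;\n"
        f"public <command> = {alternatives};"
    )

# Hypothetical example: labels pulled from the current window.
grammar = build_jsgf_grammar(["open file", "save", "quit", ""])
```

Regenerating the grammar on every focus change is what makes the "unspeakable words" and "confusable words" items on the list bite: the grammar's contents are outside the application author's control.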

Once you have a better understanding of the real problems and have  
developed a working assistive technology, then take a look at perhaps  
genericizing a useful layer to multiple engines.  The end result is  
that you will probably end up with a useful assistive technology  
sooner.  In addition, you will also end up with an API that is known  
to work for at least one assistive technology.
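
The "useful layer" Walker describes would be extracted only after a working assistive technology exists against one concrete engine. A minimal sketch of what such an engine-neutral layer might look like (all class and method names here are hypothetical, not from Sphinx-4 or any real binding):

```python
from abc import ABC, abstractmethod

class Recognizer(ABC):
    """Engine-neutral interface, generalized from one working engine."""

    @abstractmethod
    def load_grammar(self, jsgf: str) -> None: ...

    @abstractmethod
    def recognize(self, audio: bytes) -> str: ...

class StubRecognizer(Recognizer):
    """Stand-in for a real engine binding (hypothetical)."""

    def load_grammar(self, jsgf: str) -> None:
        self.grammar = jsgf

    def recognize(self, audio: bytes) -> str:
        # A real binding would decode the audio; this stub returns a
        # fixed hypothesis so the interface can be exercised in tests.
        return "open file"

engine: Recognizer = StubRecognizer()
engine.load_grammar("#JSGF V1.0; grammar commands; public <command> = open file;")
result = engine.recognize(b"\x00\x00")
```

The point of the ordering is that the abstract interface is distilled from methods one assistive technology actually needed, rather than designed up front for engines it has never run against.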