Version 14 (modified by DavidGelbart, 14 years ago)

Automatic Speech Recognition Theory and Algorithms

This page lists resources for learning about the theory and algorithms behind automatic speech recognition (ASR) technology.

Online educational resources:

Book recommendations:

David Gelbart's book recommendations:

  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. The table of contents can be viewed here.
  • Speech and Audio Signal Processing: Processing and Perception of Speech and Music by Ben Gold and Nelson Morgan. The table of contents can be viewed on amazon.com.
  • Speech Processing -- A Dynamic and Optimization-Oriented Approach by Li Deng and D. O'Shaughnessy. The table of contents can be viewed on amazon.com.
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin. The table of contents (and long excerpts) can be viewed here.
  • Pattern Classification by Duda, Hart, and Stork. This is about pattern recognition in general, not ASR in particular. The table of contents can be viewed here.
  • A Course in Phonetics by Ladefoged. A good place to learn about phonemes (which are used in ASR pronunciation dictionaries), acoustic phonetics (which relates to the design of ASR feature extraction methods such as MFCC), and articulatory phonetics (which is often used in formulating decision tree rules for HMM state tying).

(The above is not a complete list of popular or recommended books. The books vary in focus so I've mentioned how to find the table of contents online. I also recommend checking out the reviews on amazon.com.)

Gunnar Evermann's book recommendations (a summary of this page):

  • Pattern Classification by Duda, Hart, and Stork
  • Introduction to Statistical Pattern Recognition by Fukunaga
  • Automatische Spracherkennung -- Grundlagen, statistische Modelle und effiziente Algorithmen by Schukat-Talamazzini
  • The above-mentioned book by Huang, Acero, and Hon.
  • Automatic Speech Recognition -- The Development of the SPHINX Recognition System by Lee.
  • Statistical Methods for Speech Recognition by Jelinek.
  • Corpus-Based Methods in Language and Speech Processing, edited by Young and Bloothooft

Applications of ASR:

Speech Technology Magazine is a good source of information about applications of speech technology. The magazine can be read for free online. They also have a blog.

Current ASR research:

If you want to know which techniques are currently attracting attention at the cutting edge of ASR research, papers describing speech recognition systems built for official project benchmarks can be a good source of information. These systems tend to combine many different, carefully chosen techniques. Many of these papers have a year (the year of the benchmark) and the word "system" in the title, which makes them easy to find. For example, many such papers can be found with a Google Scholar search for:

intitle:2007 intitle:system speech recognition

Some of these papers use the word "recent" in the title instead of the year. In that case the search becomes:

intitle:recent intitle:system speech recognition
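If you want to run the same search for several benchmark years, the two queries above can be generated rather than typed. The following is only a minimal sketch: the helper names `benchmark_query` and `scholar_url` are made up for illustration, and it assumes that Google Scholar accepts the query text through the `q` parameter of its usual search URL.

```python
from urllib.parse import urlencode

# Assumed base URL for Google Scholar searches (not stated on this page).
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def benchmark_query(marker):
    """Build the query text for a benchmark year (e.g. 2007) or the word 'recent'."""
    return f"intitle:{marker} intitle:system speech recognition"


def scholar_url(marker):
    """Wrap the query text in a search URL, percent-encoding it as needed."""
    return SCHOLAR_SEARCH + "?" + urlencode({"q": benchmark_query(marker)})


if __name__ == "__main__":
    # Print the query and the corresponding URL for each variant.
    for marker in (2007, "recent"):
        print(benchmark_query(marker))
        print(scholar_url(marker))
```

Changing the tuple in the loop (for example to a range of years) gives one query per benchmark year.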

NIST organizes many of these benchmarks and has information about them on its web site.