Version 3 (modified by DavidGelbart, 14 years ago)

--

Automatic Speech Recognition Theory and Algorithms

This page is about resources for learning more about the theory and algorithms behind automatic speech recognition (ASR) technology.

Online educational resources:

Here is a short list of books. This is not a complete list of popular or recommended books! The books listed here vary in focus, so I've noted where to find each table of contents online. I also recommend checking out the reviews on amazon.com.

  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. The table of contents can be viewed here.
  • Speech and Audio Signal Processing: Processing and Perception of Speech and Music by Ben Gold and Nelson Morgan. The table of contents can be viewed on amazon.com.
  • Speech Processing -- A Dynamic and Optimization-Oriented Approach by Li Deng and D. O'Shaughnessy. The table of contents can be viewed on amazon.com.
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin. The table of contents (and long excerpts) can be viewed [www.cs.colorado.edu/~martin/slp.html here].

Gunnar Evermann's book recommendations can be found [htk.eng.cam.ac.uk/~ge204/refs.shtml here]. I'm not sure if that page will stay up now that he's leaving the HTK project, so here is a summary:

  • Pattern Classification by Duda, Hart, and Stork (this is one of my favorites too)
  • Introduction to Statistical Pattern Recognition by Fukunaga
  • Automatische Spracherkennung -- Grundlagen, statistische Modelle und effiziente Algorithmen by Schukat-Talamazzini
  • The above-mentioned book by Huang, Acero and Hon.
  • Automatic Speech Recognition -- The Development of the SPHINX Recognition System by Lee.
  • Statistical Methods for Speech Recognition by Jelinek.
  • Corpus-Based Methods in Language and Speech Processing, edited by Young and Bloothooft

Speech Technology Magazine is a good source for information about applications of speech technology.

If you want to know which techniques are currently attracting attention at the cutting edge of ASR research, papers describing speech recognition systems built for official project benchmarks can be a good source of information. These systems tend to combine many different, carefully chosen techniques. Many of these papers have a year (the year of the benchmark) and the word "system" in the title, which makes them easy to find. For example, many such papers can be found with a Google Scholar search for:

intitle:2007 intitle:system speech recognition

Some of these papers use the word "recent" in the title instead of the year. In that case, the search becomes:

intitle:recent intitle:system speech recognition
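
Both query patterns above differ only in the first title term, so they can be generated for any benchmark year. The following is a minimal sketch, not part of the original page: the function names `benchmark_query` and `scholar_url` are hypothetical, and the `https://scholar.google.com/scholar?q=` URL form is an assumption that should be checked before relying on it.

```python
from urllib.parse import urlencode


def benchmark_query(marker):
    """Build the title-restricted query described above.

    `marker` is either a benchmark year (e.g. 2007) or the word "recent".
    """
    return f"intitle:{marker} intitle:system speech recognition"


def scholar_url(query):
    # Assumed URL form for a Google Scholar search (hypothetical helper).
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


if __name__ == "__main__":
    # Print the query and a corresponding search URL for each pattern.
    for marker in (2007, "recent"):
        query = benchmark_query(marker)
        print(query)
        print(scholar_url(query))
```

Passing a different year to `benchmark_query` gives the query for that year's benchmark systems; the query string can also simply be pasted into the Google Scholar search box.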

NIST organizes many of these benchmarks and publishes information about them on its website.

Copyright 2008 David Gelbart - reprinted with permission