Version 6 (modified by DavidGelbart, 14 years ago)


Automatic Speech Recognition Theory and Algorithms

This page is about resources for learning more about the theory and algorithms behind automatic speech recognition (ASR) technology.

Online educational resources:

Book recommendations:

Some book recommendations from David Gelbart. (This is not a complete list of popular or recommended books. The books vary in focus so I've mentioned how to find the table of contents online. I also recommend checking out the reviews on

  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. The table of contents can be viewed here.
  • Speech and Audio Signal Processing: Processing and Perception of Speech and Music by Ben Gold and Nelson Morgan. The table of contents can be viewed on
  • Speech Processing -- A Dynamic and Optimization-Oriented Approach by Li Deng and D. O'Shaughnessy. The table of contents can be viewed on
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin. The table of contents (and long excerpts) can be viewed here.
  • Pattern Classification by Duda, Hart, and Stork (this is about pattern recognition in general, not ASR in particular)

Gunnar Evermann's book recommendations (a summary of this):

  • Pattern Classification by Duda, Hart, and Stork
  • Introduction to Statistical Pattern Recognition by Fukunaga
  • Automatische Spracherkennung -- Grundlagen, statistische Modelle und effiziente Algorithmen by Schukat-Talamazzini
  • The above-mentioned book by Huang, Acero and Hon.
  • Automatic Speech Recognition -- The Development of the SPHINX Recognition System by Lee.
  • Statistical Methods for Speech Recognition by Jelinek.
  • Corpus-Based Methods in Language and Speech Processing, edited by Young and Bloothooft

Applications of ASR:

Speech Technology Magazine is a good source for information about applications of speech technology. The magazine can be read free online. They also have a blog.

Current ASR research:

To see which techniques are currently attracting attention in ASR research, look at papers describing speech recognition systems built for official project benchmarks. These systems tend to combine many different, carefully chosen techniques. Many of these papers have a year (the year of the benchmark) and the word "system" in the title, which makes them easy to find. For example, many such papers can be found with a Google Scholar search for:

intitle:2007 intitle:system speech recognition

Some of these papers use the word "recent" in the title instead of the year, in which case the search becomes:

intitle:recent intitle:system speech recognition
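
The two searches above differ only in the first title word, so they can be generated for any benchmark year. The following is a minimal sketch, not a tested tool: the helper names benchmark_query and scholar_url are hypothetical, and the Google Scholar address and its q parameter are assumptions about how the search page accepts a query.

```python
from urllib.parse import urlencode

# Assumed address of the Google Scholar search page (not given in the text above).
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def benchmark_query(marker, topic="speech recognition"):
    """Build a query for papers whose title contains `marker` and "system".

    `marker` is the benchmark year (e.g. "2007") or the word "recent".
    """
    return f"intitle:{marker} intitle:system {topic}"


def scholar_url(query):
    """Return a search URL for `query`, percent-encoding it as a form value."""
    return f"{SCHOLAR_BASE}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Print one search URL per title marker used in the examples above.
    for marker in ("2007", "recent"):
        print(scholar_url(benchmark_query(marker)))
```

Passing a different year as the marker gives the corresponding search for that benchmark.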

NIST organizes many of these benchmarks and has information about them on its web site.