Version 7 (modified by DavidGelbart, 14 years ago)
Automatic Speech Recognition Theory and Algorithms
This page is about resources for learning more about the theory and algorithms behind automatic speech recognition (ASR) technology.
Online educational resources:
- Slides from the ASR course taught at Stanford by Dan Jurafsky: here and here
- Slides from the ASR class taught at Columbia in 2005 by IBM staff
- Dan Ellis' slides from his Spring 2004 audio processing class
- Dan Ellis' slides from his Fall 2006 DSP class
- Slides from Jeff Bilmes' winter 2005 speech processing class
- Steve Renals' n-gram language modeling lecture from a computational linguistics class
- Joshua Goodman's publications. There is useful tutorial material in The State of the Art in Language Modeling and A Bit of Progress in Language Modeling
- Slides from Bryan Pellom's 2004 speech recognition class
- Slides from Automatic Speech Recognition, Spring 2003, MIT
- Hidden Markov Models for Speech Recognition, Spring 2004, OGI (PowerPoint ONLY)
- "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition": this classic article by Rabiner can be found with Google or Google Scholar
- "Automatic speech recognition: History, methods and challenges" is a 2008 article by Douglas O'Shaughnessy in the journal Pattern Recognition (volume 41, issue 10)
- Morgan and Bourlard intro to hybrid (Neural Net/HMM) speech recognition systems
- Eric Fosler's HMM Tutorial
- There is a nice little overview of the EM algorithm, with references, in the paper Convergence Results for the EM Approach to Mixtures of Experts Architectures by Jordan and Xu
- Jeff Bilmes' Tutorial on the EM algorithm.
- Essex phonetics articles
- Jason Eisner's Interactive Spreadsheet for Teaching the Forward-Backward Algorithm
- Yaroslav Bulatov's recommendations of online machine learning materials: here and here.
- Some intro material on dynamic programming: here and here (includes Diff.java, which implements the diff command). If you are having trouble learning the Viterbi algorithm, it may help to start by learning dynamic programming and DTW (dynamic time warping); see, for example, the Columbia/IBM slides for information about the connection.
- Matlab code to train a Gaussian mixture model with EM
- Adaptation techniques in ASR
- Article on sampling and digital filtering (This article is a nice complement to many DSP textbooks. The discussion of ringing seems well done.)
- The Scientist and Engineer's Guide to Digital Signal Processing, a free online DSP book
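The dynamic programming entry in the list above suggests DTW as a stepping stone to the Viterbi algorithm. As a rough illustration of why, here is a minimal DTW sketch. It assumes one-dimensional numeric sequences and uses the absolute difference as the local cost; the function name and these choices are illustrative only and are not taken from any of the linked materials.

```python
def dtw_distance(a, b):
    """Return the minimum cumulative alignment cost between sequences a and b.

    Assumptions (illustrative): a and b are non-empty lists of numbers,
    and the local cost of matching two samples is their absolute difference.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is matched again
                cost[i][j - 1],      # b advances, a[i-1] is matched again
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Viterbi decoding fills a table in much the same way: each cell is computed from the best-scoring predecessor, except that the cells are indexed by time and HMM state and the scores are path probabilities to maximize rather than costs to minimize.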
Book recommendations:
Some book recommendations from David Gelbart. (This is not a complete list of popular or recommended books. The books vary in focus so I've mentioned how to find the table of contents online. I also recommend checking out the reviews on amazon.com.)
- Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. The table of contents can be viewed here.
- Speech and Audio Signal Processing: Processing and Perception of Speech and Music by Ben Gold and Nelson Morgan. The table of contents can be viewed on amazon.com.
- Speech Processing -- A Dynamic and Optimization-Oriented Approach by Li Deng and D. O'Shaughnessy. The table of contents can be viewed on amazon.com.
- Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin. The table of contents (and long excerpts) can be viewed here (www.cs.colorado.edu/~martin/slp.html).
- Pattern Classification by Duda, Hart, and Stork. This is about pattern recognition in general, not ASR in particular. The table of contents can be viewed here.
Gunnar Evermann's book recommendations (a summary of this page)
- Pattern Classification by Duda, Hart, and Stork
- Introduction to Statistical Pattern Recognition by Fukunaga
- Automatische Spracherkennung -- Grundlagen, statistische Modelle und effiziente Algorithmen by Schukat-Talamazzini
- The above-mentioned book by Huang, Acero and Hon.
- Automatic Speech Recognition -- The Development of the SPHINX Recognition System by Lee.
- Statistical Methods for Speech Recognition by Jelinek.
- Corpus-Based Methods in Language and Speech Processing, edited by Young and Bloothooft
Applications of ASR:
Speech Technology Magazine is a good source for information about applications of speech technology. The magazine can be read free online. They also have a blog.
Current ASR research:
If you want to know what techniques are currently attracting attention at the cutting edge of ASR research, papers that describe speech recognition systems that were built for official project benchmarks can be a good source of information. These systems tend to use a lot of different, carefully chosen techniques. Many of these papers have a year (the year of the benchmark) and the word "system" in the title, which makes them easy to find. For example, many such papers can be found with a Google Scholar search for:
intitle:2007 intitle:system speech recognition
Some of these papers use the word "recent" in the title instead of the year. In that case, the above search would change to:
intitle:recent intitle:system speech recognition
NIST organizes many of these benchmarks and they have information on their web site.