== Possible Audio Sources ==

This is a list of possible sources of spoken audio files that might be used to create GPL acoustic models.

 * Audio source list:
   * [http://www.gutenberg.org/audio/ Gutenberg audio project]
   * [http://en.wikipedia.org/wiki/Category:Spoken_articles Wikipedia Spoken Articles]
   * [http://librivox.org/ librivox]
   * [http://creativecommons.org/audio/ Creative Commons Audio]
   * CMU:
     * [http://www.speech.cs.cmu.edu/databases/ CMU Audio Database]:
       * [http://www.speech.cs.cmu.edu/databases/micarray Microphone array database]
       * [http://www.speech.cs.cmu.edu/databases/an4 Census (AN4) database]
       * [http://www.festvox.org/cmu_sin CMU_SIN Speech in Noise Database]
       * [http://www.speech.cs.cmu.edu/databases/pda PDA database]
       * [http://www.speech.cs.cmu.edu/databases/rm1 Resource Management (RM1) database] (no wav - Sphinx mfc only)
   * Festvox:
     * [http://www.festvox.org/dbs/ Festvox databases]
       * [http://www.festvox.org/cmu_arctic/ CMU ARCTIC] (no restrictions)
       * [http://www.festvox.org/cmu_faf CMU_FAF (Facts and Fables) database]
       * [http://www.festvox.org/cmu_sin CMU_SIN database] Speech in Noise
       * [http://www.speech.cs.cmu.edu/Tongues CMU Chaplain] (for research only)
       * Diphone databases
         * [http://www.festvox.org/dbs/dbs_kal.html CMU US KAL diphone]
         * [http://www.festvox.org/dbs/dbs_rab.html CSTR UK RAB diphone]
       * ldom databases (Limited Domain)
         * [http://www.festvox.org/dbs/dbs_time.html time ldom (cmu_time_awb_ldom)]
         * [http://www.festvox.org/dbs/dbs_weather.html weather ldom (cmu_weather_awb_ldom)]
         * [http://www.festvox.org/dbs/dbs_com.html Communicator ldom (cmu_com_kal_ldom)]
   * ISIP Switchboard
     * Old links:
       * [http://www.cavs.msstate.edu/hse/ies/projects/switchboard/releases/ ISIP Abridged Switchboard Audio Database]
       * [http://www.cavs.msstate.edu/hse/ies/projects/switchboard/releases/vrt/ full Switchboard Audio Database]
     * [http://www.ece.msstate.edu/research/isip/projects/switchboard/ New link to ISIP Switchboard]
   * [http://www.americanrhetoric.com/ American Rhetoric]
   * [http://www.lsa.umich.edu/eli/micase/Audio/index.htm MICASE]
   * [http://www.talkbank.org/data/ TalkBank], [http://www.talkbank.org/media/ TalkBank Audio Files] (GNU license)
     * [http://www.talkbank.org/media/SWB/ Switchboard database]
     * [http://talkbank.org/data/Conversation/ Santa Barbara Corpus of Spoken American English]
   * Hansard Canada (audio feeds on day of debate)
     * [http://www.parl.gc.ca/common/Chamber_House_Debates.asp?Language=E&Parl=39&Ses=1 House of Commons]
     * [http://www.parl.gc.ca/common/Chamber_Senate_Debates.asp?Language=E&Parl=39&Ses=1 Senate]
   * [http://micase.umdl.umich.edu/m/micase/ MICASE Michigan Corpus of Academic Spoken English]
   * [http://alt-usage-english.org/audio_archive.shtml AUE - alt-usage-english]
   * [http://evolution.voxeo.com/library/audio/prompts/home.jsp Voxeo Telephony Audio Files]
   * [http://www.archive.org/details/audio_bookspoetry Internet Archive's audio books]
   * [http://www.essex.ac.uk/linguistics/clmt/w3c/corpus_ling/content/corpora/list/index2.html W3_Corpora]
   * [http://www.scottishcorpus.ac.uk/ SCOTS - Scottish Corpus of Texts and Speech]
   * [http://nltk.sourceforge.net/ NLTK] - contains a sample of the TIMIT corpus
   * [http://www.oyez.org/ OYEZ] US Supreme Court Media (mp3)
   * [http://www.dcs.shef.ac.uk/~martin/SpeechSeparationChallenge.htm Speech separation challenge]
   * [http://www.icsi.berkeley.edu/Speech/mr/mrdigits.html The ICSI Meeting Corpus] - Berkeley

== Links ==

 * [http://devoted.to/corpora Bookmarks for Corpus-based Linguists]
 * [http://www.inf.ed.ac.uk/resources/corpora/ Corpora and other Language and Speech Data under DICE]
 * [http://personal.cityu.edu.hk/~davidlee/devotedtocorpora/corpora.htm David Lee's Bookmarks for Corpus-based Linguistics]
 * [http://www.essex.ac.uk/linguistics/clmt/w3c/corpus_ling/content/corpora/list/index2.html W3-Corpora List]
 * [http://www.historicalvoices.org/spokenword/resources/sound.php Spoken Word Project] - spoken word archives
 * [http://www.isca-students.org/corpora isca-student] - International Speech Communication Association

== Other Possible Sources, but with Licensing Issues ==

 * [http://buckeyecorpus.osu.edu/ Buckeye Corpus]
 * CSLU Speech Synthesis Research Group:
   * [http://cslu.cse.ogi.edu/tts/download/index.html#plugin OGIresLPC 2.1.0 voices] (voice data not released yet - only for research/personal use ...)
 * CMU
   * [http://www.speech.cs.cmu.edu/letsgo/letsgodata.html Let's Go Speech Dialog Data] - (license for research only)
 * Festvox
   * [http://www.festvox.org/dbs/dbs_kdt.html CSTR US KED Timit] (for research, educational and individual use only)
 * [http://literalsystems.org/abooks/index.php Literal Systems] - MP3/CC No derivative works
 * [http://loudlit.org/ loudlit] - MP3/CC No derivative works

== Articles ==

 * Brough Turner's Communications Technology Blog
   * [http://blogs.nmss.com/communications/2005/08/large_speech_co.html Using blogs for speech corpora]
   * [http://blogs.nmss.com/communications/2005/08/large_speech_co_1.html Large Speech Corpora - Dan Bricklin's Insights]
 * [http://cislt.org/ExtremeSpeechRecognitionSystem.htm Extreme Speech™ Recognition System]

== Corpora for Non-Commercial Use ==

 * [http://www.phon.ox.ac.uk/IViE/download1.html IViE] (non-commercial purposes)
 * [http://accent.gmu.edu/index.php The Speech Accent Archive] (Creative Commons - non-commercial)
 * [http://www.idiap.ch/amicorpus AMI Meeting Corpus] - CC Attribution !NonCommercial !ShareAlike 2.5 Licence
 * [http://www.cstr.ed.ac.uk/projects/eustace/ EUSTACE] - non-commercial use
 * [http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html MOCHA-TIMIT] - non-commercial use

(Also see Ticket #22)