Here is a list of possible sources of spoken audio files that might be used for the creation of GPL acoustic models.

 * Audio source list:
   * [http://www.gutenberg.org/audio/ Gutenberg audio project]
   * [http://en.wikipedia.org/wiki/Category:Spoken_articles Wikipedia Spoken Articles]
   * CMU:
     * [http://www.speech.cs.cmu.edu/databases/ CMU Audio Database]:
       * [http://www.speech.cs.cmu.edu/databases/micarray Microphone array database]
       * [http://www.speech.cs.cmu.edu/databases/an4 Census (AN4) database]
       * [http://www.festvox.org/cmu_sin CMU_SIN Speech in Noise Database]
       * [http://www.speech.cs.cmu.edu/databases/pda PDA database]
       * [http://www.speech.cs.cmu.edu/databases/rm1 Resource Management (RM1) database] (no wav - Sphinx mfc only)
   * Festvox:
     * [http://www.festvox.org/dbs/ Festvox databases]
     * [http://www.festvox.org/cmu_arctic/ CMU ARCTIC] (no restrictions)
     * [http://www.festvox.org/cmu_faf CMU_FAF (Facts and Fables) database]
     * [http://www.festvox.org/cmu_sin CMU_SIN database] Speech in Noise
     * [http://www.speech.cs.cmu.edu/Tongues CMU Chaplain] (for research only)
     * Diphone databases
       * [http://www.festvox.org/dbs/dbs_kal.html CMU US KAL diphone]
       * [http://www.festvox.org/dbs/dbs_rab.html CSTR UK RAB diphone]
     * ldom databases (Limited Domain)
       * [http://www.festvox.org/dbs/dbs_time.html time ldom (cmu_time_awb_ldom)]
       * [http://www.festvox.org/dbs/dbs_weather.html weather ldom (cmu_weather_awb_ldom)]
       * [http://www.festvox.org/dbs/dbs_com.html Communicator ldom (cmu_com_kal_ldom)]
   * ISIP/CAVS Switchboard
     * [http://www.cavs.msstate.edu/hse/ies/projects/switchboard/releases/ ISIP Abridged Switchboard Audio Database]
     * [http://www.cavs.msstate.edu/hse/ies/projects/switchboard/releases/vrt/ full Switchboard Audio Database]
   * [http://www.americanrhetoric.com/ American Rhetoric]
   * [http://www.lsa.umich.edu/eli/micase/Audio/index.htm MICASE]
   * [http://www.talkbank.org/data/ TalkBank] [http://www.talkbank.org/media/ TalkBank Audio Files] (GNU license)
     * [http://www.talkbank.org/media/SWB/ Switchboard database]
   * Hansard Canada (audio feeds on day of debate)
     * [http://www.parl.gc.ca/common/Chamber_House_Debates.asp?Language=E&Parl=39&Ses=1 House of Commons]
     * [http://www.parl.gc.ca/common/Chamber_Senate_Debates.asp?Language=E&Parl=39&Ses=1 Senate]
   * [http://micase.umdl.umich.edu/m/micase/ MICASE Michigan Corpus of Academic Spoken English]
   * [http://alt-usage-english.org/audio_archive.shtml AUE - alt-usage-english]
   * [http://evolution.voxeo.com/library/audio/prompts/home.jsp Voxeo Telephony Audio Files]
   * [http://www.archive.org/details/audio Internet Archive's collection of audio recordings]
 * Links
   * [http://devoted.to/corpora Bookmarks for Corpus-based Linguists]
   * [http://www.inf.ed.ac.uk/resources/corpora/ Corpora and other Language and Speech Data under DICE]
   * [http://personal.cityu.edu.hk/~davidlee/devotedtocorpora/corpora.htm David Lee's Bookmarks for Corpus-based Linguistics]

Other possible sources, but with licensing issues:

 * [http://buckeyecorpus.osu.edu/ Buckeye Corpus]
 * CSLU Speech Synthesis Research Group:
   * [http://cslu.cse.ogi.edu/tts/download/index.html#plugin OGIresLPC 2.1.0 voices] (voice data not released yet - only for research/personal use ...)
 * CMU
   * [http://www.speech.cs.cmu.edu/letsgo/letsgodata.html Let's Go Speech Dialog Data] (license for research only)
 * Festvox
   * [http://www.festvox.org/dbs/dbs_kdt.html CSTR US KED Timit] (for research, educational and individual use only)

(Also see Ticket #22)