Version 66 (modified by kmaclean, 15 years ago)
Possible Audio Sources
(A list of possible sources of spoken audio files that might be used to create GPL acoustic models)
- Audio Source list:
- Gutenberg audio project
- Wikipedia Spoken Articles
- LibriVox
- Creative Commons Audio
- CMU:
- Festvox:
- Festvox databases
- CMU ARCTIC (no restrictions)
- CMU_FAF (Facts and Fables) database
- CMU_SIN database Speech in Noise
- CMU Chaplain (for research only)
- Diphone Databases
- ldom Databases (Limited Domain)
- ISIP Switchboard
- American Rhetoric
- TalkBank - TalkBank Audio Files (GNU license)
- Hansard Canada (Audio feeds on day of debate)
- MICASE - Michigan Corpus of Academic Spoken English
- AUE - alt-usage-english
- Voxeo Telephony Audio Files
- Internet Archive's audio books
- W3_Corpora
- SCOTS - Scottish Corpus of Texts and Speech
- NLTK - contains a sample of the TIMIT corpus
- OYEZ - US Supreme Court Media (MP3)
- Speech separation challenge
- The ICSI Meeting Corpus - Berkeley
- Open Speech Repository
Links
- Bookmarks for Corpus-based Linguists
- Corpora and other Language and Speech Data under DICE
- David Lee's Bookmarks for Corpus-based Linguistics
- W3-Corpora List
- Spoken Word Project - spoken Word archives
- isca-student - International Speech Communication Association
Other possible sources, but with licensing issues:
- Buckeye Corpus
- CSLU Speech Synthesis Research Group:
- OGIresLPC 2.1.0 voices (voice data not released yet - only for research/personal use ...)
- CMU
- Let's Go Speech Dialog Data - (license for research only)
- Festvox
- CSTR US KED Timit (for research, educational and individual use only)
- Literal Systems - MP3/CC No derivative works
- loudlit - MP3/CC No derivative works
Articles
- Brough Turner's Communications Technology Blog
- Extreme Speech™ Recognition System
Corpora for Non-Commercial Use
- IViE (non-commercial purposes)
- The Speech Accent Archive (Creative Commons - non-commercial)
- AMI Meeting Corpus - CC Attribution NonCommercial ShareAlike 2.5 Licence
- EUSTACE - non-commercial use
- MOCHATIMIT - non-commercial use
(Also see Ticket #22)