Version 42 (modified by kmaclean, 15 years ago)
Here is a list of possible sources of spoken audio files that might be used to create GPL acoustic models.
- Audio Source list:
- Gutenberg audio project
- Wikipedia Spoken Articles
- LibriVox
- CMU:
  - Festvox:
    - Festvox databases:
      - CMU ARCTIC (no restrictions)
      - CMU_FAF (Facts and Fables) database
      - CMU_SIN (Speech in Noise) database
      - CMU Chaplain (for research only)
      - Diphone databases
      - ldom databases (limited domain)
- ISIP Switchboard
- American Rhetoric
- MICASE
- TalkBank Audio Files (GNU license):
  - Switchboard database
  - [http://talkbank.org/data/Conversation/ Santa Barbara Corpus of Spoken American English]
- Hansard Canada (audio feeds available on the day of each debate)
- MICASE (Michigan Corpus of Academic Spoken English)
- AUE - alt-usage-english
- Voxeo Telephony Audio Files
- Internet Archive's audio books
- W3_Corpora
- SCOTS - Scottish Corpus of Texts and Speech
- NLTK - contains a sample of the TIMIT corpus
- Links
Other possible sources, but with licensing issues:
- Buckeye Corpus
- CSLU Speech Synthesis Research Group:
  - OGIresLPC 2.1.0 voices (voice data not released yet; only for research/personal use ...)
- CMU:
  - Let's Go Speech Dialog Data (licensed for research only)
- Festvox:
  - CSTR US KED Timit (for research, educational and individual use only)
(Also see Ticket #22)