Version 11 (modified by kmaclean, 15 years ago)
Free Acoustic Models
- The ATK distribution contains compiled HMMs, but it is not clear whether the compiled versions are usable with Julius.
- Keith Vertanen's free Acoustic Models (no source audio) using Wall Street Journal (WSJ) Corpus:
Acoustic Model Notes
The Sphinx Acoustic Models were trained using 140 hours of 1996 and 1997 Hub-4 training data. VoxForge's goal for release 1.0 is to collect 140 hours of speech audio for the creation of Open Source Acoustic Models.
Details from the LDC site:
- 1996 English Broadcast News Speech (Hub-4) - 104 hours of broadcasts
- 1997 English Broadcast News Speech (Hub-4) - 97 hours of news broadcasts
Estimating Storage requirements:
- For 48kHz, 16-bit audio, 5 seconds of audio takes about 500k (48,000 samples per second x 2 bytes x 5 seconds = 480,000 bytes, which assumes a single channel).
- That is about 6 Meg per minute.
- If we want 140 hours (8,400 minutes) of speech, we will need 50,400 Meg, or around 50.4 Gig (assuming 1000k per Meg and 1000 Meg per Gig), for the original data.
- We will likely need at least double that space, because derived versions of the audio (downsampling, noise reduction, etc.) are propagated through version control to create Acoustic Models. We therefore need at least 100 Gig of storage to meet our stated objective.
- VoxForge server currently holds 200 Gig, and, if needed, can easily add additional storage.
- Bandwidth is a greater issue, so we will require peer-to-peer sharing of audio files (i.e. BitTorrent); see ticket #11.
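
The storage estimate above can be checked with a short sketch. It assumes mono, uncompressed PCM (the page does not state the channel count, but 500k for 5 seconds only matches one channel) and decimal units (1000k per Meg, 1000 Meg per Gig). The function names and the `propagation_factor` parameter are made up for illustration and are not part of any VoxForge tool.

```python
# Rough storage estimate for uncompressed PCM audio.
# Assumptions (not stated on the page): one channel, decimal units.

SAMPLE_RATE_HZ = 48_000    # 48kHz
BYTES_PER_SAMPLE = 2       # 16-bit samples
CHANNELS = 1               # assumption: mono
TARGET_HOURS = 140         # release 1.0 goal


def raw_audio_bytes(seconds, sample_rate=SAMPLE_RATE_HZ,
                    bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the size in bytes of `seconds` of uncompressed PCM audio."""
    return seconds * sample_rate * bytes_per_sample * channels


def storage_estimate(hours, propagation_factor=2):
    """Return (original_bytes, total_bytes).

    `propagation_factor` is a hypothetical multiplier standing in for the
    "at least double" allowance for derived copies of the audio.
    """
    original = raw_audio_bytes(hours * 3600)
    return original, original * propagation_factor


if __name__ == "__main__":
    original, total = storage_estimate(TARGET_HOURS)
    print(f"5 seconds : {raw_audio_bytes(5) / 1e3:.0f} k")
    print(f"1 minute  : {raw_audio_bytes(60) / 1e6:.2f} Meg")
    print(f"{TARGET_HOURS} hours : {original / 1e9:.1f} Gig original, "
          f"{total / 1e9:.1f} Gig with derived copies")
```

The exact figures come out slightly below the rounded ones in the list (480k rather than 500k for 5 seconds, and about 48.4 Gig rather than 50.4 Gig for 140 hours), so the 100 Gig target leaves a small margin.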