Last modified on 05/27/08 15:24:19

Notes on Acoustic Model Creation/Conversion

Training recipes

Training Tutorials

Acoustic Model Notes

The Sphinx Acoustic Models were trained on 140 hours of 1996 and 1997 HUB4 training data. VoxForge's goal for release 1.0 is to collect the same amount, 140 hours of speech audio, for the creation of Open Source Acoustic Models.

details from LDC site:

Estimating Storage requirements for VoxForge Corpora and Acoustic Models:

  • for 48kHz, 16-bit audio, 5 seconds of audio takes about 500k (single channel: 48,000 samples/s × 2 bytes × 5 s = 480,000 bytes).
  • therefore about 6 Meg per minute (12 × 500k).
  • if we want 140 hours of speech (8,400 minutes), we will need 50,400 Meg, or around 50.4 Gig, for the original data (assumes 1000k per Meg and 1000 Meg per Gig).
  • We will likely need at least double that space as the audio propagates through version control (downsampling, noise reduction, etc.) to create Acoustic Models - therefore we need at least 100 Gig of storage to meet our stated objective.
  • The VoxForge server currently holds 200 Gig, and additional storage can easily be added if needed.
  • Bandwidth is a greater issue, so we will require peer-to-peer sharing of audio files (e.g. BitTorrent) - see ticket #11.
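
The arithmetic in the list above can be reproduced with a short Python sketch. The function and parameter names are illustrative, not part of any VoxForge tooling. The rounded estimate uses the same figures as the notes (500k per 5 seconds, 1000k per Meg). The exact figure assumes uncompressed single-channel PCM with no file header overhead, which the notes do not state explicitly.

```python
def rounded_estimate_meg(hours, k_per_5s=500, k_per_meg=1000):
    """Storage in Meg, using the rounded figures from the notes."""
    k_per_minute = k_per_5s * 12           # twelve 5-second chunks per minute
    meg_per_minute = k_per_minute / k_per_meg
    return hours * 60 * meg_per_minute


def pcm_bytes_per_second(sample_rate_hz=48000, bits_per_sample=16, channels=1):
    """Raw PCM data rate (assumption: uncompressed, mono by default)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def exact_estimate_bytes(hours, **pcm_kwargs):
    """Storage in bytes for the given number of hours of raw PCM audio."""
    return hours * 3600 * pcm_bytes_per_second(**pcm_kwargs)


if __name__ == "__main__":
    hours = 140
    original_meg = rounded_estimate_meg(hours)
    print("rounded estimate:", original_meg, "Meg")
    # Doubling for derived copies (downsampled, noise-reduced, etc.)
    print("with derived copies:", 2 * original_meg, "Meg")
    print("exact raw PCM:", exact_estimate_bytes(hours), "bytes")
```

The rounded estimate for 140 hours comes to 50,400 Meg, matching the figure above. The exact raw PCM figure is slightly lower (about 48.4 Gig at 1000 Meg per Gig), because 5 seconds of audio is 480,000 bytes rather than a full 500k.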

Other Languages