== Notes on Acoustic Model Creation/Conversion ==
* [http://sourceforge.net/projects/srmc/ universal speech recognition model converter (USMC)] - some CVS entries, no files

== Training recipes ==
* [http://www.inference.phy.cam.ac.uk/kv227/sphinx/ Keith Vertanen's CMU Sphinx Wall Street Journal (WSJ) Training Recipe]
* [http://www.inference.phy.cam.ac.uk/kv227/htk/ Keith Vertanen's HTK Wall Street Journal (WSJ) Training Recipe]

== Acoustic Model Notes ==
The [http://www.speech.cs.cmu.edu/sphinx/models/hub4opensrc_jan2002/INFO_ABOUT_MODELS Sphinx Acoustic Models] were trained using 140 hours of 1996 and 1997 Hub-4 training data. VoxForge's goal for release 1.0 is to collect 140 hours of speech audio for the creation of Open Source Acoustic Models.

Details from the LDC site:
* [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC97S44 1996 English Broadcast News Speech (Hub-4)] - 104 hours of broadcasts
* [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98S71 1997 English Broadcast News Speech (Hub-4)] - 97 hours of news broadcasts

== Estimating Storage requirements for VoxForge Corpora and Acoustic Models ==
* For 48kHz, 16-bit audio, 5 seconds of audio takes about 500k.
* That works out to roughly 6 Meg per minute.
* If we want 140 hours of speech, we will need 50,400 Meg, or around 50.4 Gig (assuming 1000 Meg per Gig), for the original data.
* We will likely need at least double that space once the audio is propagated through version control (downsampling, noise reduction, etc.) to create Acoustic Models, so we need at least '''100 Gig''' of storage to meet our stated objective.
* The !VoxForge server currently holds 200 Gig and can easily add storage if needed.
* Bandwidth is the bigger issue, so we will require peer-to-peer sharing of audio files (e.g. BitTorrent); see ticket #11.
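The storage arithmetic above can be reproduced with a short script. This is a rough sketch, not an official VoxForge tool, and it rests on assumptions the page does not state: mono recordings, uncompressed 16-bit PCM with file headers ignored, and decimal units (1k = 1000 bytes, 1 Meg = 1000k, 1 Gig = 1000 Meg). The constant names, including the hypothetical `COPY_FACTOR` for derived copies of the audio, are illustrative only.

```python
"""Back-of-the-envelope storage estimate for the VoxForge corpus.

Assumptions (not stated on the wiki page): mono, uncompressed 16-bit PCM,
file headers ignored, decimal units (1k = 1000 bytes, 1 Meg = 1000k,
1 Gig = 1000 Meg).
"""

SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # assumption: mono recordings
TARGET_HOURS = 140
COPY_FACTOR = 2           # hypothetical: original audio plus derived copies


def audio_bytes(seconds: float) -> float:
    """Raw PCM size in bytes for `seconds` of audio."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


five_seconds_k = audio_bytes(5) / 1_000                           # 480.0, "about 500k"
per_minute_meg = audio_bytes(60) / 1_000_000                      # 5.76, "about 6 Meg"
original_gig = audio_bytes(TARGET_HOURS * 3600) / 1_000_000_000   # 48.384
rounded_gig = TARGET_HOURS * 60 * 6 / 1_000                       # 50.4, using the rounded 6 Meg/minute
total_gig = rounded_gig * COPY_FACTOR                             # 100.8, "at least 100 Gig"

if __name__ == "__main__":
    print(f"5 s of audio:             {five_seconds_k:.0f}k")
    print(f"1 minute of audio:        {per_minute_meg:.2f} Meg")
    print(f"{TARGET_HOURS} hours (exact):        {original_gig:.1f} Gig")
    print(f"{TARGET_HOURS} hours (6 Meg/minute): {rounded_gig:.1f} Gig")
    print(f"With derived copies:      {total_gig:.1f} Gig")
```

Under these assumptions the exact figure (about 48.4 Gig) comes in slightly below the page's 50.4 Gig, because the page rounds 5.76 Meg per minute up to 6; the rounding errs on the safe side for capacity planning. Stereo recordings would double every figure.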