Ticket #127 (new enhancement)
Metrics
Reported by: | kmaclean | Owned by: | kmaclean |
---|---|---|---|
Priority: | major | Milestone: | WebSite 0.2.2 |
Component: | Web Site | Version: | 0.1-alpha |
Keywords: | | Cc: | |
Description
From: jaiger
As part of your nightly job, perhaps you can also collect metrics on the archive such as:
- total audio data, in seconds and MB
- compare the total with some goal, e.g. "we're at 10% of our 100-hour goal"
- similar per-user submission metrics, e.g. "jaiger has submitted 1 hour of audio, or 1% of the 100-hour goal"
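A minimal sketch of how a nightly job could compute these numbers. It assumes the archive is a directory tree of WAV files, and that the submitting user can be derived from a file's path through a caller-supplied, hypothetical `user_of` function. Duration is read from each file's header with Python's standard `wave` module (frame count divided by sample rate), and size is the on-disk file size. Only the 100-hour goal comes from the ticket; the function names and layout are illustrative assumptions.

```python
import os
import wave
from collections import defaultdict

GOAL_SECONDS = 100 * 3600  # the 100-hour goal mentioned in the ticket


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from its header (frames / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def collect_metrics(root, user_of):
    """Walk `root` for .wav files and total their duration and size.

    `user_of` is a hypothetical callable mapping a file path to the
    submitting user's name; the real archive layout may differ.
    """
    total_seconds = 0.0
    total_bytes = 0
    per_user = defaultdict(float)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            seconds = wav_duration_seconds(path)
            total_seconds += seconds
            total_bytes += os.path.getsize(path)
            per_user[user_of(path)] += seconds
    return {
        "total_seconds": total_seconds,
        "total_mb": total_bytes / (1024 * 1024),
        "percent_of_goal": 100.0 * total_seconds / GOAL_SECONDS,
        "per_user_seconds": dict(per_user),
        "per_user_percent": {
            user: 100.0 * secs / GOAL_SECONDS for user, secs in per_user.items()
        },
    }
```

For example, if each submission directory were named `<user>-<something>` (an assumed convention), the job could call `collect_metrics(archive_root, lambda p: os.path.basename(os.path.dirname(p)).split("-")[0])` and publish the resulting percentages.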
Collecting and publishing the metrics might spur submissions for those of us with competitive personalities and at least show us where we are relative to our project goals.
For future programming ease, you might also create (at submission time) an XML file containing the License, prompts and README data. The XML file could also hold other data, such as the time metrics calculated above, or MD5 hashes of the audio files that can be used to check that a downloaded file is not corrupt. This would make it easier for future scripts to manipulate the data, for example to import it into a database or run other queries.
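One possible shape for such a per-submission manifest, sketched with Python's standard `xml.etree.ElementTree` and `hashlib`. The element and attribute names (`submission`, `license`, `readme`, `prompts`, `files`, `md5`, `seconds`) are hypothetical, and prompts are assumed to arrive as `(prompt_id, text)` pairs.

```python
import hashlib
import os
import wave
import xml.etree.ElementTree as ET


def md5_of_file(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks so large audio files are fine."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(license_text, readme_text, prompts, wav_paths):
    """Build a <submission> element; all element/attribute names are hypothetical.

    `prompts` is a list of (prompt_id, text) pairs.
    """
    root = ET.Element("submission")
    ET.SubElement(root, "license").text = license_text
    ET.SubElement(root, "readme").text = readme_text
    prompts_el = ET.SubElement(root, "prompts")
    for prompt_id, text in prompts:
        ET.SubElement(prompts_el, "prompt", id=prompt_id).text = text
    files_el = ET.SubElement(root, "files")
    for path in wav_paths:
        # Duration from the WAV header, as in the metrics job.
        with wave.open(path, "rb") as w:
            seconds = w.getnframes() / w.getframerate()
        ET.SubElement(
            files_el,
            "file",
            name=os.path.basename(path),
            md5=md5_of_file(path),
            seconds=f"{seconds:.3f}",
        )
    return root
```

The manifest could be saved with `ET.ElementTree(root).write("manifest.xml", encoding="utf-8", xml_declaration=True)`. A downloader would then recompute `md5_of_file` on each received file and compare it with the stored `md5` attribute to detect corruption.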
Change History
comment:3 Changed 15 years ago by kmaclean
Currently, we have users add one second of silence before and after each utterance recording. Because the metrics script reads the WAV file header of each file, this padding makes it seem as though we have more speech data than we really do.
It might just mean that we need to subtract 1 to 1.5 seconds per prompt line from the total time of each user submission (e.g. 40 prompts means removing 40 to 60 seconds).
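The correction described in this comment amounts to `speech ≈ raw − prompts × padding`. A minimal sketch follows, assuming the prompts file has one prompt per non-blank line; the function names are hypothetical, and the default padding of 1.0 s is the low end of the 1 to 1.5 s estimate above, not a measured value.

```python
def count_prompts(prompts_text):
    """Count prompt lines, assuming one prompt per non-blank line."""
    return sum(1 for line in prompts_text.splitlines() if line.strip())


def adjusted_speech_seconds(raw_seconds, num_prompts, padding_per_prompt=1.0):
    """Estimate speech time by removing the silence padding around each prompt.

    `padding_per_prompt` is an estimate (1 to 1.5 s per the ticket comment).
    The result is clamped at zero so tiny submissions never go negative.
    """
    return max(0.0, raw_seconds - num_prompts * padding_per_prompt)
```

For a submission with 40 prompts and 300 s of raw audio, this gives 260 s at 1.0 s of padding per prompt and 240 s at 1.5 s, matching the 40 to 60 second reduction described above.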