Ticket #127 (new enhancement)

Opened 16 years ago

Last modified 15 years ago


Reported by: kmaclean
Owned by: kmaclean
Priority: major
Milestone: WebSite 0.2.2
Component: Web Site
Version: 0.1-alpha
Keywords:
Cc:


From: jaiger

As part of your nightly job, perhaps you can also collect metrics on the archive such as:

  • total audio data, in seconds and MB
  • a comparison of the total with some goal, e.g. "we're at 10% of our 100-hour goal"
  • similar audio submission metrics by user, e.g. "jaiger submitted 1 hour of audio, or 1% of the 100-hour goal"
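
A minimal sketch of how such a nightly metrics pass might look, using only Python's standard library. The archive layout (one directory per user under a root), the function names, and the shape of the returned dictionary are assumptions for illustration, not a description of the actual script. Durations are taken from the WAV headers.

```python
import os
import wave
from collections import defaultdict

GOAL_HOURS = 100  # the 100-hour goal mentioned above


def wav_seconds(path):
    """Return the duration in seconds recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def collect_metrics(archive_root, goal_hours=GOAL_HOURS):
    """Total the audio under archive_root/<user>/... overall and per user.

    The one-directory-per-user layout is an assumption for illustration.
    """
    goal_seconds = goal_hours * 3600
    per_user = defaultdict(lambda: {"seconds": 0.0, "bytes": 0})

    for user in sorted(os.listdir(archive_root)):
        user_dir = os.path.join(archive_root, user)
        if not os.path.isdir(user_dir):
            continue
        for dirpath, _dirs, files in os.walk(user_dir):
            for name in files:
                if name.lower().endswith(".wav"):
                    path = os.path.join(dirpath, name)
                    per_user[user]["seconds"] += wav_seconds(path)
                    per_user[user]["bytes"] += os.path.getsize(path)

    total_seconds = sum(u["seconds"] for u in per_user.values())
    total_bytes = sum(u["bytes"] for u in per_user.values())
    return {
        "total_seconds": total_seconds,
        "total_mb": total_bytes / (1024 * 1024),
        "percent_of_goal": 100.0 * total_seconds / goal_seconds,
        "per_user": {
            user: {
                "seconds": u["seconds"],
                "mb": u["bytes"] / (1024 * 1024),
                "percent_of_goal": 100.0 * u["seconds"] / goal_seconds,
            }
            for user, u in per_user.items()
        },
    }
```

Calling `collect_metrics("/path/to/archive")` would return the totals plus a per-user breakdown that a page template could publish.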

Collecting and publishing the metrics might spur submissions from those of us with competitive personalities, and would at least show us where we are relative to our project goals.

For future programming ease, you might also (at submission time) create an XML file containing the license, prompts and README data. The XML file might also contain other data, such as the calculated time metrics above, or MD5 hashes of the audio files so that a downloaded file can be checked for corruption. This might make it easier for future scripts to manipulate the data, say for import into a DB or for other queries.
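
One possible shape for such a per-submission XML file, sketched with Python's standard `hashlib` and `xml.etree.ElementTree`. The element names (`submission`, `license`, `prompts`, `readme`, `total_seconds`, `audio`, `file`) and the file names looked up in the submission directory are hypothetical; the ticket does not specify a schema.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(submission_dir, total_seconds=None):
    """Build an XML tree describing one submission directory.

    File names and element names here are assumptions for illustration.
    """
    root = ET.Element("submission")

    # Copy the text files into the manifest if they are present.
    for tag, filename in (("license", "LICENSE"),
                          ("prompts", "prompts"),
                          ("readme", "README")):
        path = os.path.join(submission_dir, filename)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                ET.SubElement(root, tag).text = f.read()

    # Optionally record a precomputed duration metric.
    if total_seconds is not None:
        ET.SubElement(root, "total_seconds").text = f"{total_seconds:.2f}"

    # One <file> element per WAV file, with its MD5 hash as an attribute.
    audio = ET.SubElement(root, "audio")
    for name in sorted(os.listdir(submission_dir)):
        if name.lower().endswith(".wav"):
            ET.SubElement(audio, "file", name=name,
                          md5=md5_of_file(os.path.join(submission_dir, name)))

    return ET.ElementTree(root)
```

The tree could then be saved with `build_manifest(path).write("manifest.xml", encoding="utf-8", xml_declaration=True)`, and a download script could recompute each hash and compare it with the stored `md5` attribute.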

Change History

comment:1 Changed 16 years ago by kmaclean

  • Type changed from defect to enhancement

comment:2 Changed 16 years ago by kmaclean

Alpha version of the metrics script posted in prod.

comment:3 Changed 16 years ago by kmaclean

Currently, we get users to add one second of silence before and after each utterance recording. This will make it seem like we have more speech data than we really have, since the metrics script takes the duration of each file from its WAV header and so counts the silence as audio.

Might just mean we need to subtract 1 or 1.5 seconds for each prompt line (e.g. 40 prompts means removing 40 to 60 seconds) from the total time for each user submission.
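
The correction could be as small as the following sketch. The function name and the default of 1.0 second per prompt are assumptions; the per-prompt value would need tuning against real recordings.

```python
def speech_seconds(header_seconds, prompt_count, padding_per_prompt=1.0):
    """Estimate speech time by removing assumed silence padding.

    header_seconds is the duration read from the WAV headers;
    padding_per_prompt is the silence to subtract for each prompt line.
    The result is clamped at zero so short submissions never go negative.
    """
    corrected = header_seconds - prompt_count * padding_per_prompt
    return max(corrected, 0.0)
```

For a 40-prompt submission measured at 300 seconds, `speech_seconds(300.0, 40, 1.0)` gives 260.0 and `speech_seconds(300.0, 40, 1.5)` gives 240.0, matching the 40 to 60 second range above.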

comment:4 Changed 15 years ago by kmaclean

  • Milestone changed from Website 0.2 to WebSite 0.2.1

comment:5 Changed 15 years ago by kmaclean

  • Milestone changed from WebSite 0.2.1 to WebSite 0.2.2