[Twisted-web] Fwd: Re: [Ossri] data collection
general at eepatents.com
Wed Feb 23 13:41:10 MST 2005
Thought this might be of interest as a use case.
Registered Patent Agent
Open-Source Software Author (yes, both...)
Web Site: http://www.eepatents.com
---------- Forwarded Message ----------
Subject: Re: [Ossri] data collection (WAS: dual-thread processors)
Date: Wednesday 23 February 2005 12:36 pm
From: Ed Suominen <general at eepatents.com>
To: ossri at harvee.org
Joe, that is a great idea!
I could run the site, as my server is Python-based (using the
Twisted/Nevow packages) and it is very flexible. I've already got
something running that involves registered users logging in and
uploading and downloading binary files (PDFs of documents) to and from
a MySQL database. Tying the binaries to user IDs and transcribed text
would be very straightforward. I couldn't do much work on it for
another month or so, though.
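
To make the idea concrete, here is a minimal sketch of how recordings
could be tied to user IDs and transcribed text, with the approval step
Joe suggests in the quoted message below. It is only an illustration
under stated assumptions: it uses Python's built-in sqlite3 module
rather than the MySQL setup described above, and the table names,
function names, and the two-approval threshold are all hypothetical.

```python
import sqlite3

# Hypothetical threshold: how many other users must approve a
# transcription before it counts as official.
APPROVALS_REQUIRED = 2


def create_schema(conn):
    """Create the two illustrative tables (names are assumptions)."""
    conn.executescript("""
        CREATE TABLE recordings (
            id INTEGER PRIMARY KEY,
            user_id TEXT NOT NULL,
            wav BLOB NOT NULL,
            transcript TEXT NOT NULL
        );
        CREATE TABLE approvals (
            recording_id INTEGER NOT NULL REFERENCES recordings(id),
            user_id TEXT NOT NULL,
            PRIMARY KEY (recording_id, user_id)
        );
    """)


def upload(conn, user_id, wav_bytes, transcript):
    """Store a WAV and its transcription under the uploader's user ID."""
    cur = conn.execute(
        "INSERT INTO recordings (user_id, wav, transcript) VALUES (?, ?, ?)",
        (user_id, wav_bytes, transcript),
    )
    return cur.lastrowid


def approve(conn, recording_id, user_id):
    """Record one user's approval; repeat approvals are ignored."""
    row = conn.execute(
        "SELECT user_id FROM recordings WHERE id = ?", (recording_id,)
    ).fetchone()
    if row is None:
        raise ValueError("no such recording")
    if row[0] == user_id:
        raise ValueError("uploaders cannot approve their own transcription")
    conn.execute(
        "INSERT OR IGNORE INTO approvals (recording_id, user_id) VALUES (?, ?)",
        (recording_id, user_id),
    )


def is_official(conn, recording_id):
    """A transcription is official once enough distinct users approve it."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM approvals WHERE recording_id = ?",
        (recording_id,),
    ).fetchone()
    return count >= APPROVALS_REQUIRED
```

The composite primary key on the approvals table is what keeps one
user from approving the same recording twice, so only distinct
reviewers count toward the threshold.
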
On Wednesday 23 February 2005 12:25 pm, Joe Phillips wrote:
> On Wed, 2005-02-23 at 15:05 -0500, Willie Walker wrote:
> > To make Sphinx-4 really viable for
> > desktop dictation, we'll need to get some better data than
> > can be obtained from the Linguistic Data Consortium (LDC).
> > This, of course, could be something where the open source
> > community can really shine - for the acoustic models, we just
> > need hours and hours of transcribed audio (e.g., "a1.wav
> > contains the words 'now is the time for all good people to
> > come to the aid of sphinx'")
> I think the open source community would be great for data generation.
> Can/will anyone set up a site to expedite data collection? Perhaps
> this is where OSSRI can contribute? Say... upload a WAV, upload a
> transcription and include some sort of web of trust where a
> transcription isn't official until some other users approve it.
> The website should at least document transcription style and format
> for those of us who would like to contribute but cannot in some more
> technical way.
> I for one would be happy to contribute recordings and transcriptions
> of myself if I knew the data would be put to good use in sphinx.
> Willie, can you expand on what would be most useful to improving