[TUHS] Archeology: AberMUD, BCPL, etc.
stewart at serissa.com
Fri Feb 1 05:45:10 AEST 2019
A followup from TV Raman, now at Google:
> We also did an intern project with Arjen De Vries, who was Tom's intern
> and became my intern after Tom left. We did the following:
> 1. Converted the caption stream into an SGML document indexed by time.
> The caption stream came down in dribs and drabs of the form "turn
> background yellow, foreground white, place this text"; we turned that
> into an SGML document, with each element tagged with its time.
> 2. We then indexed that collection of SGML documents. The content
> stream was Tom's ring buffer of the CNN live feed (from memory, we
> stored 6 hours).
> 3. We then built a simple-minded search engine over the SGML documents.
> It used the CRL speech recognition engine to take user queries (you could
> also just type the query into a search box), ran the search over the
> caption-document index, found the timestamp, and played the video.
> Arjen may have published some of this as his final-year Master's project
> at the University of Twente, likely in summer 1995.
> Id: kg:/m/0285kf1
I searched for Arjen De Vries and found
“Radio and Television Information Filtering through Speech Recognition”,
which in turn cites his Master’s thesis from 1995.
> On 2019, Jan 31, at 2:34 PM, Lawrence Stewart <stewart at serissa.com> wrote:
> I was at CRL from 1989 to 1994. I sent an inquiry to our informal mailing list.
> We had written an audio server along the lines of the X server (http://www.hpl.hp.com/techreports/Compaq-DEC/CRL-93-8.pdf), and Tom Levergood wrote an application called Store24 to keep a rolling 24-hour history of WBUR (the local NPR station). We thought about using speech recognition to build a searchable index for it.
> The next idea was to do the same thing for video, perhaps using the closed-captioning feed to build the index. Dave Wecker (now at Microsoft Research) reports working on a system that extracted data from NPR news streams and found the appropriate audio or video clip. He’s not sure he published that.
> Jim Gettys cites http://www.hpl.hp.com/techreports/Compaq-DEC/CRL-99-2.pdf (Indexing Multimedia for the Internet) and notes that all the DEC tech reports are hidden away at http://www.hpl.hp.com/techreports/. Choose “Browse by year” and select Compaq/DEC.
>> On 2019, Jan 31, at 9:42 AM, Clem Cole <clemc at ccc.com> wrote:
>> I'm not sure if the old DEC CRL tech reports are still around. At one time, before the Compaq-tion, some folks at CRL, the Boston Public Library, and WGBH were working with video and trying to extract all sorts of text from it. I do not remember how successful they were, but there might be some hints in their tech reports. I'll ask around and see if I can turn anything up. Part of the problem is that I don't remember who was doing that work, but some of my friends might.
>> On Thu, Jan 31, 2019 at 2:16 AM Alec Muffett <alec.muffett at gmail.com> wrote:
>> Has anyone ever attempted to OCR a video, perhaps by breaking it into frames and then aggregating the results, using multiple frames to correct each other?
>> On Wed, 30 Jan 2019, 19:51 Richard Salz <rich.salz at gmail.com> wrote:
>> Some folks are trying to figure out how to get the AberMUD source online and working; see https://twitter.com/larsbrinkhoff/status/1056823314272960512
>> Sample code is at https://raw.githubusercontent.com/larsbrinkhoff/abermud/master/abermud1/text/timelock.b