Hi. Lou Burnard told me about this list.
I am a freelance writer (for the moment) who has been interested in
television captioning and other captioning issues for 18 years. I've
written a dozen articles on the topic, I've given a presentation here
and there, and I run a mailing list on media-access topics
([log in to unmask]).
I know a *very* little bit about SGML and believe that the world needs
DTDs and other standards for four access technologies: captioning, audio
description, subtitling, and dubbing. Some background information:
o First, definitions:
Captioning: Rendering dialogue and other sounds in written words. Sign
language has NOTHING TO DO with captioning.
Closed-captioning: Captions transmitted in the form of a code. You need
a decoder (or, more likely, just a decoder chip) to turn the captions
into visible words. Nearly all North American TVs carry decoder chips as
standard equipment now.
Open-captioning: Captions that are an indelible part of the picture and
are always visible. (Open-captioning effectively does not exist.)
[Note: Captioning and subtitling have as little in common as bicycles
and motorcycles. Three big differences are: Captions are in the same
language as the audio (with relatively rare exceptions), denote
meaningful sound effects, and move to indicate the position of the
speaker. Subtitles are a translation, ignore sound effects, and are
always located in the same spot on-screen.]
Audio description: Rendering visual details in a spoken narrative. In
audio description, a special narrator succinctly describes action,
settings, facial expressions, onscreen graphics, clothing, and other
visual details. The narrator speaks out loud; A.D. is an auditory
medium, not a visual one. Narrators typically speak during pauses in
dialogue or at other appropriate moments, but sometimes they narrate
over dialogue, over music, and so on.
How does this relate to information technology and SGML? Some factoids:
* TV closed-captioning of prerecorded programs in North America is done
using any of several rather primitive DOS programs. Real-time captioning
of live programs uses the same software and hardware with the addition
of a very skilled court reporter who enters dialogue into a stenotype
machine (along with other annotations necessary to captioning). Those
entries are in shorthand and are then translated into actual words via
lookup tables. (This means that homonyms like "four," "for," "fore,"
"IV," and "4" require distinct keystrokes. It's not exactly easy keeping
track of all those keystrokes.) The words are then spit out for display
on a decoder-equipped TV.
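The shorthand-to-words lookup step can be sketched roughly as follows. This is a toy illustration only: the chord spellings are invented and bear no resemblance to real steno theory, which is far richer.

```python
# Toy sketch of real-time steno translation via lookup tables.
# Chord spellings here are invented for illustration; a real
# stenocaptioner's dictionary distinguishes every homonym.
STENO_DICT = {
    "TPOR": "for",
    "TPOUR": "four",
    "TPOER": "fore",
}

def translate(strokes):
    """Map each steno chord to a word; unknown chords pass through raw."""
    return " ".join(STENO_DICT.get(s, s) for s in strokes)
```

So `translate(["TPOUR", "TPOR"])` yields "four for" -- and a mistyped chord falls through untranslated, which is exactly the kind of error you see in live captioning.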
* Closed-captioning in North America is encoded on Line 21 of the
vertical blanking interval. The VBI is a narrow band of
normally-invisible picture lines between the bottom and the top of the
TV picture. (That's not a totally accurate description, but if you have
a TV with a vertical-hold control, you can set the picture rolling
slowly and see the VBI as a mostly-black bar between the top and bottom
of the picture.) North American TV signals are made up of 525 lines
(again, not totally accurate); the top 21.5 lines are in the VBI and are
ordinarily invisible. (They're not magic. They're perfectly visible if
you look for them. It's just that TV sets are adjusted to keep the VBI
out of sight.) Captions are encoded on line number 21 of those 21.5
lines. The caption codes are relatively wide rectangles of light that
flit back and forth. VCRs have no trouble recording and playing those codes.
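Those Line 21 code words carry seven data bits plus an odd-parity bit per byte (the scheme later formalized as EIA-608). A minimal sketch of just the parity step, leaving aside the run-in clock and control codes entirely:

```python
def with_odd_parity(ch):
    """Return a character's 7-bit code with an odd-parity 8th bit,
    as used (in simplified form) for Line 21 caption bytes."""
    code = ord(ch) & 0x7F
    ones = bin(code).count("1")
    if ones % 2 == 0:          # make the total count of 1 bits odd
        code |= 0x80
    return code
```

A decoder that receives a byte with even parity knows the code was corrupted and can drop it rather than display garbage.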
* CC in PAL-standard countries like most of Europe and Australia comes
about as an offshoot of the World System Teletext technology. You just
tune to a certain page of teletext (888, usually) and you suddenly see
captions on any captioned show. This technology uses several lines of
the VBI; all the encoding takes the form of tiny dots in the VBI which
are too small for anything but Super-VHS VCRs to record. This is a
severe limitation, but there are some provisos to it.
* Typography in both the Line 21 and WST systems is crap. Megacrap, actually.
* Captioning is a huge industry. Effectively all prime-time shows on all
networks, everything remotely resembling a newscast, many daytime shows,
thousands of home videos, most national commercials, lots of music
videos, training tapes, and more are captioned. This is a source of
money *and* a source of intellectual property. Think about it. But the
tools being used for captioning are very primitive.
* Audio description on TV is relatively rare. PBS is the biggest source
of A.D.; described programs carry a mix of descriptions + main audio in
the Second Audio Program subchannel of stereo TV. (If you have a stereo
TV-- most midrange to high-end models are stereo-- you can set your TV
to SAP. Won't do you much good, though, for everyday TV-- only a few
stations broadcast in stereo and virtually none use SAP.) The
descriptions, then, are "closed": You needn't be bothered with them
unless you want to be. Unfortunately, while all TV signals have a VBI,
not all have SAP, so A.D. is not a ubiquitous medium the way CC is.
* WGBH, the Boston PBS Überstation, is a dynamo in access
technology. It is home to the Caption Center (oldest captioner on earth,
and the best, though their standards are slipping), the Descriptive
Video Service (does A.D. for PBS and other clients, and also sells a
small home-video line of movies with always-audible descriptions), and
the National Center for Accessible Media (researches new technologies,
like Web captioning and captioning in movie houses). I know many people
there and actually get along with some of them. www.wgbh.org. Even these
people aren't really thinking all that broadly about the potential of
access technologies, though again that has many provisos.
* To caption a prerecorded program, you transcribe it. Usually the
captions are an edited version of that transcript-- reading is slower
than speaking, and there are speed limits to caption transmission-- but
if you retained a verbatim transcript with all proper annotations of
sound effects (phone ringing, thunder, etc.) and speaker identification,
suddenly you have a viable text-only analogue of an audiovisual program.
* It gets better: Audio description typically happens during pauses in
dialogue. A.D. scripts, then, are quite short-- up to 100 or 200 bursts
of narration. However, it's possible to describe *a whole program*
nonstop, and in fact one project I'm working on will do just that. If
you unite either or both of these A.D. scripts with the CC script,
suddenly you have a rich and complete text-only approximation of an
audiovisual program.
* What can you do with that information? Archive it, either on the Web
or your own computer or elsewhere. Monitor it continuously for keywords.
(It is believed that the NSA has done exactly that for years.) Use it
for people who don't want to wait 20 minutes to download a choppy
videoclip from a Web site. And, of course, use it for its intended
purpose.
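The keyword-monitoring idea is almost trivial to sketch, assuming the caption stream arrives as lines of text:

```python
def monitor(caption_stream, keywords):
    """Yield (index, line) for caption lines containing any keyword.
    A bare-bones sketch of keyword monitoring over a caption feed."""
    kws = [k.lower() for k in keywords]
    for i, line in enumerate(caption_stream):
        text = line.lower()
        if any(k in text for k in kws):
            yield i, line

hits = list(monitor(["[phone ringing]", "Breaking news from Ottawa."],
                    ["news"]))
```

Run that over a 24-hour caption feed and you have a crude broadcast-monitoring service -- which is presumably roughly what the NSA scenario above amounts to.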
Where research is needed:
* SGML. Markups for everything from italics (which have reserved
functions in captioning along with all the regular uses of italics in
print) to speaker IDs to caption-on and -off times to various
annotations for A.D. tracks are needed. How is this useful? Really
sophisticated captioning/A.D. software could be developed. More
relevantly, existing nonlinear video-editing systems a la Avid and
programs like Premiere and Acrobat could be extended to understand
SGMLified access codes. This same development process would have to
encompass subtitling and dubbing, too, which I am not talking a whole
lot about here.
Also, if captions were stored as part of an SGML structure, they
could be automatically reformatted in real time for different display
devices, like an LED screen (with a character set different from TV), TV
pop-up captions, TV scroll-up captions, a continuous text-only stream
without paragraph and sentence breaks for computers, or an offscreen
large-print display for visually-impaired viewers.
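To make the reformatting idea concrete, here is a minimal sketch in Python. The caption record fields (start, end, speaker, text) are my own invention, standing in for whatever elements a real DTD would define:

```python
# Hypothetical structured caption records. The field names are
# invented for illustration -- an SGML DTD would formalize them.
captions = [
    {"start": 1.0, "end": 3.5, "speaker": "ANNE", "text": "Look out!"},
    {"start": 4.0, "end": 6.0, "speaker": None, "text": "[thunder]"},
]

def render_popup(caps):
    """TV-style pop-up captions: one block per cue, speaker ID prefixed."""
    out = []
    for c in caps:
        prefix = f"{c['speaker']}: " if c["speaker"] else ""
        out.append(prefix + c["text"])
    return out

def render_stream(caps):
    """Continuous text-only stream, as for a computer display."""
    return " ".join(c["text"] for c in caps)
```

One set of records, two renderings -- add a renderer per device (LED sign, scroll-up, large print) and the captions never have to be re-keyed.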
Or captions created with one software package could be read and
understood by another-- or another country's system. Right now it is
quite tedious to reformat Line 21 CC for PAL CC, and there are various
typographic issues that come up here.
* Web access. Trying to educate Webmasters that the WWW is not an excuse
to post pretty pictures is a battle we've already lost. But making those
graphics accessible *is* possible. Same with audioclips and videoclips.
* Subtitling and dubbing are the norm outside English-speaking
countries. Both are possible in the same movie; it is then possible to
caption subtitled and/or dubbed movies.
So: I am interested in setting up a working group to create DTDs for
ONLY the four access technologies I mentioned. SoftQuad isn't
interested. Is anyone else?
[log in to unmask]