At 04:50 PM 1/9/2008, you wrote:
>what are your ideas on marking these words in a way that
>maintain all their possible reads ?
> * s/he [she or he]
> * media/tion media(tion) media-tion [media and mediation]
> * car(s) [car or cars]
If I understand your question correctly, these examples illustrate a
phenomenon which, while it is not at all unusual in text, defies
simple handling in markup. That is because this sort of thing goes
beyond the plain use of text as a sequence of characters that,
assembled according to a set of fairly simple rules (such as, in
modern European literacies, serial assembly of characters from left
to right), encode some other set of entities (whether you take that
other layer to be words, sentences, utterances or what have you).
These notations "escape" the base rules, that is, and invoke other
rules, which a human reader usually has to infer in order to
construe them properly.
Another way of putting it is that this kind of thing steps across a
line between text as (simple) text, and text as (complex) notation,
or indeed, text as markup of text. Of course, scholars of text can
also remind us that there is no such line, or at any rate not
"naturally": to the extent there is a line there, it is only by
virtue of regularities imposed from without or observable across a wide domain.
The usual way of handling this sort of thing in the current practice
of markup is to expand the notation explicitly into what it
represents, using markup to note both the expanded or implicit form,
and the form as instantiated. How that particular expansion is made
will depend on the details of the notation, both its form and its
(presumed) purpose and intention. TEI offers plenty of examples of
this sort of thing in its handling of abbreviations, regularized forms, etc.
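To make the expansion approach concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the element names are loosely modeled on TEI's choice, orig and reg, the sample omits the TEI namespace, and recording each alternative reading as its own reg element is my assumption for the example, not a recommendation from the Guidelines. The names SAMPLE and readings() are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: each notation is wrapped in a TEI-style <choice>.
# <orig> holds the form as written; each <reg> holds one expanded reading.
# (Assumption: multiple <reg> children are used here purely for illustration.)
SAMPLE = """
<p>
  <choice><orig>s/he</orig><reg>she</reg><reg>he</reg></choice>
  <choice><orig>media/tion</orig><reg>media</reg><reg>mediation</reg></choice>
  <choice><orig>car(s)</orig><reg>car</reg><reg>cars</reg></choice>
</p>
"""


def readings(xml_text):
    """Map each written form to the list of readings recorded for it."""
    root = ET.fromstring(xml_text)
    result = {}
    for choice in root.iter("choice"):
        written = choice.findtext("orig")
        result[written] = [reg.text for reg in choice.findall("reg")]
    return result


if __name__ == "__main__":
    for written, expanded in readings(SAMPLE).items():
        print(written, "->", " | ".join(expanded))
```

The point of the design is that the written form and every reading survive side by side, so a processor can choose which to display or index without having to interpret the slash or the parentheses itself.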
>(i don't know how to call this kind of phenomena .. overlapping text ?)
I wouldn't call it overlapping text as such, although such notations,
the marking up of such notations, and/or the representation of
notational conventions in the form of markup, will frequently involve
overlap -- which is one reason why expanding it into a more prolix
form is usually the path of least resistance in markup regimens that
are good at hierarchy and organization, but not so good at overlap.
When it is really extensive or very artful, this kind of thing
becomes a certain sort of poetry working at the character level, such
as the calligrammes of Apollinaire, which are similarly intractable.
But then, those weren't designed for machine processing, so why should they be?
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML