TEI-L@LISTSERV.BROWN.EDU

Subject: MS Word and XML (eventually...)
From: Michael Beddow
Date: Sat, 9 Oct 2004 14:04:06 +0100

WARNING: This, despite its deeply serious core (honestly, you'll get there
if you stick with me) is packaged up as a strictly Fridays-only craving your
indulgence for even more verbosity and frivolity than usual sort of posting,
of the kind pioneered over on XSL-L, when, as the weekend drew near,
Sebastian would sometimes replace tombstone inscriptions by richly-annotated
varieties of English Ale in his transformation examples, and Wendell would
tackle explaining, for the umpteenth time, the evils of
disable-output-escaping to the stubbornly benighted by using the analogy of
smuggling beer cans past the bouncers at a ball game, an example to which I
once alluded rather too obscurely in a posting on this list, earning me a
well-deserved but pretty scary off-list wigging from Lou (some of you will
know all too well the sort of thing I mean by the latter).

In other words, if it's still Friday where you are when you read this, and
your week's encoding quota is still unfilled, you Don't Want to Read This
Right Now. Maybe later, when you've poured that first Gin and Tonic, or
whatever else ritually starts your weekend.

What triggered this is a little obiter dictum from Syd under the "Creating a
TEI Bibliography - what DTD to use?" thread which I am deliberately not
continuing here so as not to dilute its concise and sober good sense with my
ramblings. He wrote

> I will take this
> opportunity to point out that MS Word is a lot worse than just plain
> text

Having grown even balder and greyer of beard than before as the result of
converting a pretty large and messy fileset from MS Word (ver 3 for DOS in
the oldest layers, with everything from there through to Word 2K) into full
P4 Chapter 12 conformance, I know very well what Syd means. And that's why
people who still keep waving wads of banknotes at me in the hope I will do
the same with their historical dictionaries currently in Word are going to
get pretty sore wrists. Never again.

However, there are all sorts of reasons why other people might want or need
either to do MS Word->TEI conversions on legacy data or even, if they are
brave enough, attempt to set up a workflow where documents are deliberately
originated in MS Word specifically with a view to being batch-converted to
TEI-XML. As I say, I am past such drudgery (the first case) or hubris (the
second) but if I were compelled to do either I would want to give Word 2003
a very serious look and try.

To avoid a disastrous but persistent misunderstanding: MS Word will never
be a suitable application for direct authoring of TEI conformant documents,
and I don't believe it ever could be, even in future incarnations and/or
with any amount of clever customisation. There is a wide gulf set between
the internals of WYSIWYG word-processors and the sort of editor capable of
handling complex "docucentric" XML of the kind that all but the lightest of
TEI-lite markup entails. Chalk and Cheese. Sophisticated XML editors can,
with some difficulty and more or less awkwardness, be made to look enough
like WYSIWYG WP systems to fool the casual onlooker; but the reverse process
is in my view neither feasible nor desirable. No, the point in question is:
is it possible to set up MS Word so that users who are familiar with its
interface and way of doing things can create, using only techniques familiar
to them from Word WP operations, files which can then, in subsequent
processing transparent to the operators, be reliably converted into complex
TEI conformant markup?  Until Word 2003, my honest, experience-tested
answer was "Yes maybe, if needs must, but don't quote me on that, and don't
ask me to do it". Now that Word 2003 is with us, I cannot answer from
anything really deserving to be called "experience", because I am relying
merely on perusal of the docs and some haphazard experimentation in idly
curious moments, but my answer would now be "I guess so, why not, provided
you plan things carefully."

My change of mind and heart is explained by the fact that Word 2003 has, for
the first time in the long history of this application, a thoroughly
thought-through and sophisticated implementation of XML as a mode of
losslessly saving all Word documents as an alternative to rtf or the native
binary format. Now I can hear howls of derision and disbelief going up
already. To the howlers, I'd say two things.

1) Before you scoff on, be sure you have looked long and hard at the XML
(so-called WordprocessingML) implemented specifically in Word 2003. Forget
the bad jokes that passed for XML implementations in the two preceding
iterations of Word. Despite knee-jerk assertions to the contrary, Microsoft
do listen to programming experts (and actually employ some of the best ones
around) as well as to focus-group idiots, and they are concerned, even if in
a way that always has an eye and a half on their own commercial interest, to
follow at least some public standards.  We find here a truly fundamental
re-think and re-write (where lessons have plainly been learned from, among
other sources, the Open Office experience of XML as a native format).

2) If you take my first point to heart and study what MS have done with XML
here, you may still find yourself recoiling in horror at how remote it seems
to be in both spirit and detail from the sort of things we TEI-ers want and
need to do with XML. But I'm sure no member of this list is stupid enough to
come out in consequence with the claim that what we find in Word 2003
therefore "isn't XML". Of course it is. When I was on the Faculty Board of
Germanic Studies of London University, there was once a heated
collective-indignation-expression session (it was hardly a "debate" though
it was billed as such) triggered by the fact that the Faculty of English had
dared to offer an undergraduate course on the plays of Bertolt Brecht.
Colleagues ranted on about disciplinary boundaries, breach of demarcation
lines, impossibility of studying a German author in English translation etc
etc, until one shrewd and forthright Faculty member, alas long since taken
in his prime by a horrible illness, took the microphone and said in his
unmistakable, blunt Northern English tones: "Mr Chairman: the Board of
Germanic Studies DOES NOT OWN BERTOLT BRECHT."  Then he sat down again, to
a justly chastened silence. Enough said. Similarly, my fellow TEI-List-ers,
the TEI does not own XML, neither does the DocBook community, or any other
bunch of incurable docucentrics like ourselves. So what can, and indeed
should be sensibly said here is that Word 2003 XML is nothing like the sort
of XML we need to write, interchange and use. But it's none the worse for
that, not even as a tool for a TEI-based project, provided we realise what
sort of XML it is, why it is as it is, and how we can, if we so wish, put
its features to good use as the basis for the controlled creation by Word
users of files destined for robust and reliable upconversion into TEI
markup.

It would be silly, and of little interest to most list members, for me to
attempt to expand on what I mean by that last phrase in any detail. I would
suggest that anyone who wants to explore further along those lines take a
good look at the recent book "Office 2003 XML" by Evan Lenz, Mary McRae &
Simon St.Laurent (names which in themselves should also give any residual
scoffers pause for thought), published by O'Reilly earlier this year (ISBN
0-596-00538-5). To avoid misunderstandings, this is not in any sense a
cookbook about how to do what I have just sketched out. Far from it: I don't
think there is a single paragraph in the entire volume that addresses the
controlled creation of documents destined for upconversion. But anyone with
sufficient understanding of the technical issues involved in such controlled
creation at the head of an upconversion pipeline will see from careful study
of the chapters on WordprocessingML plus some individual experimentation,
just how the new XML facilities in that product do indeed lend themselves to
setting up a document input system that would constrain the behaviour of
document creators (in the nature of the case they can hardly be called
"encoders") so that the Word files they produce can be upconverted to TEI by
suitable software employing well-established XML technologies with a high
degree of reliability. As I've already said, I ain't going to do it: I now
only accept proffered wads of consultancy cash from people who have the
manual for a proper XML editor in their other hand and who know how to use
it. But for anyone who wants to, or has to, start from Word, the 2003
iteration really is a new and promising direction.
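
As a rough illustration of the kind of upconversion step described above,
here is a minimal sketch in Python. It is a sketch under stated assumptions,
not a recipe: the paragraph style names "Headword" and "Definition", the
mapping onto the TEI elements <orth> and <def>, and the function name
upconvert are all hypothetical choices made for the example. The only thing
taken from Word 2003 itself is the WordprocessingML namespace and its
w:p / w:pPr / w:pStyle / w:r / w:t structure. The output is not validated
against any TEI DTD.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 WordprocessingML.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical mapping: the only paragraph styles document creators may use,
# and the TEI element each one is upconverted to (illustrative only).
STYLE_TO_TEI = {"Headword": "orth", "Definition": "def"}


def upconvert(wordml: str) -> ET.Element:
    """Turn styled Word 2003 XML paragraphs into a single TEI-like <entry>."""
    root = ET.fromstring(wordml)
    entry = ET.Element("entry")
    for para in root.iter(f"{{{W}}}p"):
        style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        name = style.get(f"{{{W}}}val") if style is not None else None
        # Reject anything outside the agreed style set instead of guessing:
        # this is what makes the conversion predictable.
        if name not in STYLE_TO_TEI:
            raise ValueError(f"unexpected paragraph style: {name!r}")
        element = ET.SubElement(entry, STYLE_TO_TEI[name])
        # A paragraph's text may be split across several runs (w:r/w:t).
        element.text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
    return entry


# A tiny hand-written sample, much simpler than what Word really saves.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Headword"/></w:pPr><w:r><w:t>ale</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Definition"/></w:pPr>
<w:r><w:t>a kind of </w:t></w:r><w:r><w:t>beer</w:t></w:r></w:p>
</w:body></w:wordDocument>"""

if __name__ == "__main__":
    print(ET.tostring(upconvert(SAMPLE), encoding="unicode"))
```

The design point is the refusal to guess: a paragraph in any style outside
the agreed set stops the conversion, so the constraint on document creators
is enforced at the head of the pipeline rather than patched up afterwards.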

Michael Beddow

PS 1  Another knee-jerk reaction against even thinking about using Word 2003
is that it runs only on Win2K or XP. This is alleged to be part of a Redmond
conspiracy to force costly upgrades all round. That's as may be, but if so
in my view it just goes to show the Invisible Hand does sometimes work just
like the old guy said. It may be pursuit of filthy lucre that led MS to make
Office 2003 incompatible with any variety or disguise of Windows 9x, but I
believe that the enforced banishment of that morass of pernicious junk is
unequivocally a Good Thing. It's just a pity that good sense didn't consign
it to the dumpsters of IT history long ago before Uncle Bill started
twisting arms to make that happen.


PS 2 Yes I know I may plainly be in manic rambling mode, but I do know it's
Saturday now. It wasn't when I wrote most of this, but then my (second)
monitor blew, the first one of my usual twin set having departed this world
last Monday. It's been that kind of week. I am finishing and dispatching
this from my wife's laptop, and the unfamiliar keyboard and none-too-bright
screen may well produce more than my usual quota of uncorrected typos.
Apologies for that in advance.

MB
