On 20/12/12 21:33, Wendell Piez wrote:
> Indeed, and this comes back to the disjoint questions of (a) how
> strong are the latent semantics of the Word source, and (b) how strong
> are the semantics that are "good enough" in the target?
I guess I view the use of Word for such purposes as a simple tool
for encoding such semantics in a lightweight way. I don't rely on
the existing semantics of Word, but would extend any presentational
markup (e.g. titles are in italics) with styles such as
'analyticTitle' and 'monographTitle'. What you can't
have, as you know, is any grammar-like control of what is allowed
where. So a lot of the up-conversion is moving bits around and
coping with edge cases.
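
To make the style idea concrete, here is a minimal sketch in Python
of mapping named Word styles onto TEI elements. It is only an
illustration under stated assumptions: the input is a hypothetical
list of (style name, text) runs rather than a real .docx file, the
mapping table and the sample titles are made up, and the output is
left without the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word character-style names to a TEI
# element name plus attributes (assumed for this sketch).
STYLE_TO_TEI = {
    "analyticTitle": ("title", {"level": "a"}),
    "monographTitle": ("title", {"level": "m"}),
}

def runs_to_tei(runs):
    """Build a <p> element from (style_name_or_None, text) runs."""
    p = ET.Element("p")
    last = None  # most recent child element, so plain text can go in its tail
    for style, text in runs:
        if style in STYLE_TO_TEI:
            name, attrs = STYLE_TO_TEI[style]
            last = ET.SubElement(p, name, attrs)
            last.text = text
        elif last is None:
            # Plain text before any styled run belongs to the paragraph itself.
            p.text = (p.text or "") + text
        else:
            # Plain text after a styled run is that element's tail.
            last.tail = (last.tail or "") + text
    return p

# Made-up sample data standing in for the runs of one Word paragraph.
runs = [
    (None, "See "),
    ("analyticTitle", "On Markup"),
    (None, " in "),
    ("monographTitle", "Collected Essays"),
    (None, "."),
]
tei_xml = ET.tostring(runs_to_tei(runs), encoding="unicode")
print(tei_xml)
```

Note that nothing here checks whether a style is allowed where it
occurs, which is the missing grammar-like control mentioned above.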
> Where you, Sebastian and I agree (I think) is that we are all
> recommending that the problem be approached, to the extent possible,
> as the specification of a systematic mapping rather than as an ad-hoc
> conversion by hand.
Definitely. I think you make it as systematic as possible using
whatever tools you have at hand, but I think you will always
need to up-convert the result with a series of bespoke rules.
I've always thought that unmediated up-conversions are about as
useful as unmediated interoperability (if it is interoperable
without any effort, then the level of interoperability is
probably not useful).
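
A "series of bespoke rules" could be organised, very roughly, as an
ordered list of small passes over the first-pass output. The sketch
below is an assumption-laden illustration in Python, not a
description of any actual project pipeline: it assumes the first
pass leaves italics as <hi rend="italic">, it ignores the TEI
namespace, and the two rules and the sample bibliographic entry are
invented for the example.

```python
import xml.etree.ElementTree as ET

def italics_in_bibl_to_title(root):
    """Pass 1: inside <bibl>, treat italic <hi> as a monograph title."""
    for bibl in root.iter("bibl"):
        # list() so that renaming elements cannot disturb the iteration.
        for hi in list(bibl.iter("hi")):
            if hi.get("rend") == "italic":
                hi.tag = "title"
                hi.attrib.clear()
                hi.set("level", "m")

def normalise_title_space(root):
    """Pass 2: edge-case fix; collapse and trim whitespace in title text."""
    for title in root.iter("title"):
        if title.text:
            title.text = " ".join(title.text.split())

# Order matters: later passes rely on what earlier passes produced.
PASSES = [italics_in_bibl_to_title, normalise_title_space]

def up_convert(xml_text):
    """Parse the first-pass XML, apply each pass in turn, serialise."""
    root = ET.fromstring(xml_text)
    for apply_pass in PASSES:
        apply_pass(root)
    return ET.tostring(root, encoding="unicode")

# Made-up first-pass output: italics outside <bibl> must stay untouched.
first_pass = (
    '<body><p>An <hi rend="italic">emphatic</hi> word.</p>'
    '<bibl>Smith, <hi rend="italic">A  Sample   Book</hi>, 2001.</bibl></body>'
)
result = up_convert(first_pass)
print(result)
```

Keeping each rule as its own pass makes it easier to add or reorder
rules as new edge cases turn up in a given project.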
> The disadvantage of the approach you suggest is that it keeps you in
> Word much longer.
I would argue that, for capturing semantics or basic annotation,
this isn't a bad thing: it means that those who understand the
material can provide what is for them a basic annotation, but one
which often takes specialist knowledge to mark. They're just using a
presentation-based tool to do it, one they are more comfortable with.
> But certainly, Örn should consider it -- especially if he is among the
> tribe of people who'd rather work in Word than in XML. (I hear it is a
> large tribe.)
Exactly my point. We've now done a lot of projects with people
who add specialised forms of annotation in Word: we convert with
docxtotei, then usually run a multi-pass up-conversion
afterwards to embed richer semantics based on presentational
triggers. I'd prefer they learn TEI, of course, but
None of this really answers Örn's real questions though, I think. ;-)
Dr James Cummings
Academic IT Services, University of Oxford