Hi Adam, (and TEI-L)
This is an interesting idea and any improvements that could be
fed back into the TEIC Stylesheets
https://github.com/TEIC/Stylesheets would, of course, be
appreciated. I would be interested to know what you need from
OxGarage that it doesn't provide (if you set up your own
instance, it is customisable, extensible, and able to be used as
a REST web service). That said, I usually just use the TEIC
Stylesheets directly rather than via OxGarage. Is the engine
you've built openly available somewhere?
The TEI conversions from docx are better in many ways than the
conversions from the other word-processing formats. There are also
small tricks like having docx styles of 'tei_elementName' to get
certain phrase-level elements converted. I've done a lot of
docxtotei and then up-conversion of TEI afterwards for various
projects, so sympathise with the need for continual tweaking of
the Stylesheets. Even when going to HTML, I think by going via
TEI there are many benefits like easily getting to a whole bunch
of other formats as well. This is one of the reasons OxGarage
uses it as a pivot format. The Stylesheets are organised in a
modular way based on format and allow project-specific 'profiles'
across multiple formats. This may work well with the focus you are
suggesting.
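To make the 'tei_elementName' style trick mentioned above a bit more concrete, here is a rough Python sketch of the naming convention only. It is not how the Stylesheets implement it (they are XSLT), and the function names and the sample style 'tei_persName' are hypothetical, chosen purely for illustration.

```python
from typing import Optional
from xml.sax.saxutils import escape

# Assumption: a docx character style named 'tei_<elementName>' signals
# that the styled text should become <elementName> in the TEI output.
PREFIX = "tei_"


def style_to_tei_element(style_name: str) -> Optional[str]:
    """Return the element name encoded in a style name, or None if there is none."""
    if style_name.startswith(PREFIX) and len(style_name) > len(PREFIX):
        return style_name[len(PREFIX):]
    return None


def convert_run(style_name: str, text: str) -> str:
    """Wrap a run of text in the element named by its style, if any."""
    element = style_to_tei_element(style_name)
    if element is None:
        # Not a 'tei_' style: emit the text unwrapped (XML-escaped).
        return escape(text)
    return f"<{element}>{escape(text)}</{element}>"
```

Under these assumptions, a run styled 'tei_persName' would come out wrapped in a persName element, while a run in any other style would pass through as plain text.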
It is indeed hard to get community support for updating the
stylesheets. The TEI Technical Council maintains them as much as
possible (we're all elected volunteers very busy with many other
things), and our main priority is clearly the generation of the
Guidelines and associated materials such as TEI ODD conversions
to Relax NG. Conversions to and from docx are less of a
priority. We certainly miss the contributions of Sebastian Rahtz
(as well as his friendship), and anything that increases the
ability for individuals, groups, infrastructure providers, or
others to improve transformations to/from TEI is certainly a good
tribute to Sebastian.
Let us know what we can do to help.
-James
On 15/05/16 13:14, adam wrote:
> hi
>
> I am relatively new to the list but have followed TEI for quite a while.
> Essentially my introduction was via Sebastian Rahtz, the kind and
> generous man that he was. I was sorry to see him go.
>
> Lately I have started a non-profit foundation interested in turning
> around scholarly workflows. There are many problems in these workflows
> but perhaps one of the highest value is getting content out of docx (90%
> of scholarly articles and monographs originate in docx) and into other
> formats. We are particularly interested in HTML for many reasons, mostly
> so we can edit the content online, but also because conversion chains
> can be relatively easily formed to get from HTML into other formats that
> publishers need (eg. nicely formatted & paginated PDF for printing, EPUB
> etc).
>
> We have been using the TEI stylesheets and OxGarage, but OxG is a little
> awkward for us and we need more from our tool chain than OxG can provide
> so we have built (relatively quickly) an engine to manage these
> conversions using the TEI stylesheets. However stylesheet conversion is
> forever in need of tweaking it seems and I was contemplating processes
> to continuously improve the conversions. One way is to hire someone, and
> we may do that, and another is to build a community effort around this.
> I think we might be able to appeal to the scholarly infrastructure
> providers and get some traction on a shared effort. To do this I was
> contemplating how this might be actioned. One way is to have a whole lot
> of individual actors trying stuff out and making pull requests, but that
> seems a little haphazard. Better to work out some sort of coordinating
> mechanism and perhaps also a shared tool set to provide a kind of
> 'central hub' for conversion trials and manual QA etc...a central space
> where many people could contribute in one way or the other...
>
> So...this is a long winded way of getting to the heart of the
> matter...I'd love to know if anyone on this list has experience trying
> to set up anything similar? I'm specifically meaning something beyond
> putting the stylesheets on github and waiting for contributions - ie.
> setting up community processes and tools for a shared effort to refine a
> specific conversion type (in this case docx to HTML).
>
> I'm sure there are many here that have this experience, and I'd be
> grateful for any advice or introductions that may be able to take me a
> little further down this path.
>
> Many thanks,
>
> Adam
>
--
Dr James Cummings
Academic IT Services, University of Oxford