On Fri, 2013-11-29 at 14:34 +0000, Sebastian Rahtz wrote:
> On 29 Nov 2013, at 14:19, Louis-Dominique Dubeau wrote:
> > 1. Use the command line roma2 with --dochtml and extract data from
> > there.
> > 
> > Advantage: what I extract is HTML, which is the format I ultimately
> > want.
> > 
> > Disadvantage: The markup of the ODD file is gone. For instance, it looks
> > like <gloss> is put in the HTML as text surrounded by parentheses.
> > There's no markup of any form to indicate that this used to be a gloss
> > in the odd. Not unsurmountable but more fragile than I'd like.
> > 
> I would suggest that this is usually a better route to take, because you
> take advantage of any improvements/fixes to the stylesheets as time goes
> by, in their interpretation of how to process ODD.
> the downside is, as you say, the semantic information being lost.
> does this really make a difference?

It makes a difference if I extract just <gloss> and, some day down the
road, <gloss> is still put in the first cell of the first row of the
table but no longer in parentheses. Or, in a more likely case (I don't
think I will want to extract just <gloss>, but I definitely want to
extract <desc>), <desc> moves from the first cell of the first row of
the first table of an element's documentation to the second cell, or
the second row, or out of the table altogether. From then on, I'm
inadvertently grabbing the wrong data.
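
To make the contrast concrete, here is a minimal sketch of extracting by
element name from the ODD itself rather than by position in the HTML. It
assumes a compiled ODD in which <gloss> and <desc> are direct children of
each <elementSpec>; the SAMPLE document and the extract_docs name are
hypothetical, made up for illustration only.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; elementSpec, gloss and desc all live in it.
TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def flat_text(node):
    """Return the node's text with markup dropped and whitespace collapsed."""
    if node is None:
        return None
    return " ".join("".join(node.itertext()).split())


def extract_docs(odd_xml):
    """Map each elementSpec's ident to its gloss and desc, looked up by name."""
    root = ET.fromstring(odd_xml)
    docs = {}
    for spec in root.iter("{%s}elementSpec" % TEI):
        docs[spec.get("ident")] = {
            "gloss": flat_text(spec.find("tei:gloss", NS)),
            "desc": flat_text(spec.find("tei:desc", NS)),
        }
    return docs


# Hypothetical, hand-written fragment; a real compiled ODD is far larger.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <elementSpec ident="p">
    <gloss>paragraph</gloss>
    <desc>marks paragraphs
      in prose.</desc>
  </elementSpec>
</TEI>"""

print(extract_docs(SAMPLE))
```

Because the lookup is by element name, a change in how the stylesheets
lay out the HTML tables would not affect it. The cost is the one
Sebastian points out: it does not benefit from stylesheet fixes.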

I'm thinking of using odd2json.xsl to get <desc>. I know that the
result won't contain *any* HTML, but that will be okay as an immediate
response. There will be an option to go to the full-fledged doc.

Speaking of full-fledged doc, I have a follow-up question. I've been
using the following to generate a bunch of split files instead of one
gigantic file:

$ saxon -s:myTEI.xml.compiled -xsl:/usr/share/xml/tei/stylesheet/odds2/odd2html.xsl STDOUT=false splitLevel=0

However, the index.html file generated by this process looks like it
should contain links to individual files but is in fact devoid of such
links. I don't know whether there is an incantation that makes
odd2html.xsl produce these links. (I'm also unsure what splitLevel means
beyond split/no-split.)
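
A minimal sketch for checking which local links a generated page
actually contains, assuming the page is ordinary HTML with <a href="...">
anchors. The local_links name and the sample markup are hypothetical; in
practice the contents of index.html would be passed in instead.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(html_text):
    """Return hrefs that point at other local files, not anchors or URLs."""
    parser = LinkCollector()
    parser.feed(html_text)
    skipped = ("http:", "https:", "mailto:", "#")
    return [h for h in parser.hrefs if not h.startswith(skipped)]


# Hypothetical markup standing in for a generated index page.
sample = '<ul><li><a href="ref-p.html">p</a></li><li><a href="#top">top</a></li></ul>'
print(local_links(sample))
```

An empty list for the real index.html would confirm that the page
carries no links to the split files.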