Dear Amit (all),
first commiserations (or congratulations) for all the snow!
Action AK by 2003-03-04: investigate what tidy does with ampersands
(TE to send problematic file & output listing)
As this was a tiny (and tidy) action item, I just did it, and below is
what I have to report.
I made a test file and ran it through tidy, as shown below. In short,
and unless I missed some obscure configuration option, tidy -xml
understands and translates the ISO Latin 1 entities, e.g. &aacute;;
all other entity references it treats as mistakes, and substitutes
&amp; for their &.
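To check in advance which references in a file would be hit, here is a
minimal Python sketch based only on the behaviour reported above. It
lists the named entity references that are neither one of XML's five
predefined entities nor an HTML 4 ISO Latin 1 name (taken from Python's
html.entities, code points U+00A0 to U+00FF). The function name is
hypothetical, and tidy's real rule may well differ, e.g. it may also
know other HTML entity names such as &mdash;.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# HTML 4 ISO Latin 1 entity names (code points U+00A0..U+00FF).
LATIN1_NAMES = {name for name, cp in name2codepoint.items() if 0xA0 <= cp <= 0xFF}

# Named entity references such as &mydash; (numeric references are ignored).
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")


def unknown_entity_refs(text: str) -> list[str]:
    """Return, in document order, the entity names tidy would presumably escape."""
    return [
        m.group(1)
        for m in ENTITY_REF.finditer(text)
        if m.group(1) not in XML_PREDEFINED and m.group(1) not in LATIN1_NAMES
    ]


if __name__ == "__main__":
    line = "<x><y>hyper&mydash;&active; &amp; &ccaron;&oacute;nfused!</y></x>"
    print(unknown_entity_refs(line))  # prints ['mydash', 'active', 'ccaron']
```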
The only way around it, I guess, is to protect the & before sending the
file to tidy, as in the 'obsolete' pipe for sx.
[it seems to me it might be better to use something that doesn't require
such kludges]
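A minimal sketch of the "protect the &" workaround, written in Python
rather than as the sx pipe mentioned above (whose details aren't given
here). The marker string @@AMP@@ and both function names are
hypothetical; the approach assumes tidy passes the marker through
unchanged, which is an assumption, not something tested here.

```python
# Hypothetical marker: plain ASCII that tidy should leave alone (an assumption).
MARKER = "@@AMP@@"


def protect_ampersands(text: str, marker: str = MARKER) -> str:
    """Replace every & with the marker so tidy sees no entity references."""
    if marker in text:
        # Restoring would then be ambiguous, so refuse rather than corrupt the file.
        raise ValueError(f"marker {marker!r} already occurs in the input")
    return text.replace("&", marker)


def restore_ampersands(text: str, marker: str = MARKER) -> str:
    """Turn the marker back into & after tidy has run."""
    return text.replace(marker, "&")


if __name__ == "__main__":
    sample = "<y>hyper&mydash;&active; &amp; &ccaron;</y>"
    protected = protect_ampersands(sample)
    print(protected)
    print(restore_ampersands(protected) == sample)  # prints True
```

In use, one would protect the file, run tidy -i -ascii -xml on the
result, and restore tidy's output. Since tidy then sees no entity
references at all, the Latin 1 ones are passed through untranslated too.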
Syd Bauman writes:
> I have not tried this yet, but it looks promising.
> http://www.cs.helsinki.fi/u/penberg/xmlindent/
I had a peek; it seems a bit underdocumented.
One more, rather obvious, possibility that I don't recall being
mentioned: XSLT has the option <xsl:output indent="yes"/>.
For those who use XSLT in their conversion anyway, this could be the
simplest. The slight problem I see is that you might well want to
pretty-print only part of the document, e.g. the teiHeader. That is the
part you might actually want to look at in the source; also, the body
could get a lot heavier if indented. Still, this just means adding an
(extra) pass that indents the headers only.
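The header-only pass could look roughly like this. It is a sketch of
the same idea, using the Python standard library's
xml.etree.ElementTree.indent (Python 3.9+) instead of XSLT. It assumes
an un-namespaced teiHeader that is a direct child of the root (TEI P4
style) and a document ElementTree can parse on its own: the DOCTYPE is
not preserved, and undeclared entity references would make the parse
fail. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def indent_header_only(xml_text: str, space: str = "  ") -> str:
    """Pretty-print the teiHeader; the rest of the markup is not indented."""
    root = ET.fromstring(xml_text)
    # Assumes an un-namespaced teiHeader directly under the root element.
    header = root.find("teiHeader")
    if header is not None:
        # level=1 because teiHeader sits one level below the root element.
        ET.indent(header, space=space, level=1)
    # Re-serialise the whole tree (no XML declaration, no DOCTYPE).
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ("<TEI.2><teiHeader><fileDesc><titleStmt><title>T</title>"
           "</titleStmt></fileDesc></teiHeader>"
           "<text><body><p>Hello <hi>world</hi></p></body></text></TEI.2>")
    print(indent_header_only(doc))
```

Only whitespace-only text inside the header is touched, so the body,
including mixed content such as <p>Hello <hi>world</hi></p>, stays on
one line.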
Best,
Tomaz
[tomaz@mantra PostMeet]$ tidy -version
HTML Tidy for Linux/x86 released on 1st February 2003
[tomaz@mantra PostMeet]$ cat tidy.xml
<?xml version="1.0">
<!DOCTYPE x [
<!ELEMENT x (y*)>
<!ELEMENT y (#PCDATA)>
<!ENTITY mydash "—">
<!ENTITY active "!!!!">
]>
<x><y>hyper&mydash;&active; &amp; &ccaron;ónfused!</y></x>
[tomaz@mantra PostMeet]$ tidy -i -ascii -xml tidy.xml
line 8 column 12 - Warning: unescaped & or unknown entity "&mydash"
line 8 column 20 - Warning: unescaped & or unknown entity "&active"
line 8 column 35 - Warning: unescaped & or unknown entity "&ccaron"
Info: Doctype given is "—"
3 warnings, 0 errors were found!
<?xml version="1.0"?>
<!DOCTYPE x [
<!ELEMENT x (y*)>
<!ELEMENT y (#PCDATA)>
<!ENTITY mydash "—">
<!ENTITY active "!!!!">
]>
<x>
<y>hyper&amp;mydash;&amp;active; &amp;
&amp;ccaron;ónfused!</y>
</x>
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please send bug reports to [log in to unmask]
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
--
Tomaž Erjavec | Dept. of Intelligent Systems E-8
email: [log in to unmask] | Jozef Stefan Institute
www: http://nl.ijs.si/et/ | Jamova 39, SI-1000, Ljubljana
fax: (+386 1) 4251 038 | Slovenia