Hi -- I attach this note from Amit re Tidy -- it seems he is not on the
list:
Hi Tomaz and All,
I investigated the tidy trouble. There is a flag in tidy:
http://tidy.sourceforge.net/docs/quickref.html#quote-ampersand
<quote>
quote-ampersand
Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0
This option specifies if Tidy should output unadorned & characters as
&amp;.
</quote>
This flag, if used with the value false (the default is true), does not
convert & -> &amp;. I have tested it on the following setup:
WIN 98
Tidy release 1st feb 2003.
The sample command is:
tidy --quote-ampersand false source.xml > out.xml
Similarly, there are quote-marks and quote-nbsp flags.
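For anyone stuck with a tidy build where the flag is not an option, the
workaround Tomaz mentions (protecting the & before the file reaches tidy)
could look roughly like the Python sketch below. It is only a sketch: the
marker string and the function names are made up for illustration, and it
assumes a tidy executable on the PATH that reads standard input when no
file is given.

```python
import subprocess

# Hypothetical marker; it must be plain ASCII and must not occur in the input.
PLACEHOLDER = "@@AMP@@"


def protect_ampersands(text):
    """Hide every '&' behind the marker so tidy has nothing to escape."""
    if PLACEHOLDER in text:
        raise ValueError("marker already occurs in the input; pick another one")
    return text.replace("&", PLACEHOLDER)


def restore_ampersands(text):
    """Put the original '&' characters back after tidy has run."""
    return text.replace(PLACEHOLDER, "&")


def tidy_protected(xml_text):
    """Indent xml_text with tidy while leaving all entity references alone.

    Assumes a 'tidy' executable on the PATH; the options are the ones
    used in the test run quoted below.
    """
    result = subprocess.run(
        ["tidy", "-i", "-ascii", "-xml"],
        input=protect_ampersands(xml_text),
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 for warnings and 2 for errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return restore_ampersands(result.stdout)
```

The round trip is lossless as long as the marker does not already occur
in the file, which is why protect_ampersands checks for it first.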
Let me know if you need any clarification.
Amit
**************************
----- Original Message -----
From: "Tomaz Erjavec" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Tuesday, February 18, 2003 3:45 AM
Subject: Tidy
> Dear Amit (all),
> first commiserations (or congratulations) for all the snow!
>
> Action AK by 2003-03-04: investigate what tidy does with ampersands
> (TE to send problematic file & output listing)
>
> being a tiny (and tidy) action item, I just did it, and below is what
> I have to report.
>
> I made the test file and ran it, as below. In short, and unless I
> missed some obscure configuration option, the tidy -xml behaviour is
> that it understands and translates ISO Latin 1 entities, e.g. &aacute;;
> other entity references it treats as mistakes and substitutes &s with
> &amp;s.
>
> The only way around it, I guess, is to protect the & before sending the
> file to tidy, as in the 'obsolete' pipe for sx.
> [it seems to me it might be better to use something that doesn't require
> such kludges]
>
> Syd Bauman writes:
> > I have not tried this yet, but it looks promising.
> > http://www.cs.helsinki.fi/u/penberg/xmlindent/
>
> I had a peek; it seems a bit underdocumented.
>
> One more rather obvious possibility, but I don't recall it being
> mentioned: XSLT has the option <xsl:output indent = "yes"/>
>
> For those that use xslt anyway in their conversion, this could be the
> simplest. The slight problem I see there is that you might well want to
> pretty print only a part of the document, e.g. the teiHeader. That is
> the part you actually might want to look at in source; also, the body
> could get a lot heavier if indented. Still, this just means you have
> an (extra) pass for indenting for headers only.
>
> Best,
> Tomaz
>
> [tomaz@mantra PostMeet]$ tidy -version
> HTML Tidy for Linux/x86 released on 1st February 2003
>
> [tomaz@mantra PostMeet]$ cat tidy.xml
> <?xml version="1.0">
> <!DOCTYPE x [
> <!ELEMENT x (y*)>
> <!ELEMENT y (#PCDATA)>
> <!ENTITY mydash "—">
> <!ENTITY active "!!!!">
> ]>
> <x><y>hyper&mydash;&active; &amp; &ccaron;ónfused!</y></x>
>
> [tomaz@mantra PostMeet]$ tidy -i -ascii -xml tidy.xml
> line 8 column 12 - Warning: unescaped & or unknown entity "&mydash"
> line 8 column 20 - Warning: unescaped & or unknown entity "&active"
> line 8 column 35 - Warning: unescaped & or unknown entity "&ccaron"
> Info: Doctype given is "—"
> 3 warnings, 0 errors were found!
>
> <?xml version="1.0"?>
> <!DOCTYPE x [
> <!ELEMENT x (y*)>
> <!ELEMENT y (#PCDATA)>
> <!ENTITY mydash "—">
> <!ENTITY active "!!!!">
> ]>
> <x>
> <y>hyper&amp;mydash;&amp;active; &amp;
> &amp;ccaron;ónfused!</y>
> </x>
> To learn more about HTML Tidy see http://tidy.sourceforge.net
> Please send bug reports to [log in to unmask]
> HTML and CSS specifications are available from http://www.w3.org/
> Lobby your company to join W3C, see http://www.w3.org/Consortium
>
> --
> Tomaž Erjavec | Dept. of Intelligent Systems E-8
> email: [log in to unmask] | Jozef Stefan Institute
> www: http://nl.ijs.si/et/ | Jamova 39, SI-1000, Ljubljana
> fax: (+386 1) 4251 038 | Slovenia
>