TEI-MIGR Archives

TEI-MIGR@LISTSERV.BROWN.EDU


TEI-MIGR February 2003

Subject: Tidy
From: Tomaz Erjavec <[log in to unmask]>
Reply-To: TEI Migration Task Force <[log in to unmask]>
Date: Tue, 18 Feb 2003 09:45:58 +0100
Content-Type: text/plain

Dear Amit (all),
First, commiserations (or congratulations) on all the snow!

  Action AK by 2003-03-04: investigate what tidy does with ampersands
  (TE to send problematic file & output listing)

This being a tiny (and tidy) action item, I just did it, and below is
what I have to report.

I made a test file and ran tidy on it, as below. In short, and unless I
missed some obscure configuration option, tidy -xml understands and
translates ISO Latin 1 entities such as &aacute; (with -ascii, &oacute;
comes out as &#243;), but it treats all other entity references as
mistakes and replaces their & with &amp;.
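To check a file for references that are likely to be mangled before running tidy at all, a small scan can help. The Python sketch below assumes, without confirmation from tidy's documentation, that tidy recognizes exactly the HTML 4 named entities (approximated here by Python's `html.entities.name2codepoint`) plus XML's predefined ones; the function name `unknown_entity_refs` is hypothetical. On the `<y>` element of the test file shown below it reports the same three names tidy warns about.

```python
import re
from html.entities import name2codepoint

# Named references only; character references such as &#x2014; do not match.
_ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
# XML's predefined entities (&apos; is not in the HTML 4 table).
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def unknown_entity_refs(text: str) -> list[str]:
    """Distinct entity names, in order of first use, that are neither
    HTML 4 entities nor XML built-ins (assumed to be what tidy flags)."""
    names = _ENTITY_REF.findall(text)
    unknown = [n for n in names
               if n not in name2codepoint and n not in _XML_PREDEFINED]
    return list(dict.fromkeys(unknown))  # drop repeats, keep order


if __name__ == "__main__":
    sample = "<y>hyper&mydash;&active; &amp; &ccaron;&oacute;nfused!</y>"
    print(unknown_entity_refs(sample))  # ['mydash', 'active', 'ccaron']
```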

The only way around it, I guess, is to protect the & before sending the
file to tidy, as in the 'obsolete' pipe for sx.
[it seems to me it might be better to use something that doesn't require
such kludges]
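A minimal Python sketch of the protect-and-restore idea, not a reconstruction of the sx pipe mentioned above (whose details are not given here). Assumptions: the placeholder form `[[ENT:name]]` is invented for illustration and is assumed to pass through tidy unchanged as ordinary text, and the five XML predefined entities are left alone because tidy keeps `&amp;` intact. The names `protect_entities` and `restore_entities` are hypothetical.

```python
import re

# Named entity references; character references (&#...;) are not matched.
_ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
# Hypothetical placeholder: plain ASCII, no whitespace, so line wrapping
# should not split it (an assumption about tidy, not verified here).
_PLACEHOLDER = re.compile(r"\[\[ENT:([A-Za-z_][\w.-]*)\]\]")
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def protect_entities(text: str) -> str:
    """Replace every non-predefined &name; with a placeholder."""
    def hide(m: re.Match) -> str:
        name = m.group(1)
        if name in _XML_PREDEFINED:
            return m.group(0)  # leave &amp; etc. for tidy
        return f"[[ENT:{name}]]"
    return _ENTITY_REF.sub(hide, text)


def restore_entities(text: str) -> str:
    """Turn placeholders back into &name; references."""
    return _PLACEHOLDER.sub(lambda m: f"&{m.group(1)};", text)


if __name__ == "__main__":
    line = "<y>hyper&mydash;&active; &amp; &ccaron;&oacute;nfused!</y>"
    hidden = protect_entities(line)
    print(hidden)
    assert restore_entities(hidden) == line
```

In use, the input would go through `protect_entities`, then `tidy -xml`, then `restore_entities` on tidy's output. A source text that already contains the literal placeholder string would not survive the round trip unchanged.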

Syd Bauman writes:
 > I have not tried this yet, but it looks promising.
 > http://www.cs.helsinki.fi/u/penberg/xmlindent/

I had a peek; it seems a bit underdocumented.

One more rather obvious possibility, but I don't recall it being
mentioned: XSLT has the option <xsl:output indent="yes"/>.

For those who use XSLT in their conversion anyway, this could be the
simplest solution. The slight problem I see is that you might well want
to pretty-print only part of the document, e.g. the teiHeader: that is
the part you are actually likely to look at in the source, and the body
could get a lot heavier if indented. Still, this just means an extra
pass that indents only the headers.
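The header-only indenting pass can be sketched in Python, swapping in `xml.etree.ElementTree.indent` (Python 3.9+) for XSLT's `indent="yes"`, which is a different tool doing a comparable job. The sketch assumes a namespace-less, TEI P4-style document with a `<teiHeader>` element; the function name `indent_header_only` is hypothetical. ElementTree does not keep the DOCTYPE or entity references, so this only suits files that have already been resolved.

```python
import xml.etree.ElementTree as ET


def indent_header_only(xml_text: str, space: str = "  ") -> str:
    """Pretty-print the teiHeader subtree; leave the rest untouched.

    Assumes no namespace on teiHeader. ElementTree drops the DOCTYPE
    and does not preserve entity references.
    """
    root = ET.fromstring(xml_text)
    header = root.find(".//teiHeader")
    if header is not None:
        # Only whitespace-only text/tails inside the header are rewritten,
        # so mixed content keeps its character data.
        ET.indent(header, space=space)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ("<TEI.2><teiHeader><fileDesc><titleStmt><title>T</title>"
           "</titleStmt></fileDesc></teiHeader>"
           "<text><body><p>x</p></body></text></TEI.2>")
    print(indent_header_only(doc))
```

Because the body is never passed to `ET.indent`, it comes out exactly as compact as it went in.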

Best,
Tomaz

[tomaz@mantra PostMeet]$ tidy -version
HTML Tidy for Linux/x86 released on 1st February 2003

[tomaz@mantra PostMeet]$ cat tidy.xml
<?xml version="1.0">
<!DOCTYPE x [
<!ELEMENT x (y*)>
<!ELEMENT y (#PCDATA)>
<!ENTITY mydash "&#x2014;">
<!ENTITY active "!!!!">
]>
<x><y>hyper&mydash;&active; &amp; &ccaron;&oacute;nfused!</y></x>

[tomaz@mantra PostMeet]$ tidy -i -ascii -xml tidy.xml
line 8 column 12 - Warning: unescaped & or unknown entity "&mydash"
line 8 column 20 - Warning: unescaped & or unknown entity "&active"
line 8 column 35 - Warning: unescaped & or unknown entity "&ccaron"
Info: Doctype given is "&#x2014;"
3 warnings, 0 errors were found!

<?xml version="1.0"?>
<!DOCTYPE x [
<!ELEMENT x (y*)>
<!ELEMENT y (#PCDATA)>
<!ENTITY mydash "&#x2014;">
<!ENTITY active "!!!!">
]>
<x>
  <y>hyper&amp;mydash;&amp;active; &amp;
  &amp;ccaron;&#243;nfused!</y>
</x>
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please send bug reports to [log in to unmask]
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium

--
Toma&zcaron; Erjavec         | Dept. of Intelligent Systems E-8
email: [log in to unmask]  | Jozef Stefan Institute
www:   http://nl.ijs.si/et/  | Jamova 39, SI-1000, Ljubljana
fax:   (+386 1) 4251 038     | Slovenia
