Date: Mon, 7 Sep 92 10:14:40 GMT
From: [log in to unmask] (Peter Flynn)
Subject: SGML and TeX
TUGboat 13[2] carries an abstract by Reinhard Wonneberger (pp226--227)
called "Approaching SGML from TeX", in which he summarises some of the
possible ways to use TeX to print from an SGML instance.
The following file is an attempt I cooked up over the weekend to demonstrate
the feasibility of this approach. It still fails on a lot of things, but they
don't look insuperable. The instance referenced at the end of the file can
be retrieved by anonymous FTP from curia.ucc.ie (143.239.1.8), in the directory pub/curia.
--------------------------
% SGML.TEX --- a pilot set of macros to provide rudimentary
% typesetting of SGML-encoded documents with NO
% pre- or postprocessing (you better believe it)
% (c) 1992 Peter Flynn
%
% Warning: this file uses the EPLAIN macros of Karl Berry, obtainable
% from any of the TeX archives such as tex.ac.uk or ymir.claremont.edu
%
% WARNING: this is a pilot. No guarantees, but it seems to
% work on the tags I mention below. It should form the basis
% for much more work, as with proper persuasion, TeX should be
% able to process an unaltered SGML instance (and DTD) and
% produce a piece of acceptable typesetting (IMHO :-).
%
% If you are going to do some work on this, please ask me first:
% I am unlikely to object, but I would like to know about it.
%
% Version history:
%
% 0.1 (Sep 92) reads and acts on a minimal tagset of HTML
% used in network-browseable documents by WWW
% This comprises (work so far):
%
% <title>...</title> Document title
% <h1>...</h1> Header level 1
% <h2>...</h2> Header level 2
% <h3>...</h3> Header level 3
% <dl>... Simple list
% <dt>...<dd>... Item name, text
% </dl> End of list
% <p> Paragraph
% some entities like &aacute; (see below)
%
% I haven't figured out how to handle multi-word
% tags (eg with attributes) like <a name=0 h=test.doc>
% yet, because in the parsing, TeX turns the space
% into another category of character. Gimme time!
% Another source of confusion is the presence of a
% slash in a quoted filename within an attribute to
% such tags when TeX is looking for the slash which
% indicates the endtag. However...:-)
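%
% One possible way round both problems (an untested sketch added
% here, not in version 0.1; \splittag, \endsplit, \tagname and
% \tagattrs are invented names) is to split the tag at its first
% space before parsing it:
%
%   \def\splittag#1 #2\endsplit{%
%     \def\tagname{#1}\def\tagattrs{#2}}
%
% and to call it inside the tag routine as
%
%   \splittag#1 \endsplit
%   \expandafter\parse\expandafter{\tagname}
%
% so that only the tag name is scanned for a slash. The space
% before \endsplit is a sentinel for tags with no attributes;
% it leaves a trailing space in \tagattrs when there are some.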
%
% All comments to [log in to unmask] (Fax: +353 21 277194)
\input eplain % get it from the archives!
\font\stt=cmtt8 % used for the tags
\font\sbf=cmssbx10 scaled \magstep1 % used for the title
\font\sc=cmcsc10 % used for some headers
% Make a slash an ordinary letter.
\catcode`\/=11
% Define \pos, a count of the characters scanned so far in a tag
% (a slash met at position 0 marks an endtag), and \slash, a flag:
% 0=no leading slash found, 1=leading slash found.
\newcount\pos\newcount\slash
% The \parse and \getchar are adapted from the \length macro
% at the end of Chapter 20 (p.219) of the TeXbook. A call to
% \parse returns \slash=0 or \slash=1 depending on whether
% the argument was a starttag or endtag.
\def\parse#1{\global\pos=0\global\slash=0\getchar#1/}
\def\getchar#1{\ifx#1/\ifnum\pos=0\global\slash=1\global\advance\pos
by1\let\next=\getchar\else\let\next=\relax\fi%
\else\global\advance\pos by1\let\next=\getchar\fi\next}
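% Illustration (not part of the original file): what \parse leaves
% behind, traced by hand from the two definitions above.
%   \parse{h1}  gives \slash=0, \pos=2  (starttag, two characters)
%   \parse{/h1} gives \slash=1, \pos=3  (endtag, slash counted too)
% Uncomment these lines to see the values on the terminal:
% \parse{h1}\message{h1: slash=\the\slash, pos=\the\pos}
% \parse{/h1}\message{/h1: slash=\the\slash, pos=\the\pos}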
% Use \raggedcenter from Appendix A 14.34 (p.317) of the TeXbook
\def\raggedcenter{\leftskip=0pt plus12em \rightskip=\leftskip
\parfillskip=0pt \spaceskip=.3333em \xspaceskip=.5em \parindent=0pt
\pretolerance=9999 \tolerance=9999
\hyphenpenalty=9999 \exhyphenpenalty=9999 }
% Define the visual meanings to be attached to the tags
\def\title{\par\begingroup\raggedcenter\sbf}
\def\/title{\bigskip\endgroup}
\def\p{\par}
% Header level tags have to go in a group so that digits can
% be treated as letters for purposes of definition.
\begingroup\catcode`\2=11\catcode`\1=11
\global\def\h1{\bigbreak\noindent\begingroup\bf}
\global\def\/h1{\endgroup\medskip\noindent\ignorespaces}
\global\def\h2{\medbreak\noindent\begingroup\sc}
\global\def\/h2{\endgroup\smallskip\noindent\ignorespaces}
\global\def\h3{\smallbreak\noindent\begingroup\sl}
\global\def\/h3{\endgroup\par\noindent\ignorespaces}
\endgroup
\def\dl{\unorderedlist}
\def\/dl{\endunorderedlist}
\def\dt{\li\it}
\def\dd{\item{}\rm}
\def\a #1{\footnote{#1}}
\def\/a{}
\def\entr{\item{$\bullet$}}
% Make the less-than (opentag) character active, and establish
% two controls to let the user turn on tag presence and formatting
% in the output. Default is no tags and no formatting: this will
% output pages of plain typewriter text. Saying \showtagstrue
% will include the tags in the output; saying \formattrue will
% perform the formatting defined above. Either or both can be
% used, but must be inserted where shown below, before the \input.
\catcode`\<=\active
\newif\ifshowtags\newif\ifformat
% Define the main routine to handle a tag
\def<#1>{\parse{#1}\ifnum\slash=1\ifshowtags\endtag{#1}\fi
\ifformat\csname#1\endcsname\fi
\else\ifformat\csname#1\endcsname\fi
\ifshowtags\starttag{#1}\fi\fi}
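% Illustration (not part of the original file): with both switches
% on, the input <h1>Annals</h1> is handled roughly as
%   \h1 \starttag{h1} Annals \endtag{/h1} \/h1
% where \h1 and \/h1 stand for the control sequences built by
% \csname h1\endcsname and \csname/h1\endcsname. A tag with no
% definition, say <address> (assuming nothing else defines
% \address), makes \csname yield \relax, so it is boxed when
% \showtagstrue is set but has no effect on the formatting.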
% Set up some variables to handle the boxing of tags for output
\newbox\tagbox\newdimen\tagwidth\newdimen\boxwidth
\def\hlinefill{\leaders\hrule height.2pt\hfill}
% Define what a starttag looks like
\def\starttag#1{\setbox\tagbox=\hbox{{\stt#1}}%
\tagwidth=\wd\tagbox\advance\tagwidth by2pt%
\boxwidth=\tagwidth\advance\boxwidth by4pt%
\leavevmode\lower2.5pt\hbox{\vrule width.2pt\vbox{\hsize=\boxwidth\parindent=0pt
\offinterlineskip%
\line{\hbox to\tagwidth{\hlinefill}\hfil}%
\line{\hskip2pt\box\tagbox\kern-.5pt$\rangle$\hfil}%
\line{\hbox to\tagwidth{\hlinefill}\hfil}}}}
% Define what an endtag looks like
\def\endtag#1{\setbox\tagbox=\hbox{{\stt#1}}%
\tagwidth=\wd\tagbox\advance\tagwidth by2pt%
\boxwidth=\tagwidth\advance\boxwidth by4pt%
\leavevmode\lower2.5pt\hbox{\vbox{\hsize=\boxwidth\parindent=0pt
\offinterlineskip%
\line{\hfil\hbox to\tagwidth{\hlinefill}}%
\line{\hfil$\langle$\kern-1pt\box\tagbox\hskip2pt}%
\line{\hfil\hbox to\tagwidth{\hlinefill}}}\vrule width.2pt}}
% Define some of the simpler entities
\def\aacute{\'a}
\def\eacute{\'e}
\def\iacute{\'{\i}}
\def\oacute{\'o}
\def\uacute{\'u}
\def\ocus{\&}
\def\amp{\&}
\def\nodoti{\i}
\def\aelig{\ae}
\def\mdash{---}
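% Further entities follow the same pattern. These three are
% illustrations added here, not part of the original list; the
% names are the usual ISO ones and the accents are plain TeX's.
\def\agrave{\`a}
\def\ccedil{\c c}
\def\ntilde{\~n}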
% Turn on the recognition of the ampersand so entities become active
\catcode`\&=\active
\def&#1;{\csname#1\endcsname}
% Slip in recognition of a few of TeX's special characters
% The % sign itself is done only later, immediately before
% inputting the SGML instance, so that we can continue using
% comments until then.
\catcode`\$=\active\def${\$}
\catcode`\#=\active\def#{\#}
% Uncomment your choice of options here
\showtagstrue
\formattrue
% Make some assumptions about the style of output, based on the above:
\ifshowtags\raggedright\else\fi
\ifformat\else\ttraggedright\fi
\tolerance=7500
% And define the double-quote (") as active so typewriter-style
% quotes come out as open-and-closed in flip-flop manner. Bad style
% to use them in SGML anyway, <quote>...</quote> is better :-)
\ifformat\newcount\qcount\catcode`\"=\active
\def"{\global\advance\qcount by1\ifodd\qcount``\else''\fi}\fi
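% Illustration (not part of the original file): with \formattrue,
%   He said "yes" and "no".
% comes out as
%   He said ``yes'' and ``no''.
% because \qcount is odd at each opening quote and even at each
% closing one. An unbalanced " throws every later pair out of step.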
% Input your SGML instance here, after the comment character
% is redefined (no more comments from here on...)
\catcode`\%=\active\def%{\%}
\input /info/curia/Chron_Scot.html
\bye
----------------------------------------------------------