On Mon, 6 Feb 1995 15:48:44 CST kendall thomason shaw said:
> I hope it is appropriate to ask questions about SGML. If it's not
It's fine. The netnews group comp.text.sgml is also a possibility, if
tei-l doesn't produce the help you need.
[Warning: technical discussion follows. Non-techies bail out now.]
> At this point I imagine a parameter as being the expansion data
>that would result from entity references, for one thing. So, I think
Well, not by definition. By definition, a parameter is a portion of a
declaration which, in the formal grammar of ISO 8879, has or can have
parameter separators (ps) on either side of it. For example: since
production 101 defines an entity declaration as
MDO, "ENTITY", ps+, entity name, ps+, entity text, ps*, MDC
the entity name and entity text are parameters, but the keyword
'ENTITY' is not. Since entity name can expand to a parameter entity
name, which can expand by production 104 to
PERO, ps+, name
the name of a parameter entity is also a parameter (ps on the left
from rule 104, ps on the right from rule 101).
Since parameter-entity references and entity-end signals can both
occur in ps, but not elsewhere in declarations, any parameter or
integral number of parameters can be represented by a reference to a
parameter entity. (Hence the name 'parameter entity', which I regard
as one of the less successful terms invented by 8879.)
It should be noted that the formal grammar of ISO 8879 writes productions
as if a parameter entity reference (in the form PERO, name, reference end)
is *followed* (rather than *replaced*) in the data stream by the entity
text of the entity, and then by the entity-end signal. This is not
intuitively obvious to all readers at first glance, but the grammar is
slightly less confusing if it's borne in mind.
So if you have
<!ENTITY % paraContent "(#PCDATA | foo)*" >
<!ELEMENT para - O (%paraContent) >
the element declaration's content model gets parsed something like this:
(I recast the grammatical productions as rewrite rules, for clarity.)
content model --> model group
              --> '(', ts, content token, ts, ')'
              --> '(', parameter entity reference, content token, EE, ')'
              --> '(', '%paraContent', content token, EE, ')'
              --> '(', '%paraContent', model group, EE, ')'
              --> '(', '%paraContent', '(', content token, ts, connector,
                  ts, content token, ')', occurrence indicator, EE, ')'
              --> '(', '%paraContent', '(', primitive content token, ' ',
                  '|', ' ', primitive content token, ')', '*', EE, ')'
              --> '(', '%paraContent', '(', '#PCDATA', ' ',
                  '|', ' ', 'foo', ')', '*', EE, ')'
If you are used to language specifications in which string replacements
and macro substitutions are done as if in a pre-processor (as in C) and
are INVISIBLE to the formal grammar of the language, then this takes
some getting used to. On the other hand, it does allow the designers to
restrict the use of parameter entity references so as to forbid some of
the tricky uses pre-processor-based macro substitutions can be put to.
I assume that may be why they do it this way.
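For readers who find a token stream easier to follow than rewrite rules, the "followed, not replaced" view can be sketched roughly in Python. Everything here is illustrative and assumed rather than taken from ISO 8879: the function name stream, the EE marker string, the entities table, and the deliberately crude tokenizer (which simply drops white space).

```python
import re

EE = "<EE>"  # stand-in for the entity-end signal (a signal, not a character)

# Hypothetical symbol table: parameter entity name -> entity text
entities = {"paraContent": "(#PCDATA | foo)*"}

def stream(text, entities):
    """Return the tokens as the grammar sees them: each reference stays in
    the stream and is followed by its entity text and then by EE."""
    out = []
    # Crude tokenizer: references, names (optionally with '#'), or single
    # non-space characters.
    pattern = r"%[A-Za-z][\w.-]*;?|#?[A-Za-z][\w.-]*|\S"
    for tok in re.findall(pattern, text):
        if tok.startswith("%") and len(tok) > 1:
            name = tok[1:].rstrip(";")
            out.append("%" + name)                        # reference remains visible
            out.extend(stream(entities[name], entities))  # then the entity text
            out.append(EE)                                # then the entity end
        else:
            out.append(tok)
    return out

tokens = stream("(%paraContent)", entities)
print(tokens)
# ['(', '%paraContent', '(', '#PCDATA', '|', 'foo', ')', '*', '<EE>', ')']
```

Apart from the dropped spaces, the printed list matches the last step of the derivation above.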
>perhaps the term is used both ways, i.e. both the thing that an entity
>reference expands to, and the entity reference token. He illustrates
No, I think the entity-reference itself is NOT a parameter.
>something by showing entity declarations of this sort:
> <!ENTITY % lef "(yab">
> <!ELEMENT asdf %lef)>
>(actually there's a typo in the book I'm assuming, where the parameter
>entity shown, is not what is referenced) and
> <!ENTITY % lef "(yab")
> <!ENTITY % rig "bad)">
> <!ENTITY % cen "&lef;&rig">
> <!ELEMENT asdf %cen>
>The first illustration is prohibited. While I can't determine wether
>the later is. He says, "In no case does a valid declaration appear to
>a human reader to have a markup error". I am a human reader, there
>doesn't appear to be a markup error, but it seems the question remains
>wether this is a valid declaration.
Yes, the second example is valid. The key rule, for 8879, is that
when you look at the ELEMENT declaration, you don't see unbalanced
parentheses. Contrast the first example, where the parentheses do
not balance.
Fine, you say, but production 55 says pretty clearly that names cannot
contain entity references or entity ends. So what about the entity end
signal caused by the end of entity 'lef' and the entity reference to
'rig', which occur in the middle of the name 'yabbad'?
The answer is that they don't occur in the middle of a name, because the
entity references to lef and rig are resolved during the parsing of the
declaration for 'cen'. That is, when the parameter entity cen is
declared, the entity references in its entity text "%lef;%rig" are
resolved, and the symbol table, or whatever the parser uses as an
equivalent of a symbol table, shows cen with a definition of "(yabbad)",
NOT a definition of "%lef;%rig". When the element asdf is declared, the
reference to cen is expanded to "(yabbad)", which has no illegal entity
ends or references, because it has no entity ends or references AT ALL,
except the reference to 'cen' and the end of 'cen'.
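A minimal Python sketch of that declaration-time resolution, assuming a plain dict as the "symbol table" and a hypothetical helper called declare (neither is prescribed by 8879, and real parsers may be organized quite differently):

```python
import re

# A parameter entity reference: '%', a name, and an optional ';'
REF = re.compile(r"%([A-Za-z][\w.-]*);?")

def declare(table, name, literal):
    """Record a parameter entity, resolving any references inside its
    literal at declaration time rather than at reference time."""
    table[name] = REF.sub(lambda m: table[m.group(1)], literal)

table = {}
declare(table, "lef", "(yab")
declare(table, "rig", "bad)")
declare(table, "cen", "%lef;%rig")

print(table["cen"])  # (yabbad)
```

The stored text for cen contains no '%' at all, so nothing is left to expand when cen itself is referenced.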
This does mean that when you look at the element or attlist declarations
in a DTD, you won't be confused by apparently mismatched parentheses or
other apparent problems. Despite the best efforts of the designers of
SGML, though, it doesn't mean you won't be confused by a DTD. It just
means any confusion will occur when you try to expand the entity
references, or look at entity declarations, instead of when you look at
the element and attlist declarations.
>So, I am confused. This definition of parameter and parameter
>separator seems to be important. In production 65, parameter separator
>is shown to be any of an S Seperator, End of Entity signal, a
>parameter entity reference, or a comment. This seems to conflict with
>what was said on the previous page, which made it sound as though it
>were a character string delimiter (See this message page 1 line
>9. Yuck yuck).
I believe you are right about this. Since entity end is not a
character, parameter separators are not necessarily character strings.
Normally the standard is much more careful and consistent than this.
And note that the confusing statement you identify is in the informal
commentary, rather than in the text of the standard.
> In the overview annotations below definition 4.224
>(pg. 204). He says a sequence of _complete_ parameters can be in a
>parameter entity (my emphasis). Does this mean that by "parameter" he
>means the product of the expansion, and that if it's expansion serves
>to form a content model, for example, it must not require other
>parameter entity reference expansions for this to be accomplished? So
I think this rule forbids element declarations like
<!ENTITY % lef "(yab">
<!ENTITY % rig "bad)">
<!ELEMENT asdf %lef;%rig >
because the single parameter 'content model', which expands to
"(yabbad)", is contained in two parameter entities (lef and rig).
It does NOT forbid
<!ENTITY % lef "(yab">
<!ENTITY % rig "bad)">
<!ENTITY % cen "%lef;%rig">
<!ELEMENT asdf %cen >
because the parameter 'content model' is contained only in a single
parameter entity (cen -- which, as I said earlier, contains only
a string, not a pair of entity references). In other words, an extra
layer of indirection makes an illegal obfuscatory reference into a
legal non-obfuscatory reference. This is not intuitively plausible to
everyone, and it is one reason, I suspect, why some people think it
would be better, or at least easier for readers, if the standard
forgot about trying to forbid difficult or confusing declarations, and
concentrated on making its own grammar less difficult and confusing.
>that a validating parser would expand the entity references, determine
>that it's a content model and then flag an error because more that one
>entity reference was required? He goes on, "The entity reference and
The rules for expansion of parameter entity references are complicated
enough that I hate to say HOW a validating parser goes about enforcing
them, because there appear to be lots of ways to go about it. But
in general, yes, a validating parser is responsible for seeing to it
that references to parameter entities follow the rules.
>corresponding entity end signal must both occur in ps separators in
>the same declaration." Does this mean to imply that an entity end
>signal need not occur at the end of an entity reference expansion? If
>so, then on what basis does the entity end signal get generated?
No: entity ends are always signaled when the end of the entity is
encountered, at the time the entity reference is expanded. The rule
about entities beginning and ending in the same declaration is to
prevent things like this:
<!ENTITY % tricky ' (#PCDATA) > <!ELEMENT naughty o o (#PCDATA)' >
<!-- ... -->
<!-- hundreds of lines of DTD later ... -->
<!ELEMENT dizzy - - %tricky >
where an element declaration which appears to declare a single element
named 'dizzy' also turns out to declare an element called 'naughty',
much to the surprise of the reader of the declaration.
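One crude way a checker might catch this case for a reference made inside a declaration is to ask whether the entity text contains a declaration close ('>') outside any quoted literal. The Python below is only a sketch under that assumption: the function name ends_declaration is invented for illustration, and SGML comments (-- ... --) are ignored entirely.

```python
def ends_declaration(entity_text):
    """Return True if the text contains '>' outside a quoted literal,
    i.e. if it would close the declaration in which it is referenced."""
    quote = None  # the literal delimiter we are currently inside, if any
    for ch in entity_text:
        if quote:
            if ch == quote:
                quote = None       # literal closed
        elif ch in "\"'":
            quote = ch             # literal opened
        elif ch == ">":
            return True            # declaration would end inside the entity
    return False

tricky = " (#PCDATA) > <!ELEMENT naughty o o (#PCDATA)"
print(ends_declaration(tricky))              # True
print(ends_declaration("(#PCDATA | foo)*"))  # False
```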
>My best guess is that a parameter entity reference expansion must be
>parsable as a complete token within the grammar for SGML. If this is
One or more complete tokens.
>true, then the sentence, "The entity reference and corresponding
>entity end signal must both occur in ps separators in the same
>declaration." is "pointless". And this would mean that a validating
It may be redundant, I think, but not because of the 'one or more
complete parameters' rule. If it is redundant, it's because the rules
specify several desired results, rather than a minimal set of conditions
which will guarantee those results. I believe there is a rule somewhere
which says declarations must begin and end in the same entity; if there
is, the rule about entities beginning and ending in the same declaration
is indeed redundant. But I don't have time to track it down now.
>parser must check that each token parsed after expanding any parameter
>entity references must be in the form one entity reference to one or
And that they either contain one or more complete declarations, or else
don't contain a token that would end the declaration within which they
were referred to.
>Help! The thought that someone might straighten me out on this, will
>allow me to enjoy pleasant sleep tonight. Good night.
I hope this helps some, and that you can get some shuteye.
-C. M. Sperberg-McQueen