We used a few ad-hoc ways of annotating the blanks and blank lines we
ran into in the course of marking up some of the ECI/MC1 material (see
below for a brief plug for our CD-ROM). Here are some extracts from the
editorial practice descriptions:
`A "pl:(number)" 'rend' attribute on <p> marks cases where paragraphs
were separated from their predecessors by other than a single blank
line, in which case the number gives the number of blank lines.'
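
As a rough illustration of the convention just quoted (not the code we
actually used), here is a minimal Python sketch that wraps plain-text
paragraphs in <p> and adds rend="pl:N" only when the gap before a
paragraph is something other than one blank line. The function name
mark_paragraphs is hypothetical, and leaving the first paragraph
unmarked (it has no predecessor) is an assumption.

```python
from xml.sax.saxutils import escape


def mark_paragraphs(text):
    """Wrap paragraphs in <p>, recording unusual blank-line gaps in 'rend'."""
    paragraphs = []  # list of (blank_lines_before, [lines])
    blanks = 0
    current = None
    for line in text.splitlines():
        if line.strip() == "":
            # A blank line closes any open paragraph and is counted.
            if current is not None:
                paragraphs.append(current)
                current = None
            blanks += 1
        else:
            if current is None:
                current = (blanks, [])
                blanks = 0
            # Line-final blank characters are dropped.
            current[1].append(line.rstrip())
    if current is not None:
        paragraphs.append(current)

    out = []
    for index, (gap, lines) in enumerate(paragraphs):
        # Assumption: the first paragraph has no predecessor, so it is unmarked.
        rend = "" if index == 0 or gap == 1 else ' rend="pl:%d"' % gap
        out.append("<p%s>%s</p>" % (rend, escape("\n".join(lines))))
    return out
```

A paragraph preceded by three blank lines would come out as
<p rend="pl:3">, while one preceded by a single blank line gets a plain
<p>.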
` Lines with only blank chars on have been rendered empty;
Line-final blank characters have been elided.
. . .
Blank lines are removed, but 'rend' attributes are provided
wherever this is done. Similarly for the paragraph-initial spacing.
The key 'pl' is used for blank lines (short for "paragraph leading"),
and the key 'pi' is used for indentation (short for "paragraph
indent"), e.g. <p rend="pl:2,pi:2">.
Note that leading of 1 and indent of 3 is the default, and unmarked.'
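
For anyone reading the markup back, a minimal Python sketch of decoding
such a 'rend' value might look like the following. The name parse_rend
is hypothetical, and silently ignoring unknown keys or non-numeric
values is an assumption; the extract above only shows the
"pl:2,pi:2" form and the defaults of 1 and 3.

```python
# Defaults stated in the extract: leading of 1, indent of 3.
DEFAULTS = {"pl": 1, "pi": 3}


def parse_rend(rend=None):
    """Return the paragraph leading ('pl') and indent ('pi') for a <p>."""
    values = dict(DEFAULTS)
    if rend:
        for item in rend.split(","):
            key, _, number = item.strip().partition(":")
            # Assumption: anything other than a known key with a
            # non-negative integer value is skipped, not treated as an error.
            if key in values and number.isdigit():
                values[key] = int(number)
    return values
```

An unmarked <p> then yields leading 1 and indent 3, and
<p rend="pl:2,pi:2"> yields 2 and 2.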
Hope this helps a tiny bit.
Shameless plug: The ECI/MC1, consisting of more than 40 corpora in
more than 20 languages, is available for non-commercial use on CD-ROM
for around 30 ECU plus postage & packing. Most of the material is for
European languages, and in many cases this is the first time
substantial amounts of material in these languages have been made
publicly available. For more information about the ECI/MC1, including
how to order and pay for it, send e-mail to [log in to unmask]