I don't think I fully understand the problem you're having. (What,
for example, does InDesign have to do with it? You're not trying to
read a TEI file directly into InDesign, are you?)

For the most part, the end processors to which transformed TEI gets
sent normalize whitespace. E.g., "aspirants which" and "aspirants
        which" will be treated exactly the same
by a web browser. So when transforming TEI to XHTML I simply ignore
this problem. And you're right, there is no "<le>" element in TEI. We
indicate where a break occurred, but not the explicit beginning and
ending of a line. (At least, not within <text>, a transcription of
the logical structure of the document with some concessions to its
physical layout. Inside <sourceDoc>, which is a transcription of the
physical layout of a document, we do mark lines explicitly using the
<line> element.)
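
As a rough illustration of the whitespace point above, here is a small Python sketch. The collapse() helper is my own name, and it only approximates what a browser does (the real CSS rules have more cases), but it shows why the two spellings of the phrase come out the same.

```python
def collapse(s: str) -> str:
    # Trim the string and squeeze every run of whitespace
    # (blanks, tabs, newlines) down to a single blank.
    return " ".join(s.split())

one_line = "aspirants which"
two_lines = "aspirants\n        which"   # line break plus indentation

print(collapse(one_line) == collapse(two_lines))  # True
```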

All that said, if your target system cares, then you need to tidy up.
One's first thought is just to use the normalize-space() function on
every text node:
| <xsl:template match="text()">
|   <xsl:value-of select="normalize-space(.)"/>
| </xsl:template>
This turns out to be a terrible idea. We often need leading or
trailing space, as in
<said>This is a job for <persName ref="#kalel">Superman</persName>!</said>
Here, if you normalize all text nodes, you end up with
This is a job forSuperman!
which is not at all right. So I might try something like the
following. It just produces a copy of the input TEI with <lb>
and its preceding whitespace replaced by a single blank. This
isn't likely sufficient for your purposes, but it's a start.
|<?xml version="1.0" encoding="UTF-8"?>
|<xsl:stylesheet version="2.0"
|    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
|    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
|  <!-- copy everything not otherwise dealt with: -->
|  <xsl:template match="@*|node()">
|    <xsl:copy>
|      <xsl:apply-templates select="@*|node()"/>
|    </xsl:copy>
|  </xsl:template>
|  <!-- catch text nodes that precede an <lb> and end in whitespace -->
|  <xsl:template match="text()[following-sibling::node()[1][self::lb]]">
|    <!-- strip trailing whitespace -->
|    <xsl:value-of select="replace(.,'\s+$','')"/>
|  </xsl:template>
|  <!-- replace <lb> by a single blank -->
|  <xsl:template match="lb">
|    <xsl:text> </xsl:text>
|  </xsl:template>
|</xsl:stylesheet>
When run on your input paragraph, it produces:
| <p n="74"> <s n="1">Minor shades of difference in Mind-healing have origi‐ nated with certain opposing factions, springing up among unchristian students, who, fusing with a class of aspirants which snatch at whatever is progressive, call it their first- fruits, or else <emph ana="italic">post mortem </emph> evidence. </s>
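
If you want to play with both ideas without firing up an XSLT processor, here is a rough Python sketch using only the standard library. It is not a substitute for the stylesheet: normalize_space() is just my approximation of the XPath function, naive() applies it to every text node the way the terrible-idea template would, and lb_to_blank() works on the raw string with a regular expression rather than on the XML tree. All three names are mine, invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(s: str) -> str:
    # Approximation of XPath normalize-space(): trim, then collapse
    # each run of whitespace to one blank.
    return " ".join(s.split())

def naive(xml: str) -> str:
    # Normalize every text node on its own, then return the text content.
    # ElementTree keeps text in .text (before the first child) and
    # .tail (after an element's end tag), so both are treated.
    root = ET.fromstring(xml)
    for el in root.iter():
        if el.text:
            el.text = normalize_space(el.text)
        if el.tail:
            el.tail = normalize_space(el.tail)
    return "".join(root.itertext())

def lb_to_blank(tei: str) -> str:
    # String-level shortcut (an assumption, not real XML processing):
    # drop any whitespace in front of an empty <lb .../> tag and put a
    # single blank where the tag was.
    return re.sub(r"\s*<lb\b[^>]*/>", " ", tei)

said = '<said>This is a job for <persName ref="#kalel">Superman</persName>!</said>'
print(naive(said))        # This is a job forSuperman!

fragment = 'a class of aspirants <lb n="26"/>which snatch at'
print(lb_to_blank(fragment))   # a class of aspirants which snatch at
```

The regular expression is only safe on input as regular as yours, where every <lb> is written as an empty tag; for anything less tidy, stick with a real XML tool.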
> We use the following format for text with enforced line breaks.
> <p n="74">
> <lb n="23"/><s n="1">Minor shades of difference in Mind-healing
> have origi‐ <lb n="24"/>nated with certain opposing
> factions, springing up among <lb n="25"/>unchristian students,
> who, fusing with a class of aspirants <lb n="26"/>which snatch at
> whatever is progressive, call it their first- <lb n="27"/>fruits,
> or else <emph ana="italic">post mortem </emph> evidence. </s>
> This works well as we can enforce the line-breaks for targets that
> support that or ignore them for targets, like Epub, that do not.
> The problem is dealing with white space, the cr or lf characters at
> the end of the lines and the spaces before each line if the TEI is
> indented for easy readability by humans. Also, some formats like
> InDesign use a line-ending character or tag to indicate a new line,
> not a line break at the beginning of the line.
> I don't see a line-ending tag, <le/>, to encapsulate this
> line-ending concept in the TEI?
> How are people handling this situation?