tl;dr

– For TEI projects that contain (many and/or extensive) formulas with 
custom mathematical symbols, we recommend declaring these symbols using 
the TEI elements <char> or <glyph>, even when the symbols only appear 
within MathML.
– To establish the connection between the MathML symbols in formulas 
and their declarations, we recommend using @xlink:href (for MathML 2) 
or @href (for MathML 3).
– If the symbols in the formulas have content (private use area 
character or image), add Schematron to ensure that these representations 
correspond to the respective representations in the linked declarations.

What follows is more like a comment than a question ;-) We found few 
references to MathML in the list archives, which tells us that encoding 
mathematical formulas does not seem to be a key component of most 
digital humanities projects. But when it is, it might well involve 
encoding symbols for which no Unicode representation exists.

We (textloop and le-tex) want to share which encoding recommendations we 
arrived at for a current project. And of course we are soliciting 
feedback from the list.

So here we go:

We are converting math-heavy volumes of a text-critical edition of 
Leibniz’s works from LaTeX to TEI. We intend to encode the formulas as 
(presentational) MathML:

Unless another vocabulary seems more appropriate (for example for 
encoding equations that can be consumed by computer algebra systems), we 
recommend using presentational MathML as formula notation. One part of 
this consideration is that we want to have an XML representation (and 
not an unparsed string-only format such as TeX/LaTeX/AMSTeX) so that we 
can actually link from a single symbol in a formula to its 
declaration. The case for presentational MathML is the tool support and 
its unchallenged role as the go-to math format in publishing. Compared 
to other XML vocabulary candidates such as, say, SVG, presentational 
MathML conveys at least some mathematical meaning.

Leibniz invented some mathematical operators that haven’t made it into 
Unicode yet. The typesetters used custom LaTeX macros for each of these 
symbols, and the macros ultimately resolve to including images.

We are now thinking about how to encode these symbols in a TEI 
customization that incorporates MathML. The use of these symbols is 
confined to contexts in which we can use MathML exclusively, instead of 
TEI-native vocabulary. On the other hand, the formulas can be so complex 
that they cannot be appropriately encoded with TEI-native markup.

So in principle we can use MathML’s mglyph element that, by means of its 
@src attribute, will refer to the corresponding image. mglyph’s alt 
attribute may contain the LaTeX macro name so this mapping information 
will still be available when converting the TEI XML to LaTeX in the 
future. (We will most likely invert the production process for future 
volumes, going from TEI to LaTeX rather than the other way round).

However, we think it is potentially more expressive, more flexible, and 
less redundant to use TEI’s <glyph> or <char> elements to declare these 
symbols in a central place. Then the question arises how we can point to 
the declarations, given that MathML elements such as <mo> don’t have a 
dedicated TEI-pointer-like attribute such as @ref.

Candidates are @xlink:href and @xref. The latter is rarely used. It was 
designed to link between presentational and content MathML elements in 
parallel markup. It is declared as an IDREF in the schema which makes it 
difficult to point to a declaration that is stored in a different file. 
@xlink:href, on the other hand, is declared to be able to hold arbitrary 
content, in particular URLs that can point to the glyph definitions.

So an empty <mo> element with @xlink:href pointing to the glyph 
declaration would be a good candidate.
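
As a minimal sketch (MathML 2 flavour), such an empty operator between 
two identifiers might look like this. The xml:id pleibvdash is the one 
used in the declaration given below; the file name charDecl.xml and the 
operands a and b are made up for illustration. If the declaration lives 
in the same file, the value would simply be #pleibvdash.

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <mrow>
    <mi>a</mi>
    <!-- empty operator: only the link identifies the symbol -->
    <mo xlink:href="charDecl.xml#pleibvdash"/>
    <mi>b</mi>
  </mrow>
</math>
```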

The glyph could be declared as:

<glyph xml:id="pleibvdash">
   <glyphName>pleibvdash</glyphName>
   <desc>a dagger &#x2020; with a horizontal line on the left-hand side 
of its stem, or a double dagger &#x2021; without the lower right-hand 
horizontal line.</desc>
   <mapping type="PUA">&#xE212;</mapping>
   <mapping type="tex">\pleibvdash</mapping>
   <graphic url="pm.pdf"/>
   <graphic url="pm.svg"/>
</glyph>


There are two concerns though. The first is that we are considering 
using MathML 3 instead of MathML 2. In MathML 3, the attribute is called 
@href instead of @xlink:href, and its semantics seem to have shifted 
towards actual hyperlinking (instead of unspecified linking mechanisms 
as in MathML 2). This seems to be a minor concern. I don’t think that 
Leibniz or the critical edition editors will start using hyperlinks on 
math symbols any time soon. And if they do, they will be able to use the 
<maction> element in order to make their hyperlinking intent 
unambiguous. If we document the use and rendering expectation of @href 
on <mo> and <mi> in our encoding description, everything should be fine.


An obvious TEI-centric solution would be to allow <g>’s tei.pointer 
attribute @ref also on <mo> in our customization. We cannot pursue this 
approach though because the resulting XML needs to validate against 
tei_allPlus.rng, too (or against an otherwise unaltered tei_allPlus-like 
customization that includes MathML 3 instead of MathML 2). This has been 
stipulated by the editor/publisher.


The second concern is about renderability of the custom symbols in TEI 
viewers and MathML editing tools. The issue is that MathML renderers and 
equation editors won’t be able to properly display an <mo> that has no 
content, but only a custom link to a TEI element instead. (It would at 
best provide a hyperlink that may or may not take you to the declaration 
in the TEI file.)

For the purpose of HTML or LaTeX→PDF renderings, we can always look up 
the appropriate image URL or LaTeX macros in the char/glyph declaration 
and transform the source MathML to another MathML that contains an 
<mglyph src="…"/> element or to LaTeX code, as described below in 
greater detail. (Yes, the detail will become even greater further down 
this posting, dear reader.)
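
To make the lookup step a bit more concrete, here is a rough XSLT 1.0 
sketch (not our production code). It assumes that the declarations live 
in the same file as the formulas, that the MathML 3 @href has the form 
#id, that the SVG rendition can be recognized by a .svg suffix on 
<graphic>/@url, and that the LaTeX macro name from <mapping type="tex"> 
should end up in @alt. All of these are assumptions for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:mml="http://www.w3.org/1998/Math/MathML"
    exclude-result-prefixes="tei">

  <!-- index all symbol declarations by their xml:id -->
  <xsl:key name="decl" match="tei:glyph | tei:char" use="@xml:id"/>

  <!-- identity template: copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- <mo>/<mi> that link to a declaration in the same document -->
  <xsl:template match="mml:mo[starts-with(@href, '#')]
                     | mml:mi[starts-with(@href, '#')]">
    <xsl:variable name="decl"
        select="key('decl', substring-after(@href, '#'))"/>
    <!-- first <graphic> whose @url ends in '.svg' -->
    <xsl:variable name="svg"
        select="$decl/tei:graphic[substring(@url,
                  string-length(@url) - 3) = '.svg'][1]"/>
    <xsl:copy>
      <!-- drop the link, keep all other attributes -->
      <xsl:apply-templates select="@*[name() != 'href']"/>
      <xsl:choose>
        <xsl:when test="$svg">
          <mml:mglyph src="{$svg/@url}"
              alt="{normalize-space($decl/tei:mapping[@type = 'tex'])}"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- no SVG found: keep whatever content was there -->
          <xsl:apply-templates select="node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For MathML 2 input, the patterns would have to test @xlink:href instead, 
and declarations in a separate file would need a document() lookup.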

However, when we switch to a TEI-first workflow in the future, someone 
needs to type the equations, probably not as raw XML, but with a visual 
MathML editor. (Although it is possible that the formulas will be 
written in LaTeX and converted to MathML using LaTeXML, as we are doing 
now.) Ideally this editor will provide a customizable symbol palette or 
a toolbar that can hold more complex MathML expressions. In any case, 
without a string value or an image to represent the symbol, it won’t 
display in the formulas that contain it.

So maybe instead of, or in addition to, linking to the glyph definition, 
we might give the <mo> element content, like this:

<mo>&#xE212;</mo>

or

<mo href="#pleibvdash">&#xE212;</mo>

(In order to be able to actually see the symbols, we’d need to patch the 
math font that the equation editor uses.)

The second variant is only supported by an equation editor whose toolbar 
can hold arbitrary MathML expressions, not just custom symbol 
characters. (Examples of such editors are MathType and Wiris Editor.) 
Such an equation editor is most probably able to insert an @href-only, 
otherwise empty, <mo>, although it might not offer a recognizable visual 
representation for it.

Alternatively, this visual representation can be achieved by including 
an <mglyph src="pm.svg"/> in the <mo>. So this would be a third variant:

<mo href="#pleibvdash"><mglyph src="pm.svg"/></mo>

Of course the second and third variants are a bit redundant. You could 
look up the <glyph> by the string value only, provided that it contains 
<mapping type="PUA">&#xE212;</mapping> and that no other <glyph> or 
<char> contains the same PUA mapping. Or you can look up the <glyph> by 
the image file name.

However, linking by href is more explicit than matching by string value 
or image name, and therefore, despite the redundancy, we think that 
content should always be accompanied by an @href (@xlink:href for MathML 
2) connection.

Therefore, if an equation editor or a TEI viewer for proofreading must 
have content in <mo> in order to display the symbol in formulas, we will 
accept this redundancy.

It is then prudent to add these Schematron checks to the customization:
– Does the @href of an <mo> point to a <glyph> or <char> declaration?
– Is lookup by string content or image file name unambiguous?
– Does the looked-up declaration contain the same PUA string 
representation (or, in the case of images, does it contain a <graphic> 
whose @url matches the @src attribute of an <mglyph>)?
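
The three checks could be sketched in Schematron roughly as follows. 
This is an untested draft under several assumptions: XPath 2.0 query 
binding, MathML 3 @href of the form #id, and declarations in the same 
document as the formulas. The pattern id and the message texts are made 
up; only the PUA side of the ambiguity check is shown.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:ns prefix="mml" uri="http://www.w3.org/1998/Math/MathML"/>

  <sch:pattern id="custom-math-symbols">

    <sch:rule context="mml:mo[@href] | mml:mi[@href]">
      <sch:let name="id" value="substring-after(@href, '#')"/>
      <sch:let name="decl"
               value="//tei:glyph[@xml:id = $id] | //tei:char[@xml:id = $id]"/>
      <!-- check 1: the link resolves to exactly one declaration -->
      <sch:assert test="count($decl) = 1">
        <sch:value-of select="@href"/> does not point to a glyph or char
        declaration.
      </sch:assert>
      <!-- check 3a: string content, if any, equals the PUA mapping -->
      <sch:assert test="empty($decl) or normalize-space(.) = ''
                        or $decl/tei:mapping[@type = 'PUA'] = normalize-space(.)">
        The content differs from the PUA mapping of the linked declaration.
      </sch:assert>
      <!-- check 3b: every mglyph/@src is one of the declared graphics -->
      <sch:assert test="empty($decl) or (every $src in mml:mglyph/@src
                        satisfies $src = $decl/tei:graphic/@url)">
        The mglyph image is not among the graphics of the linked declaration.
      </sch:assert>
    </sch:rule>

    <!-- check 2: lookup by PUA string must be unambiguous -->
    <sch:rule context="tei:mapping[@type = 'PUA']">
      <sch:let name="pua" value="string(.)"/>
      <sch:assert test="count(//tei:mapping[@type = 'PUA'][. = $pua]) = 1">
        More than one declaration uses the same PUA mapping.
      </sch:assert>
    </sch:rule>

  </sch:pattern>
</sch:schema>
```

An analogous rule on <graphic>/@url would cover lookup by image file 
name.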

If the equation editor is only able to insert single-character strings 
(with some default <mo> or <mi> markup around them), the project should 
provide an XSLT transformation or an XML refactoring action that 
replaces this element with a properly @hrefed one.
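
Such a refactoring step might look like the following XSLT 1.0 sketch. 
It assumes, for illustration only, that the declarations are in the same 
document, that each <mapping type="PUA"> contains nothing but the 
character itself, and that MathML 3 @href is the target attribute. A 
link is only added when exactly one declaration matches.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:mml="http://www.w3.org/1998/Math/MathML">

  <!-- index declarations by the string value of their PUA mapping -->
  <xsl:key name="decl-by-pua" match="tei:glyph | tei:char"
           use="tei:mapping[@type = 'PUA']"/>

  <!-- identity template: copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- non-empty <mo>/<mi> that do not carry a link yet -->
  <xsl:template match="mml:mo[not(@href)][normalize-space()]
                     | mml:mi[not(@href)][normalize-space()]">
    <xsl:variable name="decl"
        select="key('decl-by-pua', normalize-space(.))"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- add the link only if the lookup is unambiguous -->
      <xsl:if test="count($decl) = 1">
        <xsl:attribute name="href">
          <xsl:value-of select="concat('#', $decl/@xml:id)"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Ordinary operators such as <mo>+</mo> find no declaration and pass 
through unchanged.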


There is another concern that is specific to <mo> elements (in contrast 
to <mi> elements). In MathML, operators have properties, such as 
spacing to the left and to the right, or the ability to stretch so that 
their height matches the height of a mathematical term that they 
enclose/precede/follow. These properties are usually not spelled out as 
XML attributes (although @lspace, @rspace, and @stretchy on <mo> can 
override them); their defaults come from an operator dictionary that is 
maintained by the MathML renderer. Lookup of the dictionary entries is 
by an <mo>’s string content and its position (infix, postfix, prefix) 
relative to the surrounding content, as determined by the MathML 
renderer. So if we want to be able to use this lookup mechanism, the 
<mo>s need to have content, rather than being empty elements that point 
to a declaration.

However, in practice, there is no way to inform a MathML renderer that 
there are new operator dictionary entries for the newly introduced 
symbols. We can nevertheless encode the spacing etc. values that should 
go into the operator dictionary, using TEI vocabulary within <glyph> or 
<char>:

   <charProp>
     <localName>mathOperatorInfixLeftSpace</localName>
     <value>mediummathspace</value>
   </charProp>
   <charProp>
     <localName>mathOperatorInfixRightSpace</localName>
     <value>mediummathspace</value>
   </charProp>
   <charProp>
     <localName>mathOperatorPrefixLeftSpace</localName>
     <value>0em</value>
   </charProp>
   <charProp>
     <localName>mathOperatorPrefixRightSpace</localName>
     <value>veryverythinmathspace</value>
   </charProp>

(these are the operator dictionary lspace/rspace values for common 
operators such as '+', '±', and '−', as recommended in 
https://www.w3.org/TR/MathML3/appendixc.html#oper-dict.entries-table).

It is expected that for HTML renderings, the MathML formulas will be 
slightly transformed so that the @href linking will be replaced with the 
SVG representation that is taken from the linked declaration. This 
transformation process might then, after analyzing whether the operator 
is used as a prefix, an infix or a postfix, insert explicit <mspace 
width="mediummathspace"/> spacers around <mo><mglyph src="pm.svg"/></mo> 
if the default spacing is not satisfactory. Likewise, required 
stretchiness of a custom fence operator might be achieved by scaling the 
SVG content to match the box size of the MathML expression that it 
delimits (we haven’t tried yet how to make this work in practice).
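
The spacer step alone might be sketched like this in XSLT 1.0. It is an 
untested illustration that relies on the <charProp> names shown above 
and on same-document #id links. It guesses the operator form from the 
position within an explicit <mrow> (first child: prefix, last child: 
postfix, otherwise infix), ignoring space-like siblings, and it only 
touches operators that are direct children of an <mrow> so that 
fixed-arity elements such as <msup> are not broken.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:mml="http://www.w3.org/1998/Math/MathML"
    exclude-result-prefixes="tei">

  <xsl:key name="decl" match="tei:glyph | tei:char" use="@xml:id"/>

  <!-- identity template: copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="mml:mrow/mml:mo[starts-with(@href, '#')]">
    <xsl:variable name="decl"
        select="key('decl', substring-after(@href, '#'))"/>
    <!-- explicit @form wins; otherwise guess from the position -->
    <xsl:variable name="form">
      <xsl:choose>
        <xsl:when test="@form = 'prefix' or (not(@form)
            and following-sibling::* and not(preceding-sibling::*))">Prefix</xsl:when>
        <xsl:when test="@form = 'postfix' or (not(@form)
            and preceding-sibling::* and not(following-sibling::*))">Postfix</xsl:when>
        <xsl:otherwise>Infix</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- e.g. mathOperatorInfixLeftSpace / mathOperatorInfixRightSpace -->
    <xsl:variable name="left" select="$decl/tei:charProp[tei:localName
        = concat('mathOperator', $form, 'LeftSpace')]/tei:value"/>
    <xsl:variable name="right" select="$decl/tei:charProp[tei:localName
        = concat('mathOperator', $form, 'RightSpace')]/tei:value"/>
    <xsl:if test="$left">
      <mml:mspace width="{normalize-space($left)}"/>
    </xsl:if>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    <xsl:if test="$right">
      <mml:mspace width="{normalize-space($right)}"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Replacing the link with the <mglyph> image would be a separate pass or 
an additional branch in the same template.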

For PDF generation through LaTeX, we’d look up the LaTeX macros in the 
<glyph> declarations, and leave any spacing issues to the math operator 
declaration in the TeX styles.


This is our treatise on how to refer to custom symbols from MathML. Do 
you share the conclusions that we arrived at, or would you pursue a 
different approach?

Gerrit

-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt