TEI-L@LISTSERV.BROWN.EDU


TEI-L  October 2014


Subject:

summary of html from tei

From:

Martin Mueller

Date:

Fri, 31 Oct 2014 13:21:52 +0000



I went through yesterday's very lively exchange about "html from tei" and
put all the postings into the memo below in the order in which they
appeared in my email program. I read Sebastian's initial question in the
context of the TEI Simple project, for which I will be drafting
documentation. So this is a useful exercise, at least for me.

Sebastian sent me an XML file that extracts the 28,000 structured notes
from 2,500 TCP texts. I spent a little time browsing through them. A large
number of them are like the note below from Erasmus' Apothegms and weigh
in below Twitter size:

<note place="marg" anchored="true">
                     <p>A mannes fame is the chief odour y^t he smelleth
of.</p>
                     <p>Contynually to smelle of sweet odours is an eiuill
sauour in a manne.</p>
                  </note>

There is a question in my mind whether the <p> elements in notes of this
type are actually necessary and whether one could treat them as if they
were instances of

<note place="marg" anchored="true">
A mannes fame is the chief odour y^t he smelleth of. Contynually to smelle
of sweet odours is an eiuill sauour in a manne.
</note>

In which case they would seem to fall in a category of marginal notes for
whose rendering there appear to be plausible solutions. I will spend part
of my weekend working on some form of a complexity test. What are the
types (and tokens) of notes that pose difficult challenges and are there
ways of algorithmically identifying them by length and structure? I'll
tell you my answer when I have it.
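
[A minimal sketch of what such a complexity test could look like, not the test that was eventually written. It assumes that "simple" means a note with no block-level children other than <p> and a text length under a cut-off. The 140-character threshold, the BLOCK_TAGS list and the function names are all illustrative assumptions.]

```python
import xml.etree.ElementTree as ET

# Assumed cut-off for "below Twitter size"; tune against the real corpus.
MAX_SIMPLE_CHARS = 140
# Assumed list of TEI elements that count as block-level inside a note.
# <p> is deliberately left out, following the suggestion that <p> in
# short notes could be ignored.
BLOCK_TAGS = {"list", "table", "lg", "quote", "note", "figure"}


def local_name(tag):
    """Drop a '{namespace}' prefix so TEI-namespaced files also work."""
    return tag.rsplit("}", 1)[-1]


def classify_note(note):
    """Return 'simple' or 'complex' for one <note> element."""
    # Collapse all whitespace so indentation does not inflate the length.
    text = " ".join("".join(note.itertext()).split())
    inner = [local_name(el.tag) for el in note.iter() if el is not note]
    if any(name in BLOCK_TAGS for name in inner):
        return "complex"
    if len(text) > MAX_SIMPLE_CHARS:
        return "complex"
    return "simple"


erasmus = ET.fromstring(
    '<note place="marg" anchored="true">'
    "<p>A mannes fame is the chief odour y^t he smelleth of.</p>"
    "<p>Contynually to smelle of sweet odours is an eiuill sauour in a manne.</p>"
    "</note>"
)
print(classify_note(erasmus))
```

Counting the two labels over all extracted notes would give the types-and-tokens picture asked for above; the interesting work is in choosing the threshold and the block list.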





========
Sebastian Rahtz:

if this isn't your area of interest, stop reading now.

I want some ideas about rendering marginal notes in HTML.

EEBO TCP has _many_ examples of very complex structured notes
with paragraphs, lists, tables etc within them. So naturally I expect to
make the <note> into a <div> with CSS properties to make it float left or
right. Fine so far. But the <note> in the TEI XML occurs inside (shall we
say) a <hi> element inside a <p>, which I would naturally be rendering as
a <span>. But a <div> inside a <span> is not allowed in the tiny brain of
HTML, because it doesn't realize it's being floated off.

So what's a boy to do?

  a) forget about HTML validity and let the browsers just do it. problem:
epub checking fails, and it's possible some epub renderers will therefore
give up
  b) make everything, everywhere, be a <div> and sort it out with
display-style in CSS
  c) make everything, everywhere, be a <span> and sort it out with
display-style in CSS
  d) use HTML5 <aside>, but no, that's a flow-level element too
  e) move the <div> outside the <span>, up enough levels until it's valid.
but that means it loses context
  f) split the <span> (and any ancestor <span>) in two, insert the <div>,
and restart the <span>
  g) scream and shout and kick

==========
Peter Boot:

  I would do e), i.e. move the <div> outside the <span>, up enough
levels until it's valid.  Put the notes into a separate container, create
hyperlinks between text and note, and if display in context or something
like that is needed, use javascript.
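
[A minimal sketch of the separate-container approach, assuming the notes have already been rendered as <div class="note"> inside inline elements of an XHTML-like tree. The class names, the id scheme (note1, ref1, ...) and the function name are illustrative assumptions, not part of any existing stylesheet.]

```python
import xml.etree.ElementTree as ET


def move_notes_to_container(body):
    """Replace each note div with a link and collect the notes at the end."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in body.iter() for child in parent}
    notes = [el for el in body.iter("div") if el.get("class") == "note"]
    container = ET.Element("div", {"class": "notes"})
    for number, note in enumerate(notes, start=1):
        parent = parent_of[note]
        ref = ET.Element(
            "a",
            {"href": f"#note{number}", "id": f"ref{number}", "class": "noteref"},
        )
        ref.text = str(number)
        # The text that followed the note now has to follow the link.
        ref.tail, note.tail = note.tail, None
        parent[list(parent).index(note)] = ref
        note.set("id", f"note{number}")
        container.append(note)
    body.append(container)
    return body


body = ET.fromstring(
    '<body><p>to raise <span class="hi">the Ghost'
    '<div class="note"><p>a phantom</p></div> of an Idea</span>.</p></body>'
)
move_notes_to_container(body)
print(ET.tostring(body, encoding="unicode"))
```

The link keeps the point of attachment in the text, so a script (or a reader) can get from note to context and back. Nested notes are flattened into the same container in document order.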

==========

Sebastian Rahtz:

Ah, you mean I could put all the notes at the end, and float them into the
margin at run time using JS.   That's not a bad idea.

=========

Lou Burnard:

Semantically purer too!
==========


Joel Kalvesmaki
Editor in Byzantine Studies
Dumbarton Oaks

I would do b), i.e. make everything, everywhere, be a <div> and sort it
out with display-style in CSS

As I understand it HTML was designed to let <html:div> be a generic
segmenting tool for anything that might (but not necessarily) deserve new
lines (blocks), and <html:span> be anything that definitely didn't (inline
segments). I think it would be a mistake to assign <tei:p> to <html:span>,
unless you're using <tei:p> to annotate text segments that don't have
new lines. (I've seen this sort of use of <tei:p> before, legitimately, most
notably of biblical texts.)
Despite the disallowance of html:span/html:div, in CSS rendering you can
treat either <html:div> or <html:span> as block, inline, etc. For this
reason I now tend to avoid <html:span> altogether in any HTML I create,
and simply make sure I type any <html:div> via @class. In fact, one could
even dispense with <html:p>, <html:h1>, <html:h2>, etc., which are, at
heart, just simply types of divisions.

=========

Sebastian Rahtz:

But isn't it just as abusive to map <tei:hi> to <html:div>? swings and
roundabouts.

I am operating on the assumption that we had better be prepared for the
CSS not being applied as we intended (someone may swap in a different one
for audio rendering, for example), so preserving the basic div/span
distinction in HTML seems pretty important to me. I may well be
wrong-headed there.

the downside of the more semantically pure "put the notes in a separate
container anyway" is that marginal labels like "Tim. II 21" or "234 BC"
are not <aside>s in the same sense as normal marginal notes.

==========

Peter Boot:

I think it's a reasonable assumption that the CSS will not always be
applied as intended. E.g. a web archive environment might leave it out, or
a tool for text analysis.

=========

Lou Burnard:

responding to Sebastian's comment that "marginal labels like 'Tim. II 21'
or '234 BC' are not <aside>s in the same sense as normal marginal notes":

Which is why God gaue us the @type attribute on <note>

=========

Daniel O'Donnell:

Perhaps a dumb question. But in that case why not deemphasise the html?
I.e. serve out the xml and either style as is or do client side
transformations? 

It seems to me that the problem you are setting yourself is rapidly
becoming how can I preserve the semantic granularity of the original TEI
in an HTML text that is used for interchange without negotiation, and I'd
guess the answer is going to be 'you should design something like the TEI
to do it.'

How do I do this in HTML in this specific context is one question. How do
I do it so that it maintains its semantic integrity in unknown contexts is
a whole other one.

=========

Sebastian Rahtz:
responding to Daniel O'Donnell's question "why not deemphasize the html"?

because I want to deliver ebooks, where client side transform isn't really
an option. and even then, it doesn't seem safe to dynamically create HTML
which messes up its fundamental concepts about flow and inline

[Responding to the second part of O'Donnell's post:]

yes and no. but I am not talking about _complete_ semantic interchange,
but lossy downgrading to the universal interchange format while trying to
keep the limited semantics which HTML _does_ offer. Which is why, for
example, I am inclined to see if I can follow Ron's idea, but implement it
using <aside>.

I don't think I can go as far as purely presentational HTML using only
div/span _passim_

=========

Joel Kalvesmaki:
responding to SR's question "isn't it just as abusive to map <tei:hi> to
<html:div>?"

I'd say no: <tei:hi> is but another way in TEI to segment/divide text,
which is exactly what <html:div> was meant to do.

I've been transforming a collection of structured texts with analogues to
<tei:hi> elements. I've been very pleased with the flexibility and
expressiveness in <html:div>. IMO, ordinary human intelligibility is not
as important in HTML as it is in TEI (after all, what portion of readers
actually look at a page's source?). Plus with this approach you can write
more concise and expressive css that doesn't require you to worry about
the hierarchy, or even the name of the element. For example, maybe you
want to provide the same background to your floating callouts as you do to
named entities. All you need is to assign something like tei-hi and tei-name
to html:div/@class, then in css do something like this:
             .tei-hi,.tei-name{background-color:gray}

Plus with this approach think of all the cool things you could do with css
selectors, e.g., to pick every tei-derived element in your html just use
this selector in your css:
  *[class^='tei']
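
[A minimal sketch of the div-only mapping described above, assuming every TEI element simply becomes an <html:div> whose class is "tei-" plus the element name, so that a selector like [class^='tei'] matches all of them. The function name is illustrative, and attributes such as @type or @place are not carried over here.]

```python
import xml.etree.ElementTree as ET


def tei_to_divs(tei_el):
    """Recursively turn a TEI element into <div class="tei-NAME">."""
    name = tei_el.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
    div = ET.Element("div", {"class": f"tei-{name}"})
    div.text = tei_el.text
    for child in tei_el:
        converted = tei_to_divs(child)
        converted.tail = child.tail  # keep the text between elements
        div.append(converted)
    return div


src = ET.fromstring("<p>Born in <name>Rotterdam</name>, <hi>circa</hi> 1466.</p>")
out = tei_to_divs(src)
print(ET.tostring(out, encoding="unicode"))
```

With this output every inline element is a <div>, so the stylesheet has to set display: inline on classes such as tei-hi and tei-name; without the CSS the text falls apart into blocks, which is the objection raised elsewhere in the thread.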
  
===========
Martin Holmes:
responding to SR's original post:

There's no reason, surely, that you can't create a span with display:
block and float: right? I do that all the time. Another option for
positioning is not to float, but to set it as position: absolute, specify
a width, and right: 2em or something like that, so the thing appears as a
block next to the right edge.

=========
Peter Gorman
responding to Martin Holmes

I think the problem may be that those notes may themselves contain
block-level structures that won't fit into HTML <span>

=========

Sebastian Rahtz:
responding to Martin Holmes

no. but when that <span> has <p> and <ul> inside it, the validator howls.

=========

Martin Holmes: responding to Peter Gorman:

In that case, it's either divs all the way down, spans all the way down,
or an arbitrary point at which you move from divs to spans, and then check
the ancestry in every template to see whether you've passed that point.

Rendering the notes at the end is simpler, though. You'd have to put an
anchor in the text at the point you want the JavaScript to move them to,
so you can retrieve the right offset. There's also the problem of notes
and labels overlapping if the margin is not wide enough or the font size
is too big.

=========
Sebastian Rahtz: responding to Martin Holmes' first point:

right. I can sort of see how jQuery position() or offset() is going to
help do this. if anyone is bored enough to write up a proof of concept of
that, I'd be very happy :-}

responding to Martin Holmes' point about overlapping notes and labels:

that's much, much harder. makes me feel mildly ill even thinking about the
chaos which could result
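
[A minimal sketch of one naive way to avoid overlaps, assuming the vertical offset of each anchor and the rendered height of each note are already known (in a browser they would come from something like the jQuery offset() call mentioned above). Each note is pushed down just far enough to clear the one before it. The function name, the pixel units and the gap parameter are assumptions; margins that are too narrow, or notes that run off the bottom of the page, are not handled.]

```python
def stack_notes(notes, gap=4):
    """Compute non-overlapping top positions for marginal notes.

    notes: list of (anchor_top, height) pairs in pixels.
    Returns the top position for each note, in the same order as the input.
    """
    # Work through the notes from the top of the page downwards.
    order = sorted(range(len(notes)), key=lambda i: notes[i][0])
    tops = [0] * len(notes)
    next_free = float("-inf")  # first vertical position not yet occupied
    for i in order:
        anchor_top, height = notes[i]
        top = max(anchor_top, next_free)
        tops[i] = top
        next_free = top + height + gap
    return tops


# The second note would start at 120 but the first one ends at 150,
# so it is pushed down to 154 (150 plus the 4px gap).
print(stack_notes([(100, 50), (120, 30), (400, 20)]))
```

The cost is that a pushed-down note drifts away from its anchor, and on a dense page the drift accumulates, which is probably part of the chaos feared above.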

=========
Martin Mueller:

For most of the typically very short and simple marginal notes in Early
Modern texts, display in the margin is a real benefit. Where you have
complex marginal notes--as for instance in Ben Jonson's Works--moving the
notes to the end may be the better solution, and displaying them as
marginal notes on a screen may be a nuisance. I think there are
algorithmic ways of cleanly dividing the cases.

=========
Sebastian Rahtz responding to Martin Mueller

What would your algorithm be? You are suggesting a much simpler solution,
which is to say that all complex side notes should be converted to endnotes,
without trying to move them at all. But is this solely based on whether
they have internal block-level components?

==========
Martin Mueller responding to Sebastian Rahtz

I need to have a look at more examples. Paul Schaffner probably has all
the cases in his head. But my hunch, to be confirmed by trawling through a
sample of the TCP corpus, is that very few marginal notes have internal
block-level components.

=========

Sebastian Rahtz responding to Martin Mueller

You may be surprised.  I can detect 28504 occurrences in 2448 texts from
the 61k texts in EEBO/ECCO/Evans.  That's occurring in 1 out of 28 books,
then.

=========
Stuart Yeates responding to Sebastian Rahtz's original questions

Personally I'd love to do (a), forget about HTML validity, but I'm not
sure what level of browser support there is. Maybe a couple of sample
cases could be run through http://netrenderer.com/ or
http://browsershots.org/ and checked for serious issues?

I'm also aware that increasingly ePub and other HTML-containing
TEI-output formats may be the inputs into third-party toolchains. It
may be more robust to have a semantically correct HTML output option
for such cases.

=========

Louis-Dominique Dubeu responding to Stuart Yeates:

I advise against this. When you give a browser invalid HTML, you are
venturing into "undefined behavior" territory. If it works when you test
it *now*, it's just luck. If it works with version X of browser A, there's
no guarantee that it will work with X-1 or X+1. I've recently run into an
issue where JavaScript code that worked in Chrome 33 did not work in
Chrome 34. It was fixed for Chrome 35 **only because** there is a standard
and a defined behavior that people had been relying on and this was
promptly brought up in the bug report. In the case of bug reports where
the change in behavior does *not* run afoul of a standard, good luck
getting speedy resolution, or any resolution at all.

=========
Paul Schaffner responding to Martin Mueller:

Many examples of 17th-century printing challenge the
very distinction between 'main text' and 'margin'.
The one I was editing this very minute, for example,
is far from exceptional:

http://www.umich.edu/~pfs/tcp/presbytery.jpg

But are things any different nowadays? Magazine and
web page layout, for example.

If you're looking for long, elaborate notes, the
18th century philosophical novel The Life of John
Buncle springs to mind. On this random page, for
instance:

http://www.umich.edu/~pfs/tcp/buncle_note.jpg,

you can see just two lines of 'main' text at the top
of the page; the rest of the page is occupied by
the conclusion of footnote 25 (which occupied most
of the preceding three pages); the beginning of
footnote 26; and an asterisk-flagged footnote
that nests within note 25. Notes in this book
routinely themselves have notes; routinely contain
poems, chunks of plays, multiple stanzas and
paragraphs, block quotations, etc. etc. But again,
this is not that unusual in modern printing either,
especially of the academic kind. (One of the
chief advantages of my old Nota Bene word processor
was that it supported three simultaneous *series*
of footnotes, all of which could display on the
same page; and I was not the only one who could
foresee a use for such a feature -- reflected in the TCP
texts through the use of values like @place="marg1"
@place="marg2".)

None of which helps Sebastian; if anything, the reverse.

Probably my favorite online note display is that
used by the CCEL, e.g.

http://www.ccel.org/ccel/schaff/npnf214.vii.iii.html

in which users may select (go to the little gear at top
right) to see notes displayed in
the margins, at the foot of the page, or (the default)
suppressed altogether till clicked on. If I remember rightly,
CCEL uses html:span for all the notes, and flattens markup
within them. But that may be wrong.

=========
Stuart Yeates responding to Louis-Dominique Dubeu's warning about
"undefined behaviour" with option a):

If you're using javascript, you have a toolchain; thus the second branch
of my suggestion.

=========
Elisa responding to Sebastian Rahtz' discovery of 28504 structured
marginal notes in 2448 texts:

Having spent much time with heavily annotated long poems of the 18th- and
19th-c. by the likes of Erasmus Darwin and Robert Southey ( whose very
annotations I was just talking about last week at our conference in
Evanston), I am aware of how long and complicated these can
become--Sometimes whole poems are written out in long footnotes, and quite
frequently we see block-level structures, yes. I am not really happy with
the common tendency to push annotations to the ends of documents,
particularly when they were originally presented so the eye would move
across or down a page to a layer of paratext. This may sound awfully
unsightly to the e-reader aesthetic, but there is something to be said for
having the web interface preserve the positioning of notes embedded within
and immediately accessible from the lines of poetry or chunks of prose
text in which they're signaled. I don't much like the idea of HTML's
losing this simple association of proximity--it seems like caving to
convenience and worse, pushing a layer of paratext away from its point of
association. But I may just be obsessed with note-heavy Bob Southey.

=========
Paul Schaffner responding to Elisa

I agree: the demotion of marginalia (and the other things that
people call 'paratext' these days) is to be resisted if at all possible.

Perhaps it's time to revive html:frameset ! (one frame for text,
one for notes ...)  :)

=========

Peter Robinson responding to Paul Schaffner

Oh no! not frame sets!

It is a perfectly straightforward process to push your notes out into a
html div, and the text into another html div, and then use the 'float'
style attribute on the two divs so that your notes appear to the left, or
right, or both, of the text they annotate. Simple css, indeed (google
"floating div css" for lots of examples).  You can go further, and use
javascript/jquery to identify exactly where the text referenced is in your
browser window, and then place the annotation at the appropriate height to
the left or right.  And much more.
=========

Stuart Yeates adding to Peter Robinson

If nothing else, framesets can't be used in ePub.

=========

Sebastian Rahtz responding to Peter Robinson:

i might contest that. notes inside notes inside notes are not so very easy.

=========

Stuart Yeates adding to Sebastian Rahtz:

In our experience it's relatively straightforward until you start
wanting page-break paraphernalia (anchors, page images, navigation,
etc) at each of the nested levels of notes. Users have an expectation
that they can link to 'page 123' of a document; and that following
that link will take them to a representation of the intellectual
content on that page in the print book. Making that happen reliably is
surprisingly hard.

The NZETC has certain features that can't be used reliably in
combination, for example nested footnotes and works with back matter
printed backwards (i.e. to be read from the back page forwards towards
the first page). Fortunately those aren't widely used in combination.

=========

Conal Tuohy commenting on Sebastian Rahtz' original question:

My vote would be for (b) - to use <div> for pretty much everything, and to
deal with typographical issues in CSS.

I do sympathize with the desire to maximize the retention of TEI-encoded
semantics, but these days I am less inclined to believe there is any
significant payoff in doing so, and I think the additional complexity of
the stylesheets is a barrier to modular reuse and a prohibitive cognitive
burden for many people who might otherwise contribute to the stylesheets.

=========

Stuart Yeates responding to Conal Tuohy:

Is that a viable solution for ePubs read on epaper devices? I thought
there were pretty strong limits on what you could get away with in CSS
on such devices.

If we're doing everything in CSS, why go to HTML at all, why not
follow TEI Boilerplate
(http://dcl.slis.indiana.edu/teibp/content/demo.xml) and use TEI+CSS?

=========

Peter Flynn responding to Sebastian Rahtz' original question

I once played around with making the notes into divs, but outputting
them after the end-tag of the paragraph or other
mixed-content-containing element in which they occurred. But I haven't
checked to see if this is valid XHTML/EPUB3 because I haven't had to do
it for some while.

The problem I had was aligning them with the point of reference, if you
are inserting something for the user to click on. Or were you intending
them just to be there of their own accord as the source paragraph
scrolls into view (like paper marginal notes)?

=========

Sebastian Rahtz responding to Peter Flynn:
I have indeed bunged in notes as floating <aside>s after the end
of the containing paragraphs, and then jerked them up to align with the
point of insertion with position: absolute. Now, of course, the wretched
things overlap.

========

Andreas Wagner commenting on Sebastian Rahtz' option f):

Taking the risk of making a fool of myself: What is the argument against
(f) again? Obviously the nesting of spans makes it a complex thing and I
admit to being ignorant of exactly how complex this can get, but are you
all sidestepping (f) because of this or because of a different and more
important, yet unmentioned problem?

=========

Stuart Yeates responding to Andreas Wagner:

My argument against (f) is that it splits single logical entities
into sequences of two or more XML elements in such a way that breaks
everything that expects logical entities to be contiguous or
referenceable by a single ID-REF. This breakage ranges from things as
ubiquitous as cut and paste, up through javascript and your more
esoteric XML and web infrastructure.

The real answer is that this is a symptom of TEI being more expressive
than HTML. There is no 'best' solution, merely a number of potential
tradeoffs whose relative merits are dependent on the kinds of
documents one has and what you're trying to do with them.
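
[A minimal sketch of option (f), assuming an XHTML-like tree in which only <span> counts as inline; the INLINE_TAGS set and the function names are illustrative. It splits every enclosing span in two around the note and shows the breakage described above: each split span, with its id, now exists twice.]

```python
import xml.etree.ElementTree as ET

INLINE_TAGS = {"span"}  # assumption: the only inline wrapper in the output


def split_around(parent, child):
    """Build the two halves of `parent` on either side of `child`."""
    siblings = list(parent)
    index = siblings.index(child)
    # Both halves copy the attributes, so any id is duplicated.
    left = ET.Element(parent.tag, dict(parent.attrib))
    right = ET.Element(parent.tag, dict(parent.attrib))
    left.text = parent.text
    left.extend(siblings[:index])
    right.text = child.tail
    right.extend(siblings[index + 1:])
    child.tail = None
    return left, right


def lift_note(root, note):
    """Move `note` out of all enclosing inline elements by splitting them."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of[note]
    while parent.tag in INLINE_TAGS:
        grand = parent_of[parent]
        left, right = split_around(parent, note)
        right.tail = parent.tail
        at = list(grand).index(parent)
        grand.remove(parent)
        for offset, piece in enumerate((left, note, right)):
            grand.insert(at + offset, piece)
        parent = grand
    return root


root = ET.fromstring(
    '<p><span id="a">x<span id="b">y<div>n</div>z</span>w</span></p>'
)
lift_note(root, root.find(".//div"))
print(ET.tostring(root, encoding="unicode"))
```

After the call there are two spans with id "a" and two with id "b", so nothing can point at "the" highlighted phrase any more. Whether the block that ends up holding the note is allowed to contain a <div> is a separate question this sketch does not address.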

=========

Sebastian Rahtz adding to Stuart Yeates
apart from the very real semantic problems which Stuart gives (which I
think are acceptable if processing a <pb/> like this, but not otherwise),
try this:

<p>I have endeavoured in this Ghostly little book, <span
        style="font-style:italic">to raise the Ghost</span><div
        style="float:right">a phantom</div> <span
        style="font-style:italic">of an
        Idea</span>, which shall not put my readers out of humour with
        themselves, with each other, with the season, or with me. May
        it haunt their houses pleasantly, and no one wish to lay
        it. </p>


you'll see that though the inner <div> floats OK, the italic span has a
line break in the middle. This renders the technique useless, sadly.







