
TEI-L Archives

TEI-L@LISTSERV.BROWN.EDU


TEI-L  December 2014

Subject:

Re: Describing glyphs in <msDesc>

From:

Paolo Monella <[log in to unmask]>

Reply-To:

Paolo Monella <[log in to unmask]>

Date:

Tue, 2 Dec 2014 11:39:39 +0100

Dear all,

The recent discussion on <g> has touched on a couple of points that closely resemble a methodological suggestion by Tito Orlandi that I have been trying to apply in practice.

Orlandi suggests that when an encoder creates a digital transcription of a manuscript or another primary source with a pre-modern writing system, they should provide a complete, formal list (and possibly a description) of _each_ grapheme or glyph in the writing system of that source.

Such a complete list was published (but not encoded) in P. Robinson & E. Solopova, Guidelines for the transcription of the manuscripts of the Wife of Bath's Prologue (http://bit.ly/1rSp5Zj). Orlandi published his own list (more formalized, but not in XML/TEI) for his Machiavelli at http://bit.ly/1rSpfQv

From what I gather from the previous discussion on this list, Sebastian Rahtz and Janusz S. Bien's students created scripts that can dynamically generate such complete lists. So Sebastian wonders why this should be stored in the header. But Martin Holmes mentioned that there is some discussion in the TEI Council on allowing such storage "in the header for easy harvesting." Also Stuart Yeates suggested that it would be useful to have "an XML fragment that generated [...] a summary of the characters used."
Now, Frederike raises a similar concern.

This reassures me that some of those encoding textual sources feel a need to provide such a complete list, to improve interoperability. This is a good practice in 'traditional' editing of ancient textual sources.

For a while I have been trying to find a way to encode such a _complete_ list in TEI P5. I even gave a talk on this at the TEI 2013 conference in Rome (http://bit.ly/1zKZPsq). But:
1) the Guidelines prevent you from re-defining characters that already exist (and are defined) in Unicode,
2) and if you nonetheless create a <char> or <glyph> element for each grapheme (from a to z and beyond), you then have to wrap every character in the <text> in a <g> element.

Frederike, as far as I know (other list subscribers can correct me), your proposed use of <g> in <scriptDesc> / <summary> / <scriptNote> is probably not the way to go. The <g> element is meant for using a glyph in the <text>, not for defining it. For definitions, <char> or <glyph> are the appropriate elements. However, I guess that <char> and <glyph> are not allowed in <scriptNote> because that is not where they belong: they should go inside <charDecl>.
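
As a hedged illustration of that split between defining and using a glyph, here is a minimal TEI sketch. The xml:id "que_abbr", the description and the sample word are invented for the example and do not come from any real transcription; the <fileDesc> is left out for brevity.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc omitted in this sketch -->
    <encodingDesc>
      <charDecl>
        <!-- Definition: lives in the header, inside charDecl -->
        <glyph xml:id="que_abbr">
          <desc>Hypothetical abbreviation sign standing for "que"</desc>
          <mapping type="standard">que</mapping>
        </glyph>
      </charDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Use: g in the transcription points back to the definition -->
      <p>at<g ref="#que_abbr">que</g></p>
    </body>
  </text>
</TEI>
```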

You wrote that you don't want to link the characters in the <text> to their list/description in the header: "I 'just' want to describe their shape. Further I do not want to refer from <g> to <glyph>, as I do not want to use <g> in <text> at all".
In principle, I guess, you could declare each glyph inside <charDecl> and then simply not reference those declarations from the <text>.
This would significantly break the philosophy of the Guidelines (don't re-define what's already in Unicode), but the practice of Roberto Rosselli Del Turco and other "tag abusers" :-) on the list seems to show that those encoding pre-modern primary sources feel a need to use <char>, <glyph> and <g> more widely to provide information on the specific usage of characters that already are in Unicode.
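
A sketch of what such a declaration-only header could look like, reusing the a_min and a_maj identifiers from Frederike's example. The descriptions, the image paths and the vocabulary URL are placeholders assumed for illustration, and, as noted above, declaring ordinary Unicode letters this way departs from the approach the Guidelines recommend.

```xml
<encodingDesc>
  <charDecl>
    <glyph xml:id="a_min">
      <!-- Placeholder description of the glyph's shape -->
      <desc>Small letter a as printed in this volume</desc>
      <mapping type="standard">a</mapping>
      <!-- Placeholder path to an image of the glyph -->
      <graphic url="glyphs/a_min.png"/>
      <!-- Placeholder URL for an external descriptive vocabulary -->
      <note>See <ref target="https://example.org/vocab#a_min">vocabulary entry</ref>.</note>
    </glyph>
    <glyph xml:id="a_maj">
      <desc>Capital letter A as printed in this volume</desc>
      <mapping type="standard">A</mapping>
      <graphic url="glyphs/a_maj.png"/>
    </glyph>
  </charDecl>
</encodingDesc>
<!-- The text itself keeps plain characters, with no g elements -->
```

Nothing in the <text> points at these declarations, so a processor will not connect them to the transcription unless a project-specific convention says so.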

However, as far as I am concerned (I don't know if Paul Schaffner or others share my interest), I would like to have a TEI way to encode such a complete list in the header and then have some mechanism to formally bind each character in the <text> to the elements of that list. Sebastian mentioned a TEI "Lint". Paul mentioned SGML text entities. I could also mention an old SGML feature that I'll call "archaeology of methodology" in a talk I'll give on Thursday on the <charDecl> topic (http://www.unipa.it/paolo.monella/dixit2014/index.html): the WSD, or Writing System Declaration. Not that I want to raise the dead, but I think that some mechanism like that could be good practice for those encoding textual sources in pre-modern writing systems.

All best,
Paolo


On Mon, 1 Dec 2014 13:25:16 +0100
Frederike Neuber <[log in to unmask]> wrote:

> Dear List,
> 
> I have a question regarding the enrichment of a source with
> paleographical information. I want to identify a script and the
> single glyphs it consists of, assign them a unique name and ID,
> describe their shape and link to a .png of the glyph, as well as
> possibly refer to an external vocabulary related to the description.
> 
> 
> Browsing through the guidelines I ended up with the solution
> suggested in chapter 5.2 of the guidelines on 'Markup Constructs for
> Representation of Characters and Glyphs': <g> (in <text>) and <glyph>
> (in <teiHeader>). Even if at first, this way of capturing information
> about the script seemed the most suitable, now I have some doubts about
> applying it, for several reasons:
> 
> 
> I do not want to encode the glyphs for a later representation of the
> script in my edition, instead I 'just' want to describe their shape.
> Further I do not want to refer from <g> to <glyph>, as I do not want
> to use <g> in <text> at all. Just to explain shortly: I am working
> with prints, so one set of glyphs is always used for an entire
> volume, and there is no need to wrap a <g> around each character in
> <text>, but it makes more sense to store this information in the
> <header>. However, I am now wondering whether <encodingDesc> is the
> wrong 'location' to capture the information I am interested in, and
> whether it would be better to put it somewhere in the <msDesc>, as it is more
> about 'describing' than 'encoding'.
> 
> 
> I haven't found any suggestions to describe single glyphs in <msDesc>
> in the Guidelines. The <scriptDesc> seems to be intended for a more
> general description about the type of script used in the source.
> Nevertheless, I have worked out a TEI-conformant solution, although I
> am not sure whether the <g> element is intended to be used in this way
> (<glyph> is not allowed):
> 
> 
> 
> <msDesc>
>    ...
>    <physDesc>
>       <scriptDesc>
>          <summary>Contains two distinct styles of scripts</summary>
>          <scriptNote xml:id="script1">
>             <g xml:id="a_min"><!-- Description of single glyph --></g>
>             <g xml:id="a_maj"></g>
>          </scriptNote>
>          <scriptNote xml:id="script2">
>             <g></g><!-- and so on -->
>          </scriptNote>
>       </scriptDesc>
>    </physDesc>
> </msDesc>
> 
> 
> Has anyone worked on a similar issue or has an idea how to deal with
> it?
> 
> 
> Best,
> 
> Frederike
