Hi there,

Daniel O'Donnell wrote:
> On Tue, 2006-30-05 at 11:09 -0700, Martin Holmes wrote:
>> Hi Dan,
>>
>> Dan O'Donnell wrote:
> <snip and summarise>
> Re: explanation for putting the last saved date in. Your explanation
> certainly makes practical sense and I was being a purist.

I like purism. In the long run we get better results by being careful 
about this kind of thing.

> As more tools are developed perhaps this needs serious thought, however,
> as I suspect this kind of information will become crucial (especially
> given the kind of havoc Joe's Universal TEI editor seemed to cause in
> your fantasy example). 

I think it's inevitable that tools will emerge for doing specialized 
things to specific bits of a TEI file -- the header comes to mind, since 
it's potentially huge and difficult to understand, and yet nearly every 
document benefits from having as detailed a header as possible. I can 
imagine a teiHeader Wizard application being very useful in helping 
newbies create their first teiHeaders, or speeding up the process for 
more experienced folks. It's also likely that, as there are more tools 
available, the chance that more than one will be used on the same file 
(in addition to general editing tools like oXygen) will grow, and it's 
going to be easy for one tool to tread on another's toes by editing or 
removing information which the other needs. So I really like your idea 
of a "tool" attribute (more below).

> Early on in this discussion, people argued
> against putting tools in the revision statement because there was no
> intellectual responsibility involved. But the more sophisticated the
> tool the more important the revision record, I'd think.

> Certainly your solution is a good interim one. But ultimately, I think
> we are going to have to think more about how tools are like and not like
> people.

I don't think they're much like them at all -- they're mechanical, 
inflexible and unpragmatic. What humans do is intellectual and 
teleological. That's why mixing up human actions and the actions of 
tools in respStmt seems to me misguided; there's no responsibility 
involved when a tool does something (unless it's the responsibility of 
the tool creator, which is not really the same thing). If you use 
Photoshop to sharpen up a photograph, the responsibility for those 
changes to the file rests with the user, as do the reasons for taking 
those actions; Photoshop may well be coded such that it records in the 
file mechanical data about itself (its version) or even about the 
editing actions it took (so that perhaps they could be undone), but that 
data is mechanical and qualitatively different from information about 
the changes to the photograph.

> <more snip>
> 
>>> I think the idea of a creationApp is an excellent idea. But like I said,
>>> I think the best model may be the old langDesc.
>> I've searched the Web and the TEI site for this, and I can't find any 
>> mention of an element called langDesc. 
> 
> My mistake, I meant langUsage:
> 
> <langUsage>
>     <language id="ENG">English, Present Day (Canadian Standard Spelling)</language>
> </langUsage>
> 
> This is a model for what I think you are doing because it provides
> metadata about something that can be referred to elsewhere (I suppose it
> might be a specialised kind of FS). Of course creationApp is not quite
> the same thing, because you presumably wouldn't use it in the way you
> used to use tei:@lang (i.e. marking individual elements in the main text
> as made by a specific tool), though I suppose you could.

You certainly could, and that would be very useful information. If we go 
back to our putative model, we have this:

<creatorApp appId="ImageMarkupTool1">
       <appIdent key="appName">Image Markup Tool</appIdent>
       <appIdent key="appVersion">1.0.3.5</appIdent>
       <appIdent key="appURI">http://..../</appIdent>
       <appIdent key="userDefined" userKey="licence">Mozilla Public Licence 1.1</appIdent>
       <date value="2006-05-25T11:03:55">Last save: 2006-05-25 at 11:03:55</date>
</creatorApp>

Here I'm using an attribute called appId; if we follow the langUsage 
model, this would become a plain xml:id:

<creatorApp xml:id="ImageMarkupTool1">
...
</creatorApp>

This has one very useful consequence: it ensures that each application 
can only put one of these elements into the file (otherwise it will be 
invalid because of duplicate xml:id atts). Also, it means finding the 
element in a validated doc may be a bit quicker (getElementById() is 
usually faster than something like //creatorApp[@appId='blah']).
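To make the lookup point concrete, here's a minimal Python sketch using the standard library's xml.etree.ElementTree. ElementTree has no getElementById(), so the sketch builds its own xml:id index (raising on duplicates, roughly what a validator would complain about) and compares that with the linear scan an appId attribute would need. The sample header and the function names are hypothetical, and the TEI namespace is left out to keep things short.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample header; the TEI namespace is omitted for brevity.
SAMPLE = """<teiHeader><encodingDesc>
  <creatorApp xml:id="ImageMarkupTool1">
    <appIdent key="appName">Image Markup Tool</appIdent>
  </creatorApp>
</encodingDesc></teiHeader>"""


def build_id_index(root):
    """Map each xml:id to its element, failing on duplicates as a validator would."""
    index = {}
    for el in root.iter():
        ident = el.get(XML_ID)
        if ident is None:
            continue
        if ident in index:
            raise ValueError(f"duplicate xml:id: {ident}")
        index[ident] = el
    return index


def find_by_app_id(root, app_id):
    """The appId alternative: scan every creatorApp, like //creatorApp[@appId='...']."""
    for el in root.iter("creatorApp"):
        if el.get("appId") == app_id:
            return el
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    index = build_id_index(root)  # one full pass; each later lookup is a dict access
    print(index["ImageMarkupTool1"].find("appIdent").text)  # Image Markup Tool
```

The index costs one pass over the document, after which every lookup is a dictionary access; that's roughly the kind of advantage a DOM that already knows its IDs gives you with getElementById().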

We've simplified the proposal by removing one new attribute and using 
the existing xml:id. We could now implement your idea below by proposing 
a general-purpose attribute called either "tool" or "creatorApp", which 
any application could add to an element that it wanted to assert a 
special relationship with:

<div tool="ImageMarkupTool1">
	...content my program cares about...
</div>

We might even make this a multiple-value space-delimited attribute, so 
more than one application could assert its interest in an element:

<div tools="ImageMarkupTool1 JoesTeiHeaderMaker2">
...
</div>

Many folks will probably hate this idea, because it provides a simple 
vector for applications to pepper a file with proprietary labels, and 
that's a very legitimate concern. However, any tool which is not a 
general editor (i.e. something more specialized than oXygen) is likely 
to have to do this sort of thing anyway, so that it can easily find its 
"own" data areas in a larger file which may be edited in other tools. At 
the moment, I'm doing this by using xml:id="imtAnnotationCategories" or 
type="imtAnnotation" attributes, using values which are (I hope, but can 
never be sure) unique to my application. With a system such as we're 
proposing, at least the impact of a tool on a file is utterly 
predictable: if you just strip out all "creatorApp" tags and all 
instances of "tools" attributes, then the file will be cleansed of all 
evidence of tools. Furthermore, if this set of elements and attributes 
is encapsulated in a single module in P5, it would be easy to add them 
to a schema when needed, and strip them out later.
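Here's a minimal sketch of that cleanup in Python with xml.etree.ElementTree, assuming the element is named creatorApp and the attribute tools, and ignoring the TEI namespace for brevity. The function name is hypothetical. It collects the elements to remove before removing any, since changing a tree while iterating over it is unsafe.

```python
import xml.etree.ElementTree as ET


def strip_tool_markup(root, element_name="creatorApp", attr_name="tools"):
    """Remove all creatorApp elements and tools attributes in place.

    Returns (elements_removed, attributes_removed).
    """
    # Gather (parent, child) pairs first; mutating a tree mid-iteration is unsafe.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == element_name
    ]
    for parent, child in doomed:
        # Note: ElementTree drops the child's tail text along with it.
        parent.remove(child)

    attrs_removed = 0
    for el in root.iter():
        if attr_name in el.attrib:
            del el.attrib[attr_name]
            attrs_removed += 1
    return len(doomed), attrs_removed


if __name__ == "__main__":
    doc = ET.fromstring(
        '<TEI><teiHeader><encodingDesc>'
        '<creatorApp xml:id="ImageMarkupTool1"/>'
        '</encodingDesc></teiHeader>'
        '<text><div tools="ImageMarkupTool1">content</div></text></TEI>'
    )
    print(strip_tool_markup(doc))  # (1, 1)
```

Because both names are fixed by the proposal, the cleanup needs no knowledge of any particular tool, which is the predictability argued for above.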

I still have no sense of how most people feel about this, though, so I'm 
not sure whether it's appropriate to go ahead with a formal feature 
request at this point. I think between us, we've arrived at something 
fairly lean, precise and useful.

Just to reiterate: the proposal would be for an element block like this 
to be available in the teiHeader encodingDesc:

<creatorApp xml:id="ImageMarkupTool1">
       <appIdent key="appName">Image Markup Tool</appIdent>
       <appIdent key="appVersion">1.0.3.5</appIdent>
       <appIdent key="appURI">http://..../</appIdent>
       <appIdent key="userDefined" userKey="licence">Mozilla Public Licence 1.1</appIdent>
       <date value="2006-05-25T11:03:55">Last save: 2006-05-25 at 11:03:55</date>
</creatorApp>

The key attribute on appIdent is an enumeration covering the core, 
predictable types of application information (appName, appVersion, 
appURI), plus one further value, "userDefined"; if userDefined is 
chosen, the "userKey" attribute specifies the type of information the 
tag contains. <date> is just a standard date tag, and the app would use 
it to record the date and time it last saved the file (see previous 
discussions).
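As a rough illustration of those rules, here's a Python sketch of a checker for one creatorApp block. It assumes the enumeration is just the three core values named above plus userDefined (a real schema would presumably list more), and it only enforces that userDefined comes with a userKey. The checker and its messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: only the core values named in the proposal; a real schema may list more.
CORE_KEYS = {"appName", "appVersion", "appURI"}

CREATOR_APP = """<creatorApp xml:id="ImageMarkupTool1">
  <appIdent key="appName">Image Markup Tool</appIdent>
  <appIdent key="appVersion">1.0.3.5</appIdent>
  <appIdent key="userDefined" userKey="licence">Mozilla Public Licence 1.1</appIdent>
  <date value="2006-05-25T11:03:55">Last save: 2006-05-25 at 11:03:55</date>
</creatorApp>"""


def check_app_idents(creator_app):
    """Return a list of problems with the appIdent children of one creatorApp."""
    problems = []
    for ident in creator_app.iter("appIdent"):
        key = ident.get("key")
        if key == "userDefined":
            # A user-defined entry must say what kind of information it holds.
            if not ident.get("userKey"):
                problems.append("userDefined appIdent has no userKey")
        elif key not in CORE_KEYS:
            problems.append(f"unknown appIdent key: {key!r}")
    return problems


if __name__ == "__main__":
    print(check_app_idents(ET.fromstring(CREATOR_APP)))  # []
```

In practice a schema constraint would do this job; the sketch just spells out what the enumeration-plus-userKey design asks a tool to honour.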

A supplementary proposal is that an attribute "tools" be generally 
available (as a global attribute?); it would be a space-separated list 
of pointers to the xml:id attributes of creatorApp elements in the header.
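A minimal sketch of how a tool might check those pointers, in Python: it gathers the xml:id values of all creatorApp elements, splits each tools attribute on whitespace, and reports any token that doesn't match. It assumes bare IDREF-style tokens (as in the examples above) rather than "#id" URI references, skips the TEI namespace, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dangling_tool_refs(root):
    """Return (tag, token) pairs for tools tokens with no matching creatorApp xml:id."""
    app_ids = {
        el.get(XML_ID) for el in root.iter("creatorApp") if el.get(XML_ID)
    }
    dangling = []
    for el in root.iter():
        # str.split() with no argument splits on any run of whitespace.
        for token in el.get("tools", "").split():
            if token not in app_ids:
                dangling.append((el.tag, token))
    return dangling


if __name__ == "__main__":
    doc = ET.fromstring(
        '<TEI><teiHeader><encodingDesc>'
        '<creatorApp xml:id="ImageMarkupTool1"/>'
        '</encodingDesc></teiHeader>'
        '<text><div tools="ImageMarkupTool1 JoesTeiHeaderMaker2">...</div></text></TEI>'
    )
    print(dangling_tool_refs(doc))  # [('div', 'JoesTeiHeaderMaker2')]
```

A dangling token here would mean one tool had removed its creatorApp block without cleaning up after itself, which is exactly the toe-treading problem mentioned earlier.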

All comments really welcome!

Cheers,
Martin

> But it
> would, for example, allow you to identify those tagsDecl elements imt is
> adding to and removing from the header (or to harp on a point, information
> about revisions introduced by the tool ;-) ) in the same way in P4 you
> could indicate that a specific element contained text in Old High
> German, e.g.:
> 
> <tagsDecl tool="imt" ... >
> 
> Obviously in your case, you are adding more children, but the principle
> seems the same to me.
> 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
Half-Baked Software, Inc.