On Thu, 26 Oct 2006 01:17:52 -0700, Arthaey Angosii wrote:

>On 10/26/06, Alex Fink wrote:
>> This is a great idea.  I'd actually been recently thinking that I should
>> convert my conlangs' lexica to a more structured format (they're currently
>> in un-marked-up and inconsistently-formatted human-readable text files) so I
>> could process them by computer; this would be perfect for that.
>It's also similar to my own conversion from Shoebox to XML. I'm
>mid-conversion, but I do have an XML Schema. Perhaps it can be used as
>a basis for this program, or at least to spur discussion? In either
>capacity, it might prove helpful.

I'd forgotten about Shoebox; it might be a good idea for this program to
accept Shoebox input in some form, perhaps by first running it through a
converter like (or identical to?) yours.  
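
As a rough illustration of that converter step, here is a minimal Python
sketch that reads Shoebox-style text into records. It assumes the usual
backslash-marker layout (\lx for the headword, \ps, \ge, \de) and that each
\lx line starts a new record; the function name parse_shoebox and the sample
words are made up for the example, and this is not Arthaey's converter.

```python
def parse_shoebox(text):
    """Split Shoebox-style text into records (lists of (marker, value) pairs).

    Assumption: every field starts with a backslash marker, a new record
    begins at each \\lx line, and unmarked lines continue the previous field.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # blank lines only separate records visually
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == "lx" or current is None:
                current = []
                records.append(current)
            current.append((marker, value.strip()))
        elif current:
            # Continuation line: append it to the most recent field.
            prev_marker, prev_value = current[-1]
            current[-1] = (prev_marker, (prev_value + " " + line.strip()).strip())
    return records


# Hypothetical sample data; the headwords are invented for illustration.
sample = r"""\lx kesh
\ps adj
\ge hazy
\de marked by the presence
of haze

\lx kesha
\ps n
\ge haze
"""

records = parse_shoebox(sample)
```

Keeping the fields as ordered (marker, value) pairs rather than a dict means
repeated markers within one record are not lost.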

I remember getting the impression the last time I looked at Shoebox's format
that it was interlinear-centric (which makes sense).  IIRC the main
definition field is the gloss, suitable for interlinear use, and of course
you can also have a proper definition and more explicatory notes but it's
the gloss that's primary.  It looks like your schema follows this, and
Gary's proposals seem to have a similar leaning ("hazy" as definition,
"marked by the presence of haze" as note).  My own preference would be to
make the longer definition primary and the gloss/metalanguage search key
secondary; this way it's the language-internal divisions of semantic space
and not the equivalences to some other language that are at the forefront.  
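
To make the preference concrete, here is a small Python sketch of an entry
where the definition is the required field and the gloss is only an optional
search key. The class and function names (Sense, Entry, find_by_gloss) and
the headwords are hypothetical, not taken from Arthaey's schema or Gary's
proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    definition: str                  # primary: the full, language-internal definition
    gloss: Optional[str] = None      # secondary: short metalanguage search key
    notes: List[str] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    senses: List[Sense]


def find_by_gloss(lexicon, key):
    """Return the entries having at least one sense whose gloss equals key."""
    return [e for e in lexicon if any(s.gloss == key for s in e.senses)]


# Hypothetical data: the first sense reuses the "hazy" example; the second
# entry has an invented definition and deliberately carries no gloss.
lexicon = [
    Entry("kesh", [Sense("marked by the presence of haze", gloss="hazy")]),
    Entry("kesha", [Sense("a thin suspension of particles in the air")]),
]

matches = find_by_gloss(lexicon, "hazy")
```

An entry without a gloss is still valid here; it just can't be found through
the metalanguage key.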

>The schema itself supports much more than shown in the example:
>multiple pronunciation schemes, definitions in addition to short
>glosses, semantic domains, multiple example sentences,
>cross-references (such as synonyms), notes, subentries, and senses.

These probably resolve to questions about Shoebox rather than your own
designs, but:
- why are etymologies cross-references?  If ancestral words have their own
entries at all, wouldn't they be in a completely different file?
Probably 'synchronic derivation' and 'diachronic etymology' should be
different fields.  
- is there provision for differentiating word class from, um, word subclass,
from morphological information?  Like "noun, masculine, /nd/-stem", or
"verb, subject is patientive, third conjugation"? 

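One way the two distinctions asked about above could be laid out, as a
Python sketch: word class, subclass and morphological information in
separate slots, and synchronic derivation kept apart from diachronic
etymology. All names here (Grammar, Origins, derived_from, etymon,
etymon_source) are assumptions for illustration; nothing in Shoebox or in
Arthaey's schema is claimed to work this way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Grammar:
    word_class: str                                        # e.g. "noun", "verb"
    subclasses: List[str] = field(default_factory=list)    # e.g. "masculine"
    morphology: List[str] = field(default_factory=list)    # e.g. "/nd/-stem"

    def label(self) -> str:
        """Join the three levels into one human-readable label."""
        return ", ".join([self.word_class, *self.subclasses, *self.morphology])


@dataclass
class Origins:
    # Synchronic derivation: headwords in the same lexicon.
    derived_from: List[str] = field(default_factory=list)
    # Diachronic etymology: a form in an ancestor language, which may live
    # in a completely different file.
    etymon: Optional[str] = None
    etymon_source: Optional[str] = None


noun = Grammar("noun", ["masculine"], ["/nd/-stem"])
verb = Grammar("verb", ["subject is patientive"], ["third conjugation"])
```

Here noun.label() gives back "noun, masculine, /nd/-stem", so the flat
human-readable form can still be produced from the structured one.
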
>Below my signature (for easy skipping) is the 193-line schema file. (I
>would have attached it and not bothered those not interested, but I
>assume the listserv kills attachments.)
>Please also note that it's my first schema, and as such I may have
>done things in less-than-optimal ways just to get it to validate. :P

You could've fooled me!  I hadn't actually seen an XML Schema before this one.

Looking through your tech page, it looks like you've already got a number of
components of what Gary's planning to write.  Are they very
Asha'ille-specific and specific to your formatting, or could they
generalise?  If they could, we might already have starting points for a
number of aspects of the project at hand.