On 18/04/2010 17:04, Lou Burnard wrote:
> As you say, the fact that a cell in the spreadsheet was left blank might
> mean that the age was unknown, or that the person entering it forgot to
> enter it, but the distinction seems unimportant.
But you see, it's important to me. I want to take my spreadsheet data
source at its word but make it clear in my XML that data was not given
in the source. That is, it should be clear that the data really is
unknown, not that there was an oversight in the spreadsheet-to-XML
process that led to some data being omitted.
> Either way when I copy
> it into the text I can write "No value supplied" with a clean
> conscience. Either way I can encode (sensu strictu) its significance any
> way I like (-1 or 0 or 9999)
The age example is once again proving not to be a good one.
Let's use <occupation> instead. I would like a way of encoding
No value supplied
No value supplied.
unstated
(unstated)
unknown
or any other natural-language human-readable string in a way that makes
these all equivalent. Something like
<occupation unknown="y"/>
This would make it more likely the data could be mined more accurately
within a project and especially across projects using TEI encoding.
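To make the idea concrete, here is a minimal sketch in Python of the kind of normalisation I have in mind. It assumes the placeholder strings are just the ones listed above, that a blank cell should be treated the same way, and that the hypothetical unknown="y" attribute existed; that attribute is my own suggestion, not something TEI defines, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder strings treated as "no value given" (the list from this message).
PLACEHOLDERS = {"no value supplied", "unstated", "unknown"}


def normalise(text):
    """Strip surrounding whitespace, brackets and full stops, then lower-case."""
    return (text or "").strip().strip("().").strip().lower()


def occupation_element(cell):
    """Build an <occupation> element from one spreadsheet cell.

    The unknown="y" attribute is the hypothetical one suggested above;
    it is not an attribute defined by TEI.
    """
    elem = ET.Element("occupation")
    value = normalise(cell)
    # Assumption: a blank cell counts as "not given in the source".
    if value == "" or value in PLACEHOLDERS:
        elem.set("unknown", "y")
    else:
        elem.text = cell.strip()
    return elem


if __name__ == "__main__":
    for cell in ["No value supplied.", "(unstated)", "unknown", "", "weaver"]:
        print(ET.tostring(occupation_element(cell), encoding="unicode"))
```

With something like this, every variant spelling collapses to one machine-readable marker, while a real value such as "weaver" is passed through as element content.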
In the meantime, I'm going to stick in XML comments as placeholders --
something like:
<!-- <occupation> unspecified -->
--Kevin