I'm having trouble tagging uncertain, imprecise, or incorrect dates.
The dates are those on which the letters in our collection (40 000 or so
letters, written by Bertrand Russell) were written. The dates may be
questionable or imprecise for various reasons, including:
- the date on which a letter was written is inferred - from a postmark,
a reference in the letter, etc.
- the exact date on which a letter was written is not known, only that
it was written within a certain period (the period may be certain or
not).
- the date was (perhaps intentionally) stated incorrectly by the author
of the letter.
- part of a date (or date range) is known and part is inferred, e.g.,
the text of the letter states the day of the week (Tuesday) and the
year (1969), while the month is inferred by some other means.
So, we'd like to be able to do two things:
1) Individually mark the certainty of the day, the month, the year, or
some combination of the three. For example, the year and month may be
uncertain while the day is certain and should not be marked uncertain.
We'd also like to indicate the type of uncertainty in a systematic way,
e.g., 'inferred', 'postmark', 'stated', 'incorrect'.
2) Define ranges for the year, month, day, or some combination of the
three, for example, 1969-1970/07-08/19 -- that is, the 19th (and only
the 19th) day of either July or August in either 1969 or 1970.
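To make requirement 2 concrete, here is a minimal Python sketch of what I mean by that notation. The function name expand_partial_range and the slash/hyphen syntax are only my own illustration, not anything defined by TEI, and the sketch assumes every year/month/day combination is a valid calendar date.

```python
from datetime import date
from itertools import product


def expand_partial_range(spec):
    """Expand 'Y1-Y2/M1-M2/D1-D2' into every date it could denote.

    Each slash-separated part is either a single number or a 'low-high'
    range. This notation is hypothetical, used only for illustration.
    """
    def bounds(part):
        low, _, high = part.partition("-")
        # A part without a hyphen is a range of exactly one value.
        return range(int(low), int(high or low) + 1)

    years, months, days = (bounds(p) for p in spec.split("/"))
    # date() raises ValueError for impossible combinations (e.g. 30 Feb);
    # this sketch does not try to handle that case.
    return [date(y, m, d) for y, m, d in product(years, months, days)]


candidates = expand_partial_range("1969-1970/07-08/19")
for candidate in candidates:
    print(candidate.isoformat())
```

For the example above this yields four candidate dates: 19 July and 19 August in each of 1969 and 1970. That is the set I would like the markup to be able to express.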
A fuller (and fairly long) explanation follows. If you don't want, or
need, to read the explanation, I'm looking for suggestions to satisfy
the two requirements listed above.
Certainty
---------
Section 6.4.4 says:
"Where the certainty (i.e. reliability) of the date or time itself is in
question, rather than its precision, the encoder should record this fact
using the mechanisms discussed in chapter 17 Certainty and
Responsibility."
If, as stated above, the mechanisms discussed in Chapter 17 should be
used to express the reliability of a date, what is the certainty=
attribute on <date> to be used for?
Following the advice of 6.4.4, I looked to Chapter 17, which describes
two ways to record uncertainty that I initially thought could be used
for dates: the <certainty> element and the <note> element with
type="uncertainty".
17.1 describes where the <certainty> element should be used:
"Many types of uncertainty may be distinguished. The <certainty> element
is designed to encode the following sorts:
* a given tag may or may not correctly apply (e.g. a given word may
be a personal name, or perhaps not)
* the precise point at which an element begins or ends is uncertain
* the value to be given for an attribute is uncertain
* content supplied by the encoder (such as the expansion of an
abbreviation marked by the <abbr> tag) is uncertain
* the transcription of a source text is uncertain, perhaps because
it is hard to read or hard to hear; this sort of uncertainty is also
handled by the <unclear> element in section 18.2.3 Damage, Illegibility,
and Supplied Text"
And where it should not be used:
"Uncertainty about the truth of assertions in the text and other sorts
of authorial and editorial uncertainty about whether the content is
satisfactory are not handled by the <certainty> element, though they may
be expressed using the <note> element."
"The following types of uncertainty are not indicated with the
<certainty> element:
* a number or date is imprecise
* the text is ambiguous, so a given passage has several possible
interpretations
* a transcriber, editor, or author wishes to indicate a level of
confidence in a factual assertion made in the text
* an author is not sure if the sentence she has chosen to start a
paragraph is really the one she wants to retain in the final version"
These excerpts from the guidelines seem to preclude using <certainty> to
tag a date. Specifically, the third point, "a transcriber, editor, or
author wishes to indicate a level of confidence in a factual assertion
made in the text", suggests that <certainty> shouldn't be used to
describe the 'reliability' of a date. To me, the guidelines for
<certainty> indicate that <certainty> should only be used when tag
'usage' is in question, which isn't the case with a date. In this case,
we know we are using the right tag -- we're just not sure if the
information we are tagging is accurate.
So we are left with the <note> element. Section 17.1.1 describes how to
use a <note type="uncertainty"> to tag uncertainty. However, as pointed
out: "The advantage of this technique is its relative simplicity. Its
disadvantage is that the nature and degree of uncertainty are not
conveyed in any systematic way and thus are not susceptible to any sort
of automatic processing."
We would like to avoid using <note> so that, as described in the
guidelines, we can systematically describe the uncertainty of our dates,
for instance, by marking types of uncertainty, e.g., 'postmark',
'inferred', 'stated'. Can anyone suggest an alternative to <note>? If
<note> is the only way to tag our uncertainty, can anyone suggest a
somewhat systematic use of <note> to tag the uncertainty for the day,
month, year, or some combination of the three?
Date Ranges
-----------
Can anyone suggest how we can specify ranges for the day, month, or year
portion of a date, as described earlier? The <dateRange> element seems
to allow only two complete dates for the to= and from= attributes, e.g.,
<dateRange from="1969-07-19" to="1969-08-19">July 19th to August
19th, 1969</dateRange>,
while we want something like
<dateRange from="?" to="?">The 19th of either July or August, in either
1969 or 1970</dateRange>
Section 20.4 discusses the <dateStruct> element, which allows the day,
month, and year to be tagged individually. The section states:
"A relative temporal expression describes a date or time with reference
to some other (absolute) temporal expression, and thus contains the
following elements in addition to those listed above:
* <distance> that part of a relative temporal or spatial expression
which indicates the distance between the place or time denoted by it and
the place or time referred to within it.
exact indicates the degree of accuracy associated with the
distance.
* <offset> that part of a relative temporal or spatial expression which
indicates the direction of the offset between the two place names,
dates, or times involved in the expression."
The <offset> tag therefore seems promising, but I'm not sure if it can
(should) be used as I'd like:
<dateStruct value="?????">
<day>19</day>
<month>July</month>
<offset>to</offset>
<month>August</month>
<year>1969</year>
<offset>to</offset>
<year>1970</year>
</dateStruct>
since 20.4.1 states:
" Component elements of a <dateStruct> may be repeated, provided that
only a single temporal expression is intended:
<dateStruct value="1993-05-14">
<day type="name">Friday</day>,
<day type="number">14</day>
<month>May</month>
<year>1993</year>
</dateStruct>"
Is a date range considered a 'single temporal expression'?
Or, can I do something like:
<dateStruct value="?????">
<day>19</day>
<month type="from">July</month>
<month type="to">August</month>
<year type="from">1969</year>
<year type="to">1970</year>
</dateStruct>
Finally then, combining ranges with certainty:
<dateStruct value="?????">
<day>19</day>
<month>July</month>
<offset>to</offset>
<month>August</month>
<year>1969</year>
<note type="uncertainty">The letter was written at least as late as
1969 - Russell refers to events of early 1969 in the letter.</note>
<offset>to</offset>
<year>1970</year>
<note type="uncertainty">The letter can have been written no later
than 1970 - the year of Russell's death.</note>
</dateStruct>
This method, of course, doesn't allow for the type of uncertainty to be
specified in a systematic way. Ideally, I'd like to specify something
like: <month certainty="postmark">August</month>.
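As a rough illustration of why an attribute would help with automatic processing, the following Python sketch reads per-component certainty values from markup of that kind. The certainty= attribute on <month> and <year> is hypothetical here (it is the usage I am wishing for, not something I have confirmed in the Guidelines), and the function name component_certainty is my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: certainty= on <month> and <year> is the wished-for
# usage described above, not something checked against the TEI DTD.
SAMPLE = """
<dateStruct>
  <day>19</day>
  <month certainty="postmark">August</month>
  <year certainty="inferred">1969</year>
</dateStruct>
"""


def component_certainty(xml_text):
    """Map each date component to (text, certainty type).

    The certainty type is None when the component carries no attribute.
    """
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text, child.get("certainty")) for child in root}


result = component_certainty(SAMPLE)
for name, (text, kind) in result.items():
    print(name, text, kind)
```

With a <note>, the same information would have to be recovered from free text, which is the disadvantage the Guidelines themselves point out.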
I looked for postings in the archives about certainty and dates, but
found little. Has anyone else run across these issues?
I apologize for the lengthy email.
Thank you,
James Chartrand
Bertrand Russell Research Centre
McMaster University, Hamilton, Ontario
905-525-9140 ext 24896