## TEI-L@LISTSERV.BROWN.EDU


Subject:

Re: Whatever happened to @precision?

From:

Date:

Wed, 30 Jan 2008 12:02:22 +0900

```
It may be that the TEI needs to consider the approach taken by the
physical sciences when dealing with uncertain quantities. In the context
of data analysis, a distinction is made between "accuracy" and
"precision".

My understanding is that "accuracy" describes how well a measurement
corresponds to the actual value. So if the actual date of something is
115, then 115 is an accurate estimate while 190 is not so accurate.
There is a big problem here as the humanities don't have anything like
physical standards against which to measure accuracy.

"Precision", on the other hand, refers to the interval in which a
measurement is expected to lie. A narrow interval corresponds to high
precision and a broad interval to low precision. Typically, expressions
of precision involve a confidence level, with 95% being a popular one.
Thus (a +/- b) means that the probability of the actual value being
somewhere between (a - b) and (a + b) is 95%.

So given an actual value of 115 (remembering that there is often no way
to know the actual value in the humanities), we have all sorts of
possibilities. E.g.

accurate and precise:      115 +/- 10
inaccurate and precise:    190 +/- 10
accurate and imprecise:    115 +/- 100
inaccurate and imprecise:  190 +/- 100

Looking at things in this way, something like a 2nd century date for a
papyrus roll might be expressed as 150 +/- 50. However, if you asked the
palaeographer if this is indeed what is meant, you might find that he or
she is not sure that the actual date is in this range.

In view of all this, I think that we need to specify an interval and a
confidence level to properly express uncertainty concerning a quantity.
One need not use numbers for the confidence level. Actually, categories
such as "high", "medium" or "low" are preferable because something like
"47%" gives a false sense of precision to something that is more akin to
a forensic category, such as "beyond reasonable doubt". Also, a small
number of categories makes it more likely that separate encoders will
come up with the same encoding for the same thing.

Thus, an adequate description of the magnitude of an unknown quantity
requires:

(1) an estimate of the quantity (e.g. 115)
(2) an interval in which the quantity is thought to lie (e.g. +/- 50)
(3) the confidence attached to the assertion that the actual value lies
    within the interval (e.g. "high", "medium", "low").

Various schemes can be used to specify confidence. One possibility has
three levels corresponding to notional confidence levels (C) of

C > 95% (high),
5% < C < 95% (medium),
C < 5% (low).

Another possibility has four levels:

C > 95% (highly probable),
50% < C < 95% (probable),
5% < C < 50% (improbable),
C < 5% (highly improbable).

Best,

Tim Finney
```
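The three-part description proposed in the message (estimate, interval, confidence category) and its two category schemes can be sketched in Python. This is only an illustration under stated assumptions: the names `UncertainQuantity`, `three_level` and `four_level` are hypothetical and are not part of the TEI or of the original post, and since the post leaves the exact boundary values (5%, 50%, 95%) unassigned, the sketch arbitrarily places each boundary value in the lower category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainQuantity:
    """Hypothetical container for the three items listed in the post."""
    estimate: float    # (1) best estimate of the quantity, e.g. 115
    half_width: float  # (2) the "b" in "a +/- b", e.g. 50
    confidence: str    # (3) category label, e.g. "high", "medium", "low"

    @property
    def interval(self):
        """Return the (lower, upper) bounds implied by estimate +/- half_width."""
        return (self.estimate - self.half_width,
                self.estimate + self.half_width)


def three_level(c):
    """Map a notional confidence level c (in percent) to the three-level scheme.

    Assumption: boundary values (exactly 5 or 95) fall in the lower category.
    """
    if c > 95:
        return "high"
    if c > 5:
        return "medium"
    return "low"


def four_level(c):
    """Map a notional confidence level c (in percent) to the four-level scheme.

    Assumption: boundary values (exactly 5, 50 or 95) fall in the lower category.
    """
    if c > 95:
        return "highly probable"
    if c > 50:
        return "probable"
    if c > 5:
        return "improbable"
    return "highly improbable"


# The papyrus example from the post: 150 +/- 50, held with medium confidence.
papyrus = UncertainQuantity(estimate=150, half_width=50, confidence="medium")
```

Here `papyrus.interval` gives the bounds 100 and 200, and a notional level such as 47% maps to "medium" in the first scheme and "improbable" in the second, which shows how the coarse categories avoid the false precision of the number itself.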