I append below a copy of an article recently posted on a UK list
concerned with electronic publication. The article summarizes some
proposals from a UK-based consortium of electronic journal publishers
for a form of header to contain minimal cataloguing details. I haven't
yet been in touch with them to see whether or not they're aware that the
TEI also has a solution to this problem for them.
So who on this list would like to volunteer to produce a mapping between
the tags proposed in the document below and the TEI header tags?
Shouldn't take more than a few minutes...
Date: Thu, 1 Jun 95 17:47:10 BST
From: Damien Keown
Subject: Re: Publication Data for EJS
Date: Thu, 01 Jun 95 16:17:01 gmt
From: "Hill, David"
Subject: Re: Publication Data for EJS
[Note: An Acrobat version of the OASIS data set is available from the HJF
WWW site under "Projects" -DK]
Regarding the need for standards, there has been an attempt to set out
a minimum set of information to be provided for electronic journal
'headers' by the OASIS (Organisation for Article Standards in Science)
consortium of 18 European journal publishers. The intention is that
any digital provision of journal headers should have this information
as a minimum so that users of such information, e.g. libraries, can
have confidence that it will be available in this standard form.
Please send any comments, suggestions or other feedback on this set to
John Jarvis, John Wiley & Sons, Baffins Lane, Chichester PO19 1UD.
If the formatting of this document is affected in the email, I'd be
happy to send a hard copy to anyone who wants one. Send email to me
directly, not to the list.
Sage Publications, 6 Bonhill Street, London EC2A 4PU, UK
Tel: +44 (171) 374 0645 Fax: +44 (171) 374 8741
******* OASIS *******
REPORT OF THE TECHNICAL SUBCOMMITTEE, 23 March 1995
1) We have recommended a list of just over 20 fields for a core
2) We have recommended tags as well as fields.
3) We have generally left details of data entry up to individual
OASIS DATA SET
An OASIS-compliant data set will include the following journal data
Field                          Tag      Comment

Publisher's name               <pnm>
Publisher's location           <loc>    Location of publication of
                                        specific journal being referred
                                        to (in cases where publisher
                                        has multiple locations).
                                        Recommend town or city
                                        plus state or country (e.g.
                                        Cambridge, MA; Cambridge, UK).
Journal title                  <jtl>
Journal subtitle               <jsbt>
Journal abbreviated            <jabt>
Volume identification          <vid>
Issue identification           <iid>    Not all issues have numbers
(number)                                (e.g. single annuals).
                                        Default is zero.
Cover date                     <cd>     Recommend SICI format (i.e. in
                                        numerical form, with
                                        e.g. 1 January 1995 = 19950101;
                                        SICI rules also deal with
                                        combined issues, quarterlies,
Article type                   <artty>  See below for types.
Category                       <categ>  For further classification or
                                        information beyond <artty>.
Page count                     <ppct>   Number of pages, rounded up to
First page number              <ppf>    Will not be known
                                        pre-compilation of issue (e.g.
                                        for "future awareness"
                                        service). In such cases,
                                        defaults to zero.
Last page number               <ppl>    As for <ppf>
Copyright notice               <crn>
Article title                  <atl>    If no titles as such, can use
                                        <artty> or <categ>, e.g.
Forename(s)                    <fnms>   Subdivision of <au>. May
                                        consist of initials or full
                                        forename(s), or a
                                        combination of these.
Surname                        <snm>    Subdivision of <au>.
Link of author                 <orf>    Every <au> must have an <orf>,
to affiliation (=organization           even if there is only one
reference)                              author and one affiliation. All
                                        <orf>s are numbered.
Affiliation                    <aff>    Author's academic or
                                        professional address. Not
                                        subdivided into any other
Link of affiliation            <oid>    Every <aff> must have an <oid>.
to author (=organization                Each <oid> is numbered to tie
identification)                         in with the relevant <orf>.
Abstract                       <abs>    This abbreviation applies to
                                        all material that would
                                        generally be recognized as an
                                        abstract, whether the text
                                        itself is called summary,
                                        resume, outline, overview, or
                                        whatever. Absence of an
                                        abstract should be specifically
                                        entered as "No abstract".
Key word(s)                    <kwd>    Absence should be specifically
                                        entered as "No key words".
The above list covers 24 fields. We have not tried to identify other
fields - this is a minimum data set, not a maximum one. Some of the
more likely extra fields would include a journal series name;
alternative title (e.g. a translation); coden; figure, table,
reference and word counts; "history" (i.e. received, revised,
accepted dates); more author details (title, name suffixes such as Jr
and III, qualifications, roles); more affiliation details (department,
organization, street, city, country, etc.); contract grant
information; subject code; anticipated and actual publication dates
(as opposed to cover date!); price(s) for article delivery.
We recommend that none of these fields be part of the core data set,
because i) there will be too many "null fields", ii) much of this
information requires a degree of interpretive intervention to extract
it from the current systems used to typeset journal articles, and iii)
some publishers will specifically not want to include some of this
data, e.g. anticipated dates, prices.
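Two of the rules in the core field list lend themselves to a mechanical
check: the numeric cover date (1 January 1995 = 19950101) and the
requirement that every author's <orf> ties in with a numbered <oid>. The
following is only a sketch in Python. The record layout (plain
dictionaries keyed by tag name, one <orf> per author) and the function
names are assumptions made for illustration, not part of the OASIS
proposal, and combined issues or quarterlies are not handled.

```python
from datetime import date


def cover_date(d: date) -> str:
    """Format a single-day cover date as YYYYMMDD, e.g. 19950101."""
    return d.strftime("%Y%m%d")


def check_links(authors, affiliations):
    """Return a list of <orf>/<oid> problems; an empty list means none found.

    Assumed layout: each author is a dict with "snm" and "orf",
    each affiliation is a dict with "aff" and "oid".
    """
    problems = []
    for aff in affiliations:
        if aff.get("oid") is None:
            problems.append(f"affiliation {aff.get('aff')!r} has no <oid>")
    known_oids = {aff["oid"] for aff in affiliations if aff.get("oid") is not None}
    for au in authors:
        orf = au.get("orf")
        if orf is None:
            problems.append(f"author {au.get('snm')!r} has no <orf>")
        elif orf not in known_oids:
            problems.append(f"author {au.get('snm')!r} refers to unknown <oid> {orf}")
    return problems


# Hypothetical sample data, for illustration only.
authors = [{"fnms": "A. B.", "snm": "Smith", "orf": 1}]
affiliations = [{"aff": "Example University, Example City, UK", "oid": 1}]
print(cover_date(date(1995, 1, 1)))
print(check_links(authors, affiliations))
```

A database that accepts records from several publishers could run a
check of this kind at load time and report the problems rather than
silently repairing them.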
The following is the complete set of <artty>s:
RA = Research Article
RV = Review Article
RL = Research Letter
SC = Short Communication
ER = Erratum
AB = Abstract
BR = Book Review
XX = Miscellaneous
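For anyone loading these codes into software, the list above is small
enough to hold in a lookup table. The sketch below is in Python. The
function name artty_for and the idea of matching on a publisher's own
label are assumptions for illustration only; the report itself says
nothing about how a publisher arrives at the code.

```python
# The eight <artty> codes, exactly as listed in the report.
ARTICLE_TYPES = {
    "RA": "Research Article",
    "RV": "Review Article",
    "RL": "Research Letter",
    "SC": "Short Communication",
    "ER": "Erratum",
    "AB": "Abstract",
    "BR": "Book Review",
    "XX": "Miscellaneous",
}

# Reverse lookup from the descriptive name to the two-letter code.
_CODE_BY_NAME = {name.lower(): code for code, name in ARTICLE_TYPES.items()}


def artty_for(label: str) -> str:
    """Map a descriptive label to an <artty> code, using XX when nothing matches."""
    return _CODE_BY_NAME.get(label.strip().lower(), "XX")


print(artty_for("Book Review"))
print(artty_for("Calendar of events"))
```

Falling back to XX rather than rejecting the record matches the report's
position that material outside the named types may still be included as
long as it is typed as Miscellaneous.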
OASIS recommends use of the specific tag names (as indicated) but
OASIS compliance would not be conditional upon their use, as long as
any other tags could be translated directly into our set (i.e. there
has to be a one-to-one relationship). This means that someone using,
say, the AAP tags (e.g. <title> rather than <atl>) could carry on
using these, but that anyone starting up or not otherwise committed to
a particular tag set could use ours without needing to reinvent the wheel.
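The one-to-one condition can be stated quite precisely: no two of a
publisher's own tags may translate to the same OASIS tag. A minimal
sketch in Python follows. Only the <title> to <atl> pairing comes from
the report; the other entry in HOUSE_TO_OASIS is a hypothetical house
tag, and the function names are likewise invented for illustration.

```python
# Publisher's own tag -> OASIS tag.
HOUSE_TO_OASIS = {
    "title": "atl",    # the AAP example given in the report
    "summary": "abs",  # hypothetical house tag, for illustration only
}


def is_one_to_one(mapping: dict) -> bool:
    """True if no two source tags translate to the same target tag."""
    return len(set(mapping.values())) == len(mapping)


def translate(record: dict, mapping: dict) -> dict:
    """Rename the tags of one record; tags absent from the mapping are kept."""
    if not is_one_to_one(mapping):
        raise ValueError("tag mapping is not one-to-one")
    return {mapping.get(tag, tag): value for tag, value in record.items()}


record = {"title": "An example article", "summary": "No abstract"}
print(translate(record, HOUSE_TO_OASIS))
```

Because the mapping is one-to-one it can also be inverted, so a
publisher could convert records back to its house tags without loss.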
Much as we would all prefer to see a database that had consistent
entry regarding issues such as capitalization, we feel that this would
create enormous difficulties for publishers since there is so little
standardization in typographic representation of, say, article titles
in even one publisher's list, let alone across publishers. Consistency
of input does not preclude inconsistency of output, but there are
real, non-trivial problems regarding hard capitals and hard lower case
which mean that either typesetting from the database or producing a
consistent data entry from the typesetting information would be
difficult, and the alternative of double origination would be
In some cases, we have made recommendations in the Comments column of
the mandatory fields list, but in essence we are recommending that
there be no imposition of a common approach to data entry.
For special characters, we recommend only that these be
represented in 7-bit ASCII form (e.g. in TeX, as SGML entities, or in
generic coding), together with an explanation of their meaning.
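As one possible reading of this recommendation, the Python sketch below
rewrites any character outside 7-bit ASCII as an SGML-style entity
reference and collects a legend explaining each one. Taking the entity
names from Python's html.entities table and falling back to numeric
character references are choices made here for illustration; the report
does not prescribe any particular entity set, and the function name
to_7bit is invented. Literal ampersands already in the text are left
alone, which a real converter would need to deal with.

```python
import unicodedata
from html.entities import codepoint2name


def to_7bit(text: str):
    """Return (ascii_text, legend) with non-ASCII characters as entity references."""
    out = []
    legend = {}
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        name = codepoint2name.get(ord(ch))
        # Named entity where the table has one, numeric reference otherwise.
        entity = f"&{name};" if name else f"&#{ord(ch)};"
        out.append(entity)
        legend[entity] = unicodedata.name(ch, "unknown character")
    return "".join(out), legend


encoded, legend = to_7bit("r\u00e9sum\u00e9")
print(encoded)
print(legend)
```

The legend plays the part of the "explanation of meaning": it can travel
with the record so that a recipient does not have to guess which entity
set was used.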
Limited to article types?
There may well be some articles that have no real place in the
database (e.g. Calendar of events; Books received) but our
recommendation is that we don't limit what goes in - if particular
publishers choose to put in the examples mentioned, they can do so, as
long as they use "XX" as the article type.
It is also up to each publisher how they treat articles such as Book
Reviews (e.g. some BRs actually have their own titles, especially if
they are multiple reviews; for others, the title of the BR is
essentially the title of the book being reviewed), but we have not
laid down any rules regarding how much information needs to be
We have made no recommendations regarding the order in which the
information can be supplied. Neither have we identified which items
can be repeated (e.g. <au>) and which ones not (e.g. <atl>). These
issues can be addressed as part of the database design, which will
need to look at how we can take data from a number of disparate
sources and merge it into a single database.