I am currently working on the Oral History project for UCLA. We
have finished a pilot project in which we digitized 8
interviews (we scanned the transcripts, converted them into Word files,
then applied TEI tags and converted them into TEI P4 files). You can take a look at this
project at http://digital.library.ucla.edu/cohr/interviewArchives.jsp
We have recently converted them from TEI P4 into P5 files and will release
them online soon.
We also have hundreds of transcripts in Word files and will be working
on them in the coming quarter.
We separate long passages of text with the paragraph tag: <p>content</p>.
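For illustration, here is a minimal, hypothetical sketch of how a transcript with paragraph breaks might look in TEI P5. It is not taken from our files: the header text, the <div type="session"> wrapper, and the use of <sp>/<speaker> to mark who is talking are assumptions added for the example. The point is simply that each paragraph break in the Word transcript becomes its own <p> element.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- Hypothetical title, for illustration only -->
        <title>Oral history interview (example)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <!-- Records that the encoding follows an edited Word transcript, not the audio -->
        <p>Encoded from an edited MS Word transcription of the original audio recording.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Assumed grouping: one div per recording session -->
      <div type="session" n="1">
        <sp>
          <speaker>INTERVIEWER</speaker>
          <p>Text of the interviewer's question.</p>
        </sp>
        <sp>
          <speaker>NARRATOR</speaker>
          <!-- Each paragraph break from the Word file becomes a separate p -->
          <p>First paragraph of a long answer.</p>
          <p>Second paragraph, where the transcriber inserted a break.</p>
        </sp>
      </div>
    </body>
  </text>
</TEI>
```

If marking speakers is not needed, the <p> elements can also sit directly inside the <div>.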
Let me know if you have further questions.
UCLA Digital Library Program
Quoting "Custer, Mark" <[log in to unmask]>:
> Hi all,
> I am new to TEI markup but have recently been charged with encoding a
> set of Oral Histories in P5.
> Unfortunately, we are not encoding the Oral Histories directly from
> their original source material, but we will instead be encoding them
> from transcriptions that were done some time ago and were recorded in MS
> Word. These transcriptions are somewhat "cleaned-up" versions of the
> actual audio that's recorded on the tapes themselves; for example,
> paragraph breaks have been applied to any commentary of substantial
> Of course, we plan to include information about the documents involved
> in our process in the TEI header, but I still have a few questions:
> 1) Has anyone else encountered a project like this, and if so, what
> level of encoding did you attempt to provide?
> 2) Should we decide to retain the paragraph separations for the sake of
> readability, how would that information be best encoded? Would it be
> advisable to break up the longer commentaries into multiple utterance
> tags, or to put a "pause" tag between those paragraph breaks, etc.?
> Mark Custer