> Or d) lemma information should not necessarily be encoded in the text
> stream but encoded elsewhere and pointed at using a reference.
Actually, this is the solution we have adopted for the specific project
that gave rise to the initial question. This solution was practicable
because the project includes a sort of dictionary of all the lemmas
found in the source material (Anglo-Saxon charters, btw).
But what if we did not have a dictionary at all? In the past I was
involved in projects that lemmatised just for search purposes, and we
did not include any lemma collection (or 'lemmario') at all (I'm
thinking in particular of the lemmatised works of Dante). In that case
a different solution would have been more suitable.
So my opinion is that both options (b) and (d) should be carried
forward together, giving projects the opportunity to use one or the
other (or possibly both?) according to their needs.