George Corley, On 23/03/2013 19:37:
> I really don't see how a "Count ALL the words!" mentality is useful for
> anything other than a sort of linguistic dick-measurement.

Responding also to other messages in the thread, I think (i) there's a real and linguistically significant phenomenon here (such that vocabulary size, in the relevant sense, is not just a lay delusion), and (ii) the resulting counts of vocab size may well be brandished as cultural dick-measurement, but what they measure also affects the epistemic experience of language, and the parole, of even the monoglot.

What English has a lot of is (i) underived lexemes in (ii) the corpus of texts to which the average speaker potentially has access (face-to-face interactions, broadcast speech, film, published and online written texts). The corpus is big for cultural reasons: a longish written history and lots of speakers. There are lots of underived lexemes because of borrowing, and because of the feedback loop whereby a language that is receptive to new underived lexemes keeps taking more of them in.

The significance for the speaker of English is (1) that most speakers will regularly encounter new (to them) underived lexemes, (2) that for most speakers this happens more, not less, the more they are exposed to texts beyond face-to-face interaction, so that the speaker never has the sense that there is a finite and perceptibly dwindling supply of words not yet known to them, and (3) that no speaker ever knows all the underived lexemes. (3) would not be the experience of a speaker in a hunter-gatherer society, or, for that matter, of a peasant in pre-modern England. (1) and (2) will be much less true for languages other than English. Assuming you have some method for identifying underived lexemes, you could measure this empirically by counting the number of hapax underived lexemes per million-word increase in the size of a representatively sampled corpus. My prediction would be that the frequency of hapaxes diminishes at a slower rate for English than for other la