On Wed, 3 Jun 2015 16:34:57 -0700, David Peterson <[log in to unmask]> wrote:

>(3) It doesn’t matter if, for example, a language has [θ]. It matters how it got it. For example, Mexican Spanish is like most languages: it lacks [θ]. Castellano, however, does have [θ]—same language. It’s exceedingly rare, crosslinguistically, but it doesn’t really matter, because we know exactly how it got [θ]. In this case, neither fact is particularly interesting. Looking at the percentages crosslinguistically rather than the diachrony is a mistake, because you’re examining the end result, not the process. In the case of Spanish, *ts > [θ] in Castellano and *ts > [s] in Mexican Spanish. How common are each of those specific sound changes? Now you’re asking the right question.

Sure.  As D'Arcy Thompson said, everything is the way it is because it got that way.  I think I could reconcile my thesis with yours -- if a group of features shows correlations, which my strong naturalism would bid you account for, then there oughtta be some sort of causation (I know, I know).  Either one feature affects the other, or some third feature historically brings about both.  And looking at history is a good way to detect these sorts of correlations.  To put it another way, of the correlations of features that natlangs show, many of them are just the ones that make internal reconstruction yield sensible results.

On Thu, 4 Jun 2015 00:43:05 +0100, And Rosta <[log in to unmask]> wrote:

>On 4 Jun 2015 00:09, "Alex Fink" <[log in to unmask]> wrote:
>> [I define my "strong naturalism"]
>Ah, this kind of explains why you want to generate conlangs automatically.

Kind of.  They are related impulses.  I don't expect my generator will be able to tell me what the probabilities should be, though; instead, I'm having to provide a lot of probabilities as explicit parameters.  Maybe, if my historical evolution is sufficiently good (if I ever write it), the program will be of use in that respect, allowing experiments about which sorts of features it thinks cluster together or what underlying structure supports a given surface feature.
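A toy of the kind of parameterization mentioned above: the generator doesn't know what the probabilities should be, so they are supplied explicitly and each feature is sampled independently. The feature names and numbers here are invented placeholders, not anything from the actual program.

```python
import random

# Hypothetical explicit probability parameters, one per feature.
# These values are made up for illustration.
feature_probs = {
    "has tone": 0.4,
    "has [T]": 0.05,
    "isolating": 0.2,
}

def sample_language(probs, rng):
    """Draw one language: each feature is present with its given probability."""
    return {feature: rng.random() < p for feature, p in probs.items()}

lang = sample_language(feature_probs, random.Random(0))
```

A real generator would of course need the features to interact rather than being independent coin flips; that is exactly where the historical-evolution component would earn its keep.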

>Can a solitary conlanger's solitary conlang exhibit strong naturalism?

Can one do any meaningful statistics with a sample size of n = 1?  Not much.  Certainly on a single question like "uses [T]" there can be no smoking gun.  And you could pick out a feature rarer than that (maybe your language has three contrastive degrees of vowel nasality, which fewer than one in a thousand do) but if you pay special attention to the outcome you're probably succumbing to the problem of multiple comparisons.  
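A back-of-the-envelope illustration of the multiple-comparisons worry above: even if each individual feature value is a one-in-a-thousand rarity, a language with many features is quite likely to show *some* such rarity, so singling one out after the fact proves little. The count of 100 features is an arbitrary assumption for illustration.

```python
# Chance that a given feature takes its rare (one-in-a-thousand) value.
p_rare = 1 / 1000

# Number of features you could have singled out post hoc (assumed).
n_features = 100

# Probability that at least one feature comes out rare by chance alone.
p_at_least_one = 1 - (1 - p_rare) ** n_features  # roughly 9.5%
```

So a conlang exhibiting one strikingly rare feature is not, by itself, evidence of anything.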

So one way to make the question meaningful for a solitary conlang would be to increase the sample size by sampling multiple _features_.  But you need some way to consider these disparate features as performances of the same experiment.  You could for instance do that by looking at how often the conlang's values of the features agree with the constructor's native language -- this is the scale on which relexes score low -- or how often they agree with the modal value -- this detects something like kitchen-sinkiness / overdoing it on novelty.  
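A minimal sketch of the scoring idea above, with invented feature names and values (not drawn from any real WALS/CALS data): compare a conlang's feature values against the creator's native language on one axis, and against the crosslinguistic modal values on the other.

```python
def agreement(conlang, reference):
    """Fraction of shared features on which the two value maps agree."""
    shared = [f for f in conlang if f in reference]
    if not shared:
        return 0.0
    return sum(conlang[f] == reference[f] for f in shared) / len(shared)

# Toy data: three features, values invented for illustration.
native  = {"basic order": "SVO", "has [T]": True,  "tone": False}
modal   = {"basic order": "SOV", "has [T]": False, "tone": False}
conlang = {"basic order": "SVO", "has [T]": True,  "tone": False}

relex_score = agreement(conlang, native)  # high -> relex-like
modal_score = agreement(conlang, modal)   # low -> kitchen-sink-like
```

On this toy data the conlang agrees with the creator's native language on every feature, which is the relex signature.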

>Returning to your statistical point, are there natlang features that are
>significantly underrepresented in conlangs?

I thought once that I should be keeping lists of these things, but then I learned of the WALS vs. CALS survey which quelled the impulse to do it properly myself, so this'll be a fairly off-the-top-of-my-head list.  It's also hard to know how to consider features that are underrepresented 'cause people typically don't think about them at all, rather than thinking about them and making another choice.  

Reduplication has been my go-to answer, but I see that's already come up.  Cryptotypy in general, ditto.  The isolating end of the morphosyntactic spectrum is underpopulated, prolly in part 'cause syntax is harder than morphology.  Tone, in any manifestation.  Morphophonology, especially when not just patternless on a morpheme-by-morpheme basis.  Syncretism.  Suppletion.  Free variation in general.  Clitics.  Double-marking.  Asymmetric negation.  Diversity of types of bi-clausal structures, including some of narrow application.  Morphosyntax dedicated to participant tracking on the larger-than-one-sentence scale, and other features at this scale.  Sociolinguistic-type features with thoroughgoing grammatical ramifications, like inflectional politeness or systematic vocabulary replacement by register.  Drastic language contact effects (e.g. at the Sino-Xenic level).  Properly diffuse polysemy (in the word classes that tend to have it).  Number systems that aren't just straight-up base N for some N.  Native orthographies with deviation from one-to-one sound-to-spelling correspondences, especially multigraphs and underspecification.  

>Are there overrepresented features that everyone isn't already quite
>conscious of? The [T] one is well-known enough that, say, one would
>naturally include it in a parody artlang.

Hm, these are harder to think of offhand.  Alongside [T] I guess one could name other characteristically English phones, like [&] or [I].  I suppose the overrepresentation of "correlatives" after the Esperanto style is well known.  The verb "have", that one's been mentioned.  Relative pronouns.  Manner adverbs.  Morphological comparative systems with symmetric "more than" and "less than" degrees, or for that matter adjectives inflecting for degree of comparison in general.  

There are some in the WALS vs. CALS thread that didn't occur to me.  For instance, apparently the construction of the noun phrase shows systematic deviations: patterns like numeral-noun, adjective-noun, noun-genitive are overrepresented.

On Thu, 4 Jun 2015 12:32:59 -0700, Jeffrey Brown <[log in to unmask]> wrote:

>More replies:
>Firstly, there are two ways
>to calculate these statistics. One is by number of speakers, and the second
>is by number of languages. The first one obviously is no good because the
>number of speakers is based too much on economic or military dominance by a
>handful of countries to be meaningful in a linguistic sense. 

Of course.  

>The second one
>is better, but still suffers too much from the accidents of history rather
>than other, deeper causes. [...] It is not possible to disentangle the
>linguistic, or neuroanatomical, reasons for linguistic statistics from the
>historical reasons. 

Hm, I think that most explanations of linguistic features which I would want to call reasons _are_ historical, so I'm not sure how to contrast that with the "linguistic" class.  Everything is the way it is because it got that way.  Maybe there's an exception to be made for features reflecting hard limits on human cognition -- but even here there are more soft limits than hard limits, and so even these biases would be felt through gradual diachronic change.  E.g. a construction which creates too many deeply center-embedded sentences wouldn't drop out entirely all at once; even if there came to be a discrete constraint against using it twice in a sentence (say), this constraint would take time to come to fixation in the speaker community; etc.

Oh, wait, maybe by "history" you meant basically _political_ history, of which speech community influenced which.  In that case, yes, you're right, that is a confound.  But I would note that influence by foreign cultures, including of the uninvited sort, is part of the human condition, and this has its effect in language, in the whole spectrum of language contact effects.  If you dismiss all language contact effects under the rubric of "accidents of history" you've strayed from strong naturalism.

In any case, I'm aware of the difficulty of measuring my statistics without confounds.  That was the reason for my proviso in my first post that you might need to watch the world for a million years to have enough accuracy.  But surely the solution is not to _coarsen_ your point of view:

>That is why I have concluded the best one can do is a
>simpler procedure: Does this particular feature exist in any language? Yes
>or no? 

This doesn't avoid the problem, it just concentrates it.  A very rare feature might well be erased from the world through exactly the kind of accident you're thinking of.  (Non-paralinguistic clicks, for example, are such a fragile feature on earth: outside Damin, which doesn't count, they're only found in one language area.  If Bantu had overrun the Khoisan territory more thoroughly, and not had a need arising from its avoidance speech for phonological distortion methods, there might be no click languages today.)  From my perspective this means you might write down 0% where you should have written 1% for the probability of this feature.  On another feature the same situation might bring you to take 50% where you should've had 51%, an equally small absolute error.  Your perspective is unfazed by the latter, yes, but at the cost of being utterly, night-and-day-wise turned on its head by the former.
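The asymmetry claimed above can be put in numbers: a one-percentage-point error in an estimated feature probability is harmless near 50% but catastrophic at 0%, as measured for instance by surprisal (log-loss). The numbers below are illustrative only.

```python
import math

def surprisal(p_est, outcome):
    """Bits of surprise at seeing `outcome` under estimated probability p_est."""
    p = p_est if outcome else 1 - p_est
    return float("inf") if p == 0 else -math.log2(p)

# Estimating 50% when the truth is 51%: barely any extra surprise
# when the feature shows up.
mid_err = surprisal(0.50, True) - surprisal(0.51, True)  # a few hundredths of a bit

# Estimating 0% when the truth is 1%: if the rare feature ever turns up,
# the 0% estimate is infinitely surprised.
edge_err = surprisal(0.0, True)
```

That is the sense in which a yes/no coarsening is "turned on its head" by the erasure of a rare feature: the same absolute error goes from negligible to unbounded.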

Maybe more to the point, if one's reason for asking whether one's conlang is naturalistic is as a sort of reassurance against one's own inward doubts, then a method which will readily yield an unconditional "yes" is a handy thing.  But if we're actually trying to exercise discernment, if we think it's meaningful to say that one conlang is _more_ naturalistic than another, that doesn't help.  I for one do think naturalism should be treated as a fuzzy property.

On Thu, 4 Jun 2015 21:10:51 +0100, And Rosta <[log in to unmask]> wrote:

>Off the top of my head, I'd suggest a test of naturalism might be the extent to which someone who doesn't know whether it's a conlang or not will bet that it is. A kind of conlang analogue of the Turing test.
>Say, assemble a panel of judges, and for each of a range of languages, nat- and con-, get them to state a degree of confidence for whether it is nat- or con-. One set of tests could be presenting the judges with raw texts, another set could involve glossed texts, and another could involve descriptions of the conlangs. One of the judges could be an Alex Fink conlang-sniffer computer programme. That might be an interesting exercise to do in actuality, not just as a thought-experiment...

Ooh, seconded.  (Though even in the year 2050, when I've finished an alpha of my conlang generator, it'd still probably be decades more work to turn it into a conlang sniffer!)
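One hypothetical way to grade the judges in the panel proposed above: each judge states a confidence that each sample is a conlang, and we score those probabilities with the Brier score (mean squared error against the truth). The judges and samples here are made up.

```python
def brier(confidences, truths):
    """Mean squared error between stated confidences and 0/1 ground truth."""
    pairs = list(zip(confidences, truths))
    return sum((c - t) ** 2 for c, t in pairs) / len(pairs)

# Ground truth for four samples: 1 = conlang, 0 = natlang.
truths = [1, 0, 1, 0]

cautious_judge = [0.6, 0.4, 0.7, 0.3]  # hedges everything, always leans right
rash_judge     = [1.0, 0.0, 0.0, 1.0]  # fully confident, half wrong

# Lower is better; 0.25 is the score of a coin-flipping judge.
```

A scoring rule like this rewards well-calibrated hedging, which seems right for a task where even experts should often be unsure.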

Alex, long past his bedtime