Recently*, And Rosta (and, much earlier, Jeffrey Henning) described
conlangs as, at most, "model" languages.


To me, "model" here can mean two things:
a) a model *of* some reference, trying to approximate it, and/or
b) a not entirely functional miniature.

The first is, I think, quickly dismissed. With very few exceptions,
conlangs do not explicitly attempt to approximate some reference. The
closest I can think of are Basic English (which tries to whittle down
English to a minimum-viable subset), and a posteriori conlangs (which
have some quasi-diachronic relationship with one or more natlangs).

The real question is whether conlangs are fully functional, and
whether lacunae in the descriptions of conlangs make them somehow
worse than natlangs.


Let's suppose that you're an ordinary person learning English as a
second language.

Let's call all the learning materials normally accessible to you
(curricula, dictionaries, grammar books of the ordinary sort, K-12
English education, etc.) the "documentation" of English.


First, let's admit that *any* description of *any* language which has
probe-able "speaker intuition" is incomplete.

In the centuries spent documenting English, we have still never
produced a *complete* description of even a single idiolect at a
single point in time. Read all the monographs you want, do all the
research you want, and you will never learn enough to fully model even
a single speaker's linguistic behavior.

The analogy holds either way, so let's stick with the more modest
sense of "documentation": what an ordinary learner can actually get at.


Let's call the gap between the documentation and speaker behavior
"undefined behavior".

Let's call the actual complete linguistic behavior of any single
person an "implementation" of that language.

Implementation variance is always present for any language with
multiple speakers — be they jointly-raised twins (with minor
idiolectal variances); native speakers with different dialects; second
language learners with different native languages, whose respective
native languages fill in the undefined behavior (to the extent they
have not formed some other consensus), etc.

The variance will be in the aforementioned undefined behavior, to the
extent that the speaker doesn't just reject the supposed documentation
as incorrect.

So there is formal agreement on some parts of the language (the
documentation), informal agreement on other parts (implementation
similarity), and disagreement on still others (differently implemented
undefined behavior).
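
For the programmers in the audience, that three-way split can be
sketched in a few lines of Python. Everything here is invented for
illustration: the DOCUMENTATION table, the two "speakers", and the
sample utterances are hypothetical, not data about real English
speakers. The sketch also ignores the case where a speaker rejects the
documentation outright.

```python
# Toy model: the documentation is a partial mapping from utterances to
# grammaticality judgments; each speaker (an "implementation") judges
# more utterances than the documentation covers.
# All utterances and judgments below are invented for illustration.

DOCUMENTATION = {
    "the cat sat": True,
    "cat the sat": False,
}

SPEAKER_A = {
    "the cat sat": True,
    "cat the sat": False,
    "the cat done sat": False,  # not covered by the documentation
    "needs washed": True,       # not covered by the documentation
}

SPEAKER_B = {
    "the cat sat": True,
    "cat the sat": False,
    "the cat done sat": False,
    "needs washed": False,
}


def classify(utterance, docs, a, b):
    """Say which kind of (dis)agreement two speakers have on an utterance."""
    if utterance in docs:
        return "formal agreement"    # defined by the documentation
    if a.get(utterance) == b.get(utterance):
        return "informal agreement"  # undefined, but implemented alike
    return "disagreement"            # undefined, implemented differently


for u in SPEAKER_A:
    print(u, "->", classify(u, DOCUMENTATION, SPEAKER_A, SPEAKER_B))
```

The point of the toy is only that the third category never shows up
unless you look outside what the documentation covers.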


This is of course an analogy to computer languages.

(Yes, I agree that computer languages are not the same as human
languages, because they have a limited domain. That they tend to be
simpler is irrelevant; give them time and they build up all sorts of
crazy weirdness. Just look at JavaScript, Perl, or PHP across
different implementations.)

Computer languages also have formal documentation, saying how the
thing *should* work. With few exceptions (e.g. possibly Haskell and
its ilk), they also have areas of undefined behavior.

Different implementations (compilers / interpreters) fill in those
gaps in different ways. Some have cross-implementation consensus, e.g.
internet RFCs that get adopted; some are intentionally extended in
some special snowflake way, e.g. anything made by Microsoft; and some
just don't think about it and do something that may or may not be
reliable (and is probably going to wind up in a vulnerability report
eventually).
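
One small, concrete case of a gap being filled in two ways: C89 left
the result of integer division with a negative operand
implementation-defined, C99 later required truncation toward zero, and
Python's `//` rounds toward negative infinity instead. (Strictly this
is "implementation-defined" rather than "undefined" behavior, but the
shape of the problem is the same.) The sketch below treats the two
roundings as two "implementations" of one under-specified operator.
The function names are hypothetical, and this is an illustration of
the idea, not a model of any particular compiler.

```python
def div_floor(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (what Python's // does)."""
    return a // b


def div_trunc(a: int, b: int) -> int:
    """Round the quotient toward zero (what C99 and later require)."""
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the operands' signs differ.
    return q if (a >= 0) == (b > 0) else -q


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(a, b, div_floor(a, b), div_trunc(a, b))
```

Both functions agree wherever the old documentation was explicit
(non-negative operands) and part ways only on the negative cases,
giving -4 and -3 respectively for -7 divided by 2.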

Just like human speakers of languages — nat or con — implementations
will often give decisions on whether or not some given utterance is
"grammatical", i.e. whether they think it has a meaningful parsing.
Some interpreters/speakers are more lenient than others; some
(correctly, by their implementation/idiolect) will or won't have a
meaningful way to interpret the utterance; some just won't know.

(Yes, I am rejecting the notion that grammaticality in languages is a
boolean, or indeed that it is fully consistent even within one
speaker, let alone across more than one.)
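
A real-world instance of lenient versus strict "speakers": Python's
standard `json` module accepts the bare token NaN by default, even
though the JSON grammar in RFC 8259 has no such value. The sketch
below puts a lenient and a strict reader side by side. The names
`lenient`, `strict`, and `judge` are illustrative, the strict reader
is only strict about this one extension, and the yes/no verdict is a
simplification for brevity.

```python
import json


def lenient(text):
    """Python's default reader: accepts NaN, Infinity, -Infinity as an extension."""
    return json.loads(text)


def _reject(token):
    # json.loads calls this hook for 'NaN', 'Infinity', and '-Infinity'.
    raise ValueError(f"not valid JSON: {token}")


def strict(text):
    """Same parser, but refuses the non-standard constants."""
    return json.loads(text, parse_constant=_reject)


def judge(reader, text):
    """Ask one 'speaker' whether it can assign the utterance a parse."""
    try:
        reader(text)
        return "grammatical"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "ungrammatical"


for utterance in ["[1, 2]", "[1, NaN]", "[1, 2"]:
    print(utterance, judge(lenient, utterance), judge(strict, utterance))
```

The two readers agree on the well-formed list and on the truncated
one, and disagree only on the utterance that leans on an undocumented
extension.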


The exact nature of these undefined behaviors (in a given situation,
for a given implementation) is of course interesting — to linguists,
for human languages, and to hackers, for computer languages. It can
tell you a lot about how the system *really* works, not just the
documented version. It also provides both with gainful employment. ;-)


By And's expressed view, a conlang is "model" for a few reasons.

First, conlangs have undefined behavior, when judged by their
documentation. But this is true of any natlang as well; the difference
is only a matter of degree, not of kind.

(If you're a Chomskyan, insert the standard rant here about poverty of
the stimulus, and the argument extends. Even if you had complete
corpus knowledge, which you don't and can't, it would not be adequate
to fully explain linguistic behavior.

If you're a Lakoffian, insert the standard rant here about embodied
cognition, and the argument extends the other way. It is impossible in
principle to document language completely, because substantial parts
of it are based on [non-uniform] human interaction with the physical
world, rather than a separate language faculty per se.)


Second, conlangs' undefined behavior is often implemented by a given
interpreter in ways that borrow heavily from the defined behavior or
consensus undefined behavior of that interpreter's more familiar (e.g.
native) languages. But this too is true of any natlang as well; second
language learners will generally apply what they already use to what
they're learning, unless overridden by clear documentation or adopted
consensus.

Indeed, some conlangers — notably Boudewijn Rempt, in his Apologia pro
imaginatione — would say that this "sub-creation" is a thing to be
honored.

True, natlangs have been more thoroughly documented — and have more
thorough inter-speaker consensus. Linguists and broadcasters alike
remain gainfully employed. This is not a universal, though; plenty of
languages (e.g. sign languages) have had far less investigation, or
less inter-speaker consensus, or borrow heavily from contact with some
more dominant language. Nevertheless, they are not dependent on the
grace of a field linguist's thesis, nor on complete differentiation
from other languages, for their legitimacy.

Even a "dead" language, like spoken Latin, has these properties.
Spoken Latin has changed over the last centuries despite having no
native speaker population at all (with the possible exception of a
handful of dual-language clergy). Latin *in its current form* differs
from the last natively spoken Latin at least as much as English
dialects differ from one another, and yet it is still a full language.
Native speakers were not required for any of that.

So it is again only a distinction of degree, not of kind, to say that
conlangs are more "model" than natlangs because their speakers import
undefined behavior implementations from other languages.


And describes UNLWS as one of the least model languages he knows —
essentially, because it deviates so drastically from extant languages
that it gives the appearance of having less undefined behavior.

This appearance is false. UNLWS has plenty of undefined behavior, both
because Alex & I suck at documenting our language, and because there
is (very intentionally) a great deal of it that we punt to resolution
via Grice… which is not reliable inter-speaker.

Both jointly and severally, we have intuitions about what has the
"UNLWS nature", even for completely novel utterances, morphosyntax,
lexemes, velc. Sometimes we agree, sometimes not; sometimes we have
very strong reactions (on par with any natlang speaker's reaction like
"this is obviously ungrammatical, and an abomination to boot"),
sometimes weak ones (like "this is weird but maybe-I-guess
grammatical").

The same, I would posit, is true of any conlanger. We all have some
intuitive sense about the languages we make and speak.

I don't think there's much if any difference between a conlanger's
aesthetic intuition and a natlanger's intuition for "deep
grammar". Neither is fully documented, nor can either be. Both have
impact on linguistic behavior, including judgments of
"grammaticality". Both are a source of the deeply complex
"implementation" aspect of language.


In short, if Mandarin as learned by a native English speaker is a
full language, so is Toki Pona.

Either all languages are "model", or none are. It matters not whether
the language is natural or constructed, nor how much development has
gone into it, so long as there is anyone who can use it with any
degree of intuition about its unexplored areas.

Since the distinction is vacuous, let's just ditch the pejorative.

- Sai

* Note: this was composed 12 days ago, and does not reflect any posts
since then. I was hoping to improve it to an essay proper, but stalled
out, so decided to just post what I have.

P.S. For non-programmers in the audience, take a look at e.g. (in
order of difficulty)
https://en.wikipedia.org/wiki/Undefined_behavior
https://en.wikipedia.org/wiki/Unspecified_behavior#Implementation-defined_behavior
http://blog.regehr.org/archives/213 et seq
http://blog.regehr.org/archives/767
http://www.underhanded-c.org/
https://people.csail.mit.edu/nickolai/papers/wang-undef-2012-08-21.pdf

P.P.S. Thanks and blames to Alex for the conversation that prompted
this post, outlining it, and haranguing me to post it. :-p