On Sat, 14 Nov 2015 17:48:11 +0000, And Rosta <[log in to unmask]> wrote [on the old thread]:

>I haven't had time to investigate those instructions, but I wonder if I
>might describe the generator I had been about to work on with Alex, before
>the rest of our lives supervened. (This was for Jundian, an artlang I'd
>begun work on.)
>
>Each combinatorial element is manually assigned a (phonological) complexity
>value, and the complexity values are summed (by the combinatorial component
>of the generator) to give an entire form's complexity value. When I want a
>word, the generator picks one randomly following a probability curve whose
>axes are probability and complexity and whose peak and higher inflection
>point are complexity values specified by me. (This anyway is my layman's
>understanding of what I want.) In other words, I specify a value that is
>the most likely complexity of the word and another value that the word's
>complexity very probably won't exceed.
>
>Could the generator incorporate this? It seems to me that it requires the
>ability to specify complexity values in the input, plus the simple
>complexity summing, plus the complicated bit that generates words in a
>fancily probabilistic way.

A couple of recent thoughts.

(0) The part of And's spec that had troubled me the most, in terms of it not being clear straightaway what the Correct thing to do was, was how to fuzz the distributions, i.e. how to do what And specifies above in terms of peak and inflection point. Well, probably we won't know how to do this in a nature-approximating way until the Gusein and Zade of this research area come along. But implementationally, it's occurred to me that all the fuzzing can be built into the word grammar, so the program doesn't have to do it separately.
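As a concrete reading of And's quoted spec, here is a minimal sketch. Everything in it is invented for illustration: the element names and complexity values, and the triangular distribution, which merely stands in for And's unspecified peaked curve (its mode plays the "most likely complexity" role, and its upper bound crudely plays the "very probably won't exceed" role, as a hard rather than soft ceiling).

```python
import random

# Invented per-element complexity values; And's spec says these are
# assigned manually.
COMPLEXITY = {"t": 1, "k": 1, "q": 3, "a": 0, "y": 2}

def word_complexity(word):
    """Sum the complexity values of a word's elements (And's summing step)."""
    return sum(COMPLEXITY[seg] for seg in word)

def pick_target(peak, ceiling):
    """Draw a target complexity from a peaked curve.  The triangular
    distribution is only a stand-in for the curve And describes."""
    return round(random.triangular(0, ceiling, peak))

def generate(peak, ceiling, tries=10000):
    """Rejection-sample random strings until one hits the target complexity."""
    target = pick_target(peak, ceiling)
    segs = list(COMPLEXITY)
    for _ in range(tries):
        word = "".join(random.choice(segs) for _ in range(random.randint(1, 6)))
        if word_complexity(word) == target:
            return word
    return None
```

Rejection sampling is of course the crudest possible combinatorial component; it is here only to make the spec runnable end to end.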
For instance, to take a simple example, if one wants to be able to ask for a word of given maximum complexity but accept a word of up to, say, two complexity points less, with no other biasing, one can do this by including a new terminal in the word grammar, appearing exactly once in each word, which expands equiprobably to segmental zero with complexity zero, segmental zero with complexity one, and segmental zero with complexity two. Giving a different probability distribution there clearly lets you make the fuzzing fall off at different rates. Or if you wanted the amount of fuzz to increase in proportion to the word length: no problem; rather than making this terminal appear exactly once in each word, let there be a copy of it as sister to every terminal with segmental substance. And so on. This works cleanly in And's case of discrete complexity scores. To do the Correct thing in Jim's case of basically-continuous complexity scores you'd need to be able to specify non-discrete distributions.

(1) This is more of a theoretical unification. Take an extant word generator which allows specification of probability distributions, and replace the real numbers in which the probabilities live with the univariate polynomial ring R[t], where the "probability" p*t^n is to be understood as meaning that this option has probability p and incurs complexity score n. The effect of this replacement is basically to piggyback the complexity-score computations on top of the probability computations that the word generator is implicitly already doing; the complexity scores will behave the right way via this piggybacking. (This has an indiscrete analogue too: replace polynomials with distributions on the real line and product with convolution.)
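The replacement in (1) is easy to prototype: represent a "probability" as a sparse polynomial mapping complexity score to probability mass, so that mutually exclusive alternatives sum and concatenated pieces multiply. The tiny grammar fragment below is invented; only the arithmetic is the point.

```python
from collections import defaultdict

def poly_add(p, q):
    """Sum in R[t]: combining mutually exclusive alternatives."""
    r = defaultdict(float, p)
    for n, c in q.items():
        r[n] += c
    return dict(r)

def poly_mul(p, q):
    """Product in R[t]: concatenating independent pieces.  Exponents
    (complexity scores) add; coefficients (probabilities) multiply."""
    r = defaultdict(float)
    for n, a in p.items():
        for m, b in q.items():
            r[n + m] += a * b
    return dict(r)

# Invented fragment: an onset that is "t" (complexity 1) or "q" (complexity 3)
# with probability 1/2 each, i.e. 0.5*t + 0.5*t^3, followed by a nucleus that
# is "a" (complexity 0) or "y" (complexity 2), i.e. 0.5 + 0.5*t^2.
onset = {1: 0.5, 3: 0.5}
nucleus = {0: 0.5, 2: 0.5}
syllable = poly_mul(onset, nucleus)   # 0.25*t + 0.5*t^3 + 0.25*t^5

# Coefficient extraction: total probability mass at complexity 3.
print(syllable.get(3, 0.0))   # prints 0.5
```

Normalizing the slice of the grammar's choices that contributes to a given coefficient is then what lets you sample a word conditioned on a user-selected total complexity, which is roughly the coefficient extraction mentioned below.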
I don't imagine this idea would be plug-and-play into any of the extant word generators: even if you'd had the inexplicable foresight to let probabilities be of a generic data type, you'd still need some special stuff to generate words of a complexity the user selects (namely, some kind of coefficient extraction). But a tool that can explicitly do the probabilistic analogue of what everyword.pl does is very nearly there.

Alex