On 19 December 2012 18:53, Gary Shannon wrote:
[...]
> http://fiziwig.com/ai/chatbot/network01.html

This is a little bit off the current topic, but I just want to say, I love the context network approach. I have been so drowned in sentence-level NLP recently that a real attempt at attacking discourse-level processing is very refreshing to read.

> The intent here is very different.
>
> I DO NOT want to extract the existing grammar from the corpus, but to
> find ways to paraphrase the existing corpus in such a way that the
> grammar extracted from that corpus is minimal.

I'm pretty sure I got that. But the way you propose to go about it seems incredibly tedious and highly amenable to machine assistance.

> Treebanks are useless because the point is to INVENT a grammar for a conlang.

They're useless if you do all of the inspection and paraphrasing manually. They're useless for the creative bit where you introduce new rules that are not necessarily derived from the source language. But I would not want to manually look at the same kinds of sentences over and over again, reconfirming to myself each time that, yes, I can in fact still paraphrase this kind of source sentence into this other kind of target/conlang sentence.

I'd want to make that decision once and then let a computer scan through the rest until it came upon a new kind of sentence, which might already be paraphrasable in a way that the machine doesn't know (because it doesn't parse for semantics) or might require introducing a new rule in the conlang grammar. Doing that scanning requires that the text be parsed, which you'd otherwise have to do implicitly in your head in order to do everything by hand, and thus a treebank would be useful to avoid parsing on the fly.
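To make the "decide once, then let the computer scan" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the trees are plain nested tuples of the form (label, child, ...) rather than any particular treebank format, and the names productions, review_treebank and decide are hypothetical. Lexical rules (a POS tag over a single word) are skipped, on the assumption that only phrase-level structure is being reviewed.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple, Union

Tree = Union[str, tuple]             # a leaf word, or (label, child, child, ...)
Rule = Tuple[str, Tuple[str, ...]]   # (parent label, labels of its children)


def productions(tree: Tree) -> Iterator[Rule]:
    """Yield one phrase-structure rule per internal node, top-down."""
    if isinstance(tree, str):        # a bare word carries no rule
        return
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return                       # POS tag over words: skip lexical rules
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        yield from productions(child)


def review_treebank(
    trees: Iterable[Tree],
    decide: Callable[[Rule, Tree], Optional[Rule]],
) -> Dict[Rule, Optional[Rule]]:
    """Scan parsed sentences, asking `decide` only about unseen rules.

    `decide` stands in for the human: it gets the new source rule plus
    the sentence it was found in, and returns either a conlang rule to
    store or None if the construction can simply be paraphrased away.
    """
    decisions: Dict[Rule, Optional[Rule]] = {}
    for tree in trees:
        for rule in productions(tree):
            if rule not in decisions:        # each kind is decided once
                decisions[rule] = decide(rule, tree)
    return decisions


if __name__ == "__main__":
    toy_treebank = [
        ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked"))),
        ("S", ("NP", ("DT", "a"), ("NN", "cat")), ("VP", ("VBD", "slept"))),
    ]

    def ask_human(rule: Rule, tree: Tree) -> Optional[Rule]:
        # Placeholder for an interactive prompt: mirror every rule.
        print("new rule:", rule[0], "->", " ".join(rule[1]))
        return (rule[0], tuple(reversed(rule[1])))

    review_treebank(toy_treebank, ask_human)
```

In the toy run the second sentence triggers no questions at all, since every rule in it was already decided on the first one. The returned dictionary maps each source rule to the conlang rule chosen for it, or to None where the human opted for a paraphrase.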
And at that point, the process looks much like automatically extracting a grammar from a treebank, except that instead of just storing the rules that are found directly, it presents each rule to a human and then stores whatever new rule the human puts in, or nothing if the human decides that a rule is unnecessary because it can be paraphrased. That should produce the same result as paraphrasing the whole corpus manually, but with a small fraction of the effort. Plus, you'd end up with a nice syntactic transfer system for translating from the "training" source language to the conlang for free.

-l.