On 19 December 2012 18:53, Gary Shannon wrote:

This is a little bit off the current topic, but I just want to say, I
love the context network approach. I have been so drowned in
sentence-level NLP recently that a real attempt at attacking
discourse-level processing is very refreshing to read.

> The intent here is very different.
> I DO NOT want to extract the existing grammar from the corpus, but to
> find ways to paraphrase the existing corpus in such a way that the
> grammar extracted from that corpus is minimal.

I'm pretty sure I got that. But the way you propose to go about it
seems incredibly tedious and highly amenable to machine assistance.

> Treebanks are useless because the point is to INVENT a grammar for a conlang.

They're useless if you do all of the inspection and paraphrasing
manually. They're useless for the creative bit where you introduce new
rules that are not necessarily derived from the source language. But I
would not want to manually look at the same kinds of sentences over
and over again, reconfirming to myself each time that, yes, I can in
fact still paraphrase this kind of source sentence into this other
kind of target/conlang sentence. I'd want to make that decision once
and then let a computer scan through the rest until it came upon a new
kind of sentence. That sentence might already be paraphrasable in a way
that the machine doesn't know about (because it doesn't parse for
semantics), or it might require introducing a new rule in the conlang
grammar. Doing that scanning requires that the text be parsed, which
you'd otherwise have to do implicitly in your head to do everything by
hand, so a treebank would be useful to avoid parsing on the fly.

And at that point, the process looks much like automatically
extracting a grammar from a treebank, except that instead of just
storing the rules that are found directly, it presents each rule to a
human and then stores whatever new rule the human puts in, or nothing
if the human decides that a rule is unnecessary because it can be
paraphrased. That should produce the same result as paraphrasing the
whole corpus manually, but with a small fraction of the effort. Plus,
you'd end up with a nice syntactic transfer system for translating
from the "training" source language to the conlang for free.
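
To make the loop I have in mind concrete, here is a minimal sketch in
Python. Everything in it is an assumption made up for illustration:
the nested-tuple tree encoding, the names productions, review_treebank
and decide, and the toy treebank. It is not an existing tool. Each
distinct source rule is shown to the human exactly once; decide
returns the conlang rule to store, or None when the sentence can be
paraphrased with rules already in the grammar.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Set, Tuple, Union

# Assumed encoding: a leaf is a word (str); an internal node is
# a tuple (label, child, child, ...).
Tree = Union[str, tuple]
Rule = Tuple[str, Tuple[str, ...]]  # (parent label, child labels)


def productions(tree: Tree) -> Iterator[Rule]:
    """Yield one phrase-structure rule per internal node, top-down."""
    if isinstance(tree, str):
        return
    label, *children = tree
    # Skip preterminals such as ("N", "dog"): those are lexicon, not grammar.
    if all(isinstance(c, str) for c in children):
        return
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        yield from productions(child)


def review_treebank(
    treebank: Iterable[Tree],
    decide: Callable[[Rule], Optional[Rule]],
) -> Tuple[Set[Rule], Dict[Rule, Optional[Rule]]]:
    """Ask `decide` about each distinct source rule once.

    Returns the conlang grammar (the rules the human chose to keep) and
    the source-to-conlang transfer table, where None marks a source rule
    the human judged unnecessary.
    """
    transfer: Dict[Rule, Optional[Rule]] = {}
    for tree in treebank:
        for rule in productions(tree):
            if rule in transfer:
                continue  # already decided; no need to look at it again
            transfer[rule] = decide(rule)
    grammar = {target for target in transfer.values() if target is not None}
    return grammar, transfer


# Hypothetical two-sentence treebank, just to exercise the loop.
TOY_TREEBANK = [
    ("S", ("NP", ("D", "the"), ("N", "dog")),
          ("VP", ("V", "sees"), ("NP", ("D", "a"), ("N", "cat")))),
    ("S", ("NP", ("D", "a"), ("N", "cat")),
          ("VP", ("V", "sleeps"))),
]
```

In real use, decide would prompt the human at a terminal; for a dry
run it can be any function from a rule to a rule or None. The transfer
table that comes back is the syntactic transfer system mentioned
above, and the set of non-None values is the conlang grammar so far.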