>
> The thought had occurred to me to try machine translation, but as I've
> already invested many years, off and on, to the problem of parsing
> English sentences for chatbot programs (with very little to show for
> it; see "Artificial Intelligence" on my web page: http://fiziwig.com/
> ) I decided that for what I had in mind human translation would be
> best.
>

Makes sense.


> Setting aside the obvious difference in machine code vs. interpreted
> P-code, a lot of those differences might simply be due to choices made
> as to register allocation, parameter passing methods, manner in which
> tasks are factored into smaller functions and subroutines, etc.
>

Yes.  Don't forget also that compilers can produce assembly optimized for
all sorts of different goals.  Speed, size, and readability are all goals a
compiler could optimize for, and each would substantially change the
resulting code.  I imagine similar tradeoffs would be made in this
hypothetical language we are discussing.


>
> Since the target "assembly language" (and that was probably a poor
> choice of name) is geared more toward the semantic content of the
> original it would seem that no matter how the concept of "Person A
> giving something to person B" is expressed in any given natlang, the
> "primitive" operations "gives" and "accepts' are still going to have
> to be present in the "compiled" version since they represent the core
> meaning of the utterance, and there is no alternative operation code
> for effecting this transfer.
>

Yes.  To your earlier point about passing arguments, etc., you'd still get
some variation in how the arguments were set up.

Take the slightly more complicated example: "John gave the book of poetry
and the book of short stories to Mary."  It might compile to something more
like:

Person W     // declare variable "W" to be of type "Person"
Person X     // declare variable "X" to be of type "Person"
Book Y1      // declare variable "Y1" to be of type "Book"
Book Y2
Subject Z1   // declare variable "Z1" to be of type "Subject"
Subject Z2
AndList L    // Just superficially thinking about data structures, I'd
             // imagine you'd want separate types for Lists connected by
             // And and Or, both of which inherit from class List
W name_is "Mary"
X name_is "John"
Z1 name_is "poetry"
Z2 name_is "short stories"
Y1 is_about Z1
Y2 is_about Z2
L contains Y1
L contains Y2
X gave L
W accepted L
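
To make that data-structure aside concrete, here's a minimal Python sketch
of what those types might look like.  The class names (Entity, Person,
Book, Subject, List, AndList, OrList) and the attributes are just my
hypothetical ones from the listing above, not anything settled:

class Entity:
    # Base class for anything a compiled sentence can refer to.
    def __init__(self, name=None):
        self.name = name              # set by "name_is"

class Person(Entity):
    pass

class Subject(Entity):
    pass

class Book(Entity):
    def __init__(self, name=None):
        super().__init__(name)
        self.about = None             # set by "is_about"

class List(Entity):
    # A group of entities acting as a single argument.
    def __init__(self):
        super().__init__()
        self.members = []
    def contains(self, item):
        self.members.append(item)

class AndList(List):
    pass          # joined by "and": all members are involved

class OrList(List):
    pass          # joined by "or": one of the members is involved

The compiled sentence would then just build one AndList holding the two
Book objects and hand it to whatever "gave" and "accepted" end up being.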

Alternatively, we could parse the arguments differently, not combining the
books into a list but instead emitting two "gave" and two "accepted"
statements: "X gave Y1" and "X gave Y2".  This might correspond more
naturally in a human's mind to "John gave the book of poetry to Mary and
John gave the book of short stories to Mary."  But both are conveying the
same information.  If we start doing recursive sentences ("John gave the
cat that gave him the mouse to Mary") I imagine we're going to run into
even more options (maybe not, I haven't thought that one through).  So it
seems to me as though the answer to your question as to whether there is
one compiled form for any given concept is "no", at least for non-trivial
concepts.
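
For what it's worth, one quick way to convince yourself that the two forms
carry the same information is to boil each one down to the bare set of
(verb, agent, object) facts it asserts and compare.  This is just a
throwaway sketch in Python; the tuple format is made up for the
illustration:

# Compiled form 1: the two books combined into a single list-valued object.
with_list = [("gave", "John", ["poetry book", "stories book"]),
             ("accepted", "Mary", ["poetry book", "stories book"])]

# Compiled form 2: one statement per book.
without_list = [("gave", "John", "poetry book"),
                ("gave", "John", "stories book"),
                ("accepted", "Mary", "poetry book"),
                ("accepted", "Mary", "stories book")]

def facts(steps):
    # Expand list-valued objects so the two shapes can be compared directly.
    out = set()
    for verb, agent, obj in steps:
        for item in (obj if isinstance(obj, list) else [obj]):
            out.add((verb, agent, item))
    return out

assert facts(with_list) == facts(without_list)   # same meaning, different shape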

>
> The two sentences "John gave the book to Mary." and "Mary accepted the
> book from John." might compile to identical semantic sequences with
> the possible exception that one additional statement might be present
> to indicate the focus of attention is on John's action in one case and
> on Mary's action in the other.
>

Yes.  Agreed.


> Granted, but these are surface syntax differences in the source
> languages. In the end, the processor still needs to put "a" into a
> register, add "b" to it and then place the result into "c". If the
> machine were a stack machine then scheme would already be closer to
> machine code and C++ would have to be translated to RPN. If the
> machine were a register machine then it would be the scheme source
> code that would need more translation.
>

Yes.  Addition works out roughly the same.  The point I was trying to make
was that not all differences were this simple.  For example, consider a
program to calculate factorials.  In Python:

(I hope the e-mail doesn't mangle my spacing here, but it probably will...)

def fact(n):
    result = 1
    for i in range(1, n + 1):   # range(n) would start at 0 and zero out the product
        result *= i
    return result

In scheme:

(define (fact n)
  (if (<= n 1)
      1
      (* n (fact (- n 1)))))

Writing factorial recursively in Python or iteratively in Scheme is doable,
but weird, and these two programs do not compile down to the same result
despite doing the same thing.
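
As a rough way to see that concretely, at least within one language:
Python's standard dis module will show that the loop-based and the
recursion-based versions compile to visibly different bytecode even though
they return the same numbers.  (fact_rec below is just my recursive rewrite
for the comparison.)

import dis

def fact_iter(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

def fact_rec(n):
    return 1 if n <= 1 else n * fact_rec(n - 1)

assert fact_iter(5) == fact_rec(5) == 120   # same answers...
dis.dis(fact_iter)   # ...but the disassemblies differ: this one loops,
dis.dis(fact_rec)    # and this one calls itself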


> Of course. But in the process of "compiling" the grammar and syntax of
> the original language is lost. It has to be, because the more
> primitive code is meant to be independent of any natlang grammar or
> syntax. The purpose is not to encode the grammar and syntax of the
> source, but to obliterate it. Pentium machine code does not preserve
> the syntax of C++, FORTRAN, or (thank goodness) COBOL. It obliterates
> it in order to tease out the deeper meaning encoded in that syntax.
>

Yes.  But my point is that different languages express things differently.
I'm way out of my depth here, since I only speak Indo-European languages,
but wouldn't a highly synthetic language express pretty much anything very
differently from an isolating one?  And wouldn't most compilers then
produce a final result that was a more synthetic or a more isolating form
of "assembly" depending on the source?  That would seem to depend on things
like how relationships are encoded in the type system.

That makes another issue occur to me, which is that normal assembly doesn't
have types.  It's got registers and memory locations.  It seems like you'd
need roughly one type for each noun in a source language, and I don't know
whether nouns could be distilled down to an essential list as easily as
verbs could.  Maybe I'm wrong on that.  I'd be interested to see the
thoughts of people who know more linguistics than I do.  But generally
speaking, higher-level languages have their own sets of types, and that's
one of the things assembly distills out.  That would mean that a simple
sentence like the one above would have to begin by describing what a
"person" and a "book" were.


>
> Syntactical and grammatical features do not need to be incorporated.
> They need to be abstracted out of existence in the final compiled
> result.
>
> Imagine this thought experiment. We put two native Chinese speakers
> together in a room. We give person A a book and tell him, in Chinese,
> give this book to the other person. Then we watch what happens.
>
> Next, we put two native Icelandic speakers together in a room. We give
> person A a book and tell him, in Icelandic, give this book to the
> other person. Then we watch what happens.
>
> My contention is that essentially the same thing will happen in both
> rooms, regardless of the form in which the instructions were given.
> Active or passive? Isolating? Imperative command or oblique
> suggestion? Polite register? Verb tense? Presence or absence of
> articles? None of that matters. It's all irrelevant to the final
> action that takes place in the room, and it's that final action that
> needs to be encoded in the "assembly language".
> ...
>

Yes.  Although many of those things do convey information.  Isn't it
possible that, in a language with an elaborate politeness system, the
person might comply with the request or might get insulted and refuse,
depending on how it was phrased?


> However, if the "decompiling" were based on using the semantic
> information from the "assembler" code to stuff appropriate words into
> existing sentence templates collected from human-produced examples of
> the target language then the decompiled version would look exactly
> like what a human native speaker would say. In fact, building good
> sentences in any natlang, given a sufficient library of canned
> sentence templates, is trivially simple.
>
> This is not to say that the computer program could or would duplicate
> every possible English (for example) utterance, only that every
> utterance it did generate would sound exactly like an utterance
> generated by a human native speaker.


Fair point.  I imagine that this would be possible.  It would be
interesting to see what sort of style the final product(s) would have,
though.
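
Just to make the template idea concrete, a toy version of that "decompile
via canned templates" step might look like this in Python.  The frame and
the two template strings are pure invention on my part:

# One semantic frame, two canned English templates that can render it,
# differing only in which participant is in focus.
frame = {"agent": "John", "object": "the book", "recipient": "Mary"}

templates = {
    "focus on agent":     "{agent} gave {object} to {recipient}.",
    "focus on recipient": "{recipient} accepted {object} from {agent}.",
}

for label, template in templates.items():
    print(label, "->", template.format(**frame))
# focus on agent -> John gave the book to Mary.
# focus on recipient -> Mary accepted the book from John.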

-Daniel