On 7 August 2015 at 00:05, Patrik Austin <[log in to unmask]> wrote:
> Alright, there's definitely a solution there.
> I can work these with JFLAP (CYK parse):
> S -> aXb
> X -> SX | λ
> S -> aB
> B -> SB | b
> The other ones rejected my test string aabababaaababbabbb, namely these:
> S -> aB
> B -> Sb | b
> S -> Ab
> A -> aS | a
Oops! I goofed! I just tried parsing it manually and realized that
attempting to simplify the second production rule accidentally
eliminated the ability to produce an arbitrarily long sequence of
embedded S's. You can still flip the branching direction with
S -> Ab
A -> AS | a
>> I doubt there is any way to make this work with actually only one
> production rule. But I'll think on it a little more, just in case.
> Thanks, this is actually pretty amazing. I don't think I would've figured it out any time soon at all. If there's more where that came from, I'm definitely interested; anyway here's already what I'm going to need.
> This grammar shouldn't be as simple as the fundamental one, so it's all looking good.
There is a *little* more where that came from. :) My _Languages and
Machines_ book is sadly inaccessible right now, as I am in Idaho and
it is in Utah (if I had it with me, I would've just looked all this up
in one round, as parenthesization grammars are pretty standard fare),
but a bit of googling refreshed my memory on this formulation:
S -> SS | aSb | <empty>
That's the *correct* version of what I was trying for with S -> aSSb |
<empty>. This allows for sentences that consist of two other complete
sentences in hiatus without being embedded in a common surrounding a-b
phrase. If that's OK, great! Otherwise, I don't think there's any way
to remove that possibility without introducing a second symbol (X, A,
B). The <empty> production also allows for infinite ambiguity, as
previously noted. The ambiguity can be reduced by special-casing 'ab',
with nothing in between, as a terminal form:
S -> SS | aSb | ab
But that still produces branching ambiguity, since neighboring S's can
be expanded in either order. (And *this* time, I worked out a
production manually before hitting "send", to make sure it actually
Now, that gets everything onto one line, at least, which you might
count as one rule. Many other people, however, would still count it as
3, which have just been condensed into one line for convenience by a
notational shorthand. (This harks back to prior discussions on how you
count the number of rules in a language.) The larger count, however,
shows the similarity of this grammar to other grammars that produce
the same language, and this does a better job of telling you something
about the actual features of the language itself, rather than
incidental features of how you choose to represent the grammar. The
two-line grammars all had a | in one of the two rules, so that still
comes to 3 total production possibilities. It's kind of like finding
different basis sets for the same vector space- you can slice it up in
lots of different orthogonal ways, but every way you slice it, you
still need the same number of dimensions in the basis set.
Intuitively, the three different productions encode the following prose rules:
1. There can be simple sentences with nothing embedded in them
(necessary to allow finite sentences).
2. There can be complex sentences that have other stuff embedded in them.
3. Embedded stuff can be single sentences, or multiple separate
sentences in a row.
Even in prose, you can write those down in a bunch of different ways,
and combine sentences to make them look like one two rules, or even
one, but no matter how you slice it, all three possibilities have to
be in there.