The recurrence equation is a(n) = 2*a(n-2) + a(n-3) + 1. It's in the literature as A052952. I take "numerology" to suggest a pseudo-relationship based on conjecture from a small sample. That is, the first few numbers may match a series, but how do I know later numbers will actually match the series? I'll present a proof below that the parse counts generated from the Minus Sign Grammar are precisely A052952, and I hope that will take us out of the realm of numerology.

First, a comment on why this is useful and/or interesting. Many monks will have heard of the Ackermann function. I don't know of any direct applications for it, but it gets a lot of use in testing because 1) it's an extreme example of a kind of computation that is practical; and 2) it focuses on a functionality that is useful.

In investigating parsing of ambiguous grammars, I wanted an extreme example of the kind of grammar likely to be encountered in practice. "In practice" and "Perl" aren't quite synonyms, but they're close. My problem with Perl grammar was that it's (of course) unambiguous. So I eliminated all notions of precedence. Also, I did away with lvalue restrictions. That is, I legalized 6++, treating it as roughly equivalent to ($tmpNN=6)++.

Honing in on the part of Perl most likely to generate lots of ambiguity, I focused on expressions with a number at each end and minus signs between them. Four operators are composed entirely of minus signs: pre-decrement, post-decrement, arithmetic negation and subtraction. The resulting grammar is

E ::= Number | E-- | --E | -E | E-E

In testing an ambigious parser, you want to be sure it gets all the parses. Counting by hand gets you only so far. Call the parse count for a string with n minus signs, W(n). At n=5, W(n)=8 and the counting is already quite difficult to do in your head:

<p>The parentheses are not part of the grammar, but added to show the parses. (((6--)--)-1)==5 ((6--)-(--1))==6 (6-(--(--1)))==7 (6-(--(-(-1))))==6 ((6--)-(-(-1)))==5 (6-(-(-(--1))))==6 (6-(-(-(-(-1)))))==5 (6-(-(--(-1))))==4

W(10)=144, W(20)=17711 and W(36)=39088169 so even for small n, we're in territory where hand counts just ain't gonna happen. Hence the importance of identifying the W(n) series with a mathematical series. I can simply count the parses generated by my algorithm and look up the appropriate number in the series to see if they match.

Yes, you say, all very nice, but since you only hand checked up to 5, isn't this numerology? What reason is there to believe that the number of parses generated, W(n), actually matches A052952?

A proof follows. Those who think they might have any taste for this sort of thing might want to at least attempt to follow it, since it's not what the mathematicians call "technical" and is presented in Perlish terms.

The first three numbers in A052952 are defined as 1, 1, and 3. Hand counts will establish that they match W(1), W(2) and W(3).

Length 1, Parse 0, value==(6-1)==5 Length 2, Parse 0, value==(6-(-1))==7 Length 3, Parse 0, value==((6--)-1)==5 Length 3, Parse 1, value==(6-(--1))==6 Length 3, Parse 2, value==(6-(-(-1)))==5

It remains to show that W(n) = A052952(n) for n>2. (I was hoping you'd forget. :-) )

Look at the far right hand side of the sequence of minus signs. It's one of three things: a binary minus (subtraction), a unary minus (negation) or the second character of a pre-decrement. We can get a count of each of these three categories and add them to produce W(n).

The unary minus category can be formed from any of the parses with one less minus sign, by adding a negation to the right side. So there are W(n-1) parses from that source.

Similarly the binary minus category is all the parses with two fewer minus signs, with a pre-decrement added to the far right end. Therefore, W(n-2) parses from that source.

The subtraction category is only slightly harder. The only operator pattern allowed to the left of the binary subtraction is a series of post-decrements. So if n-1 is even, there is one parse from that source. If n-1 is odd, the pattern of post-decrements cannot be formed, and there are no parses from that source. In Perlish notation, the number of parses is (n-1) % 2 ? 0 : 1.

Putting all the sources of parses together, we get

W(n) = W(n-1) + W(n-2) + ((n-1) % 2 ? 0 : 1)

We can use this formula to derive a count for one of its own components, W(n-1):

W(n-1) = W(n-2) + W(n-3) + ((n-2) % 2 ? 0 : 1)

And then we can substitute in the first formula from the second:

W(n) = W(n-2) + W(n-3) + ((n-2) % 2 ? 0 : 1)  + W(n-2) + ((n-1) % 2 ? 0 : 1)

Simplifying:

W(n) = 2*W(n-2) + W(n-3) + ((n-2) % 2 ? 0 : 1)  + ((n-1) % 2 ? 0 : 1)

Finally, note that of two consecutive numbers one will always be odd and the other always even. Therefore ((n-2) % 2 ? 0 : 1)  + ((n-1) % 2 ? 0 : 1) will always be 1. Applying the final simplification:

W(n) = 2*W(n-2) + W(n-3) + 1

From http://www.research.att.com/~njas/sequences/A052952, the recurrence for A052952 is a(n) = 2*a(n-2) + a(n-3) + 1. Exactly the same as the recurrence I've just proved for the Wall series. QED and all that.


In reply to Re^2: The (Larry) Wall Series by Jeffrey Kegler
in thread The (Larry) Wall Series by Jeffrey Kegler

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.