But if I understand some of the docs correctly, even perl itself doesn't really know what everything is, it guesses based on heuristics
But it doesn't guess! It knows precisely. And to parse Perl properly, you must emulate its decisions precisely.

There's very little statistical guessing in /usr/bin/perl. About the only two places it happens are the hash-element vs. scalar-followed-by-character-class question inside a regex, and the "is this a block or a hashref" question in the few places that might take either. Everything else is deterministic. Your code must get it right, or it's not parsing Perl as perl would.

Put another way, I know that since sin takes an argument, a slash following it starts a regex. It's never a divide. I can tell that without running it or debugging it. And if I put a double less-than after it, it's a here-doc. It's never a left shift. But if I replace sin with time, which takes no argument, the exact opposite choices are taken.

You cannot guess. To parse Perl, you must know at all times whether you are in a place expecting a value or a place expecting an operator. And to do that, you have to know the prototypes of all the built-ins, and how to get the prototypes of all the user-defined functions. Which also means you have to step along with the code, executing all the BEGIN blocks, including those spelled u-s-e.
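To make the "expecting a value vs. expecting an operator" idea concrete, here is a minimal sketch in Python. It is not how perl's lexer is written. The table TAKES_ARGUMENT and the function classify_first_ambiguous are hypothetical names invented for this illustration, the table covers only sin and time, and the input is assumed to be whitespace-separated tokens.

```python
# Toy illustration only: a two-entry stand-in table, not perl's real prototype lookup.
TAKES_ARGUMENT = {
    "sin": True,    # sin takes an argument, so a value (term) is expected next
    "time": False,  # time takes no argument, so an operator is expected next
}

def classify_first_ambiguous(source):
    """Say how the first '/' or '<<' in a whitespace-separated snippet is read."""
    expecting_term = True  # a statement starts in "expecting a value" state
    for tok in source.split():
        if tok.startswith("/"):
            return "regex-start" if expecting_term else "divide"
        if tok.startswith("<<"):
            return "here-doc" if expecting_term else "left-shift"
        if tok in TAKES_ARGUMENT:
            # The function's prototype decides what may legally come next.
            expecting_term = TAKES_ARGUMENT[tok]
        elif tok.isdigit():
            expecting_term = False  # after a number, only an operator can follow
        else:
            # No prototype, no answer: refuse rather than guess.
            raise LookupError(f"no prototype known for {tok!r}; cannot guess")
    return None

for snippet in ("sin / 2 /", "time / 2", "sin <<EOF", "time << 2"):
    print(f"{snippet!r:14} -> {classify_first_ambiguous(snippet)}")
```

The one bit of state, expecting_term, is the whole point: the same character sequence is classified differently depending on what came before it. The LookupError branch stands in for the hard part, which is that a user-defined function's prototype is only known after running the BEGIN blocks and use statements that define it.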

This is not a simple task. Larry admits it. Damian was going to spend the better part of this year working on Parse::Perl as a YAS-funded project. If you are taking it on, but not aware of the things I've posted in this thread, it's a bit like saying "I can fly that plane", but just getting in, without realizing there are clouds and bad weather and other planes, and that landing can be a real pain sometimes, and what happens when the engines go out.

-- Randal L. Schwartz, Perl hacker


In reply to But perl is not guessing! by merlyn
in thread Appropriate CPAN namespace for perl parser by Anonymous Monk
