Perl is impossible to parse in the same sense that it's impossible to determine whether an arbitrary program halts.

Given the following Perl script:

#!/usr/bin/perl
print("1\n");

It should certainly be possible to parse it (without executing it), and it should also be possible to detect that it will halt.

However, no parser can be written that takes any arbitrary valid Perl script as input and always produces a parse tree as output without executing part of the program.

PPI can parse a very large subset of Perl scripts. It does so very well, but there will always be some scripts it simply can't decide. The canonical example is:

whatever / 25 ; # / ; die "this dies!";

This can be parsed in two very different ways, depending on the prototype of whatever. If it has a prototype of (), then it takes no arguments, so the slash is a division operator and the line is interpreted as the following, plus a comment:

whatever() / 25;

If whatever has a prototype of ($), and so takes one argument, then the slash starts a regex match and the line is interpreted as:

whatever($_ =~ m{ 25 ; # }); die "this dies!";

If the prototype of whatever is determined at runtime, e.g.:

BEGIN {
    *sum = sub ($$) { (shift) + (shift) };
    *whatever = (sum(2,2) == 5) ? sub ($) {} : sub () {};
}

then the Perl code cannot be parsed without executing part of it. (The parser needs to call the sub sum to find out which prototype whatever ends up with.)
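To see the two readings of the canonical line in action, here is a minimal sketch. It compiles the same source line twice with string eval, once with each prototype, and reports whether the die was reached. The helper name outcome and the Demo1/Demo2 package names are my own inventions for the illustration; they come from neither PPI nor Perl itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The ambiguous line from above, single-quoted so it reaches eval untouched.
my $line = 'whatever / 25 ; # / ; die "this dies!";';

my $n = 0;

# Hypothetical helper: compile and run $line in a fresh package in which
# whatever() has the given prototype. Returns 'died' if the die statement
# was actually parsed as code and executed, 'survived' otherwise.
sub outcome {
    my ($proto) = @_;
    my $pkg = 'Demo' . ++$n;    # fresh package, so no redefinition clash
    local $_ = '';              # the ($) reading matches a regex against $_
    my $src = sprintf "package %s; no warnings; sub whatever %s { 100 }\n%s\n1;",
        $pkg, $proto, $line;
    return eval($src) ? 'survived' : 'died';
}

print 'prototype ():  ', outcome('()'),  "\n";   # division, rest is a comment
print 'prototype ($): ', outcome('($)'), "\n";   # regex argument, then die
```

With the () prototype the die is hidden inside a comment, so the eval completes normally; with ($) the die is real code and the eval fails. The source text is identical both times. Only the declaration that precedes it differs.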

None of this is to say that PPI and the other fine projects you mention are without value. Parsing a large subset of Perl is still very useful. Having a large subset of a fortune is better than having no money at all.


In reply to Re^5: regex issue by tobyink
in thread regex issue by perlNewby
