Re: On Parsing Perl (Once upon a time)

by Anonymous Monk
on Jul 11, 2022 at 11:27 UTC ( [id://11145418] )


in reply to On Parsing Perl

I'm currently working on something (basically a Perl parser), and apart from BEGIN blocks, everything seems parseable (is that the right word?) using some simple LL grammars. Or am I just too uneducated?

Re^2: On Parsing Perl (Once upon a time)
by haukex (Archbishop) on Jul 11, 2022 at 11:42 UTC
    everything seems parseable (is that the right word?) using some simple LL grammars.

    No, only perl (the interpreter) can parse all of Perl (the language). See my node here for details.

    Edit: added emphasis.

Re^2: On Parsing Perl (Once upon a time)
by LanX (Saint) on Jul 11, 2022 at 11:58 UTC
    Static parsing is only reliable if you rule out or control all imported subs, because prototypes change the way Perl is parsed. See haukex's other reply.

    Basically, anything that runs at compile time (see BEGIN blocks) can change how the rest of the source is parsed.
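A minimal sketch of the prototype effect mentioned above. The sub names `zero_args` and `any_args` are made up for this illustration; the point is only that the same tokens after a sub name are parsed differently depending on a declaration that was seen earlier at compile time.

```perl
use strict;
use warnings;

# Hypothetical subs, named only for this illustration.
sub zero_args() { 42 }          # empty prototype: never takes arguments
sub any_args    { scalar @_ }   # no prototype: list operator, returns its argument count

# The same tokens follow the sub name, but they are parsed differently:
my $with_proto    = zero_args + 1;   # parsed as zero_args() + 1
my $without_proto = any_args + 1;    # parsed as any_args(+1)

print "with prototype:    $with_proto\n";      # 43
print "without prototype: $without_proto\n";   # 1
```

A static parser that cannot see the declaration (for example because it is imported from a module or installed in a BEGIN block) has no way to choose between the two readings.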

    Dynamic parsing is possible, though, if you inspect the op-tree after compilation. That's the basic idea of some newer tools, like Perl Navigator.
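One way to look at what perl made of a piece of code after compilation is the core module B::Deparse, which turns the compiled op-tree back into source text. This is only a sketch of the idea, not how any particular tool works; `count_args` is a hypothetical sub, and the exact deparsed text may differ between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical sub without a prototype, so it parses as a list operator.
sub count_args { scalar @_ }

# -p asks B::Deparse to add parentheses, which makes the parse explicit.
my $deparse = B::Deparse->new('-p');

# Deparse an anonymous sub whose body contains the ambiguous-looking expression.
my $text = $deparse->coderef2text( sub { my $n = count_args + 1; } );

print $text, "\n";
```

The printed body shows which arguments perl actually attached to `count_args`, which is information a purely static tool would have to guess.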

    See also perl -c in perlrun or B::Xref

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Static parsing is only reliable if you rule out or control all imported subs, because prototypes change the way Perl is parsed.

      I think, though, that prototypes aren't the only reason Perl isn't statically parseable. There are quite a few heuristics that the parser uses that aren't all too well documented, and I'm not sure if a static parser would be able to reimplement all of them. And then there is code running under no strict, which I think gets even trickier. At some point I was considering researching and making a list of all of the reasons, but I unfortunately never got around to it.

        > There are quite a few heuristics that the parser uses that aren't all too well documented,

        I agree, it's messy. But that's a matter of research.

        For instance, I was bitten when operator overloading of < and > became unreliable because the parser thought that < was the start of a <> iterator.

        I also remember a time when this was legal syntax:

        for qw/a b c/ { }

        And BTW, this is a list repetition, even without parens around the list:

        qw/a b c/ x 3
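A small sketch of that behaviour, assuming a reasonably recent perl (perlop documents that a qw// list on the left of x is repeated as a list in list context). The variable names are only for illustration.

```perl
use strict;
use warnings;

my @from_qw     = qw/a b c/ x 3;          # qw list: repeated as a list
my @from_parens = ('a', 'b', 'c') x 3;    # parenthesized list: repeated as a list
my @from_string = 'abc' x 3;              # plain string: repeated as a string

print scalar(@from_qw),     "\n";   # 9
print scalar(@from_parens), "\n";   # 9
print scalar(@from_string), "\n";   # 1
print "@from_qw\n";                 # a b c a b c a b c
```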

        > At some point I was considering researching and making a list of all of the reasons, but I unfortunately never got around to it.

        I think this could be solved by an automated approach: training an external parser against automatically generated code and snippets harvested from CPAN.

        The test would pass if (a patched) B::Deparse created the same syntax tree after compilation.
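A rough sketch of what the reference side of such a test could look like, using an unpatched B::Deparse. The helper names `perl_view` and `agrees` are hypothetical, and the external parser is not shown: its result is assumed to be rendered back to text in the same style so the two can be compared.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical helper: compile a snippet and return perl's own view of it,
# or nothing if perl rejects the snippet.
# Caution: compiling runs BEGIN blocks and use statements in the snippet,
# so harvested code should only be fed to this inside a sandbox.
sub perl_view {
    my ($source) = @_;
    my $code = eval "no strict; no warnings; sub {\n$source\n}";
    return unless ref $code eq 'CODE';
    return B::Deparse->new('-p')->coderef2text($code);
}

# Hypothetical check: does the external parser's rendering match perl's?
sub agrees {
    my ($source, $external_rendering) = @_;
    my $reference = perl_view($source);
    return defined $reference && $reference eq $external_rendering;
}

for my $snippet ('my @l = qw/a b c/ x 3;', 'my $x = ;') {
    my $view = perl_view($snippet);
    print defined $view ? "compiles:\n$view\n" : "rejected by perl: $snippet\n";
}
```

Comparing text is a shortcut for this sketch; a real test would more likely compare tree structures, as the post suggests.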

        Many people have already invested time in parsers (I could name at least half a dozen projects); their tests could be used too.

        As a by-product, P5P would get a proper test suite too.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
