Re: On Parsing Perl (Once upon a time)

Re^2: On Parsing Perl (Once upon a time) by haukex (Archbishop) on Jul 11, 2022 at 11:42 UTC
> everything seems parseable (is that the right word?) using some simple LL grammars.

No, only `perl` (the interpreter) can parse all of Perl (the language). See my node here for details.

Edit: added emphasis.
Re^2: On Parsing Perl (Once upon a time) by LanX (Saint) on Jul 11, 2022 at 11:58 UTC
Static parsing is only reliable if you rule out or control all imported subs, because prototypes change the way Perl is parsed. See haukex's other reply.

Basically, changes made at compile time (see `BEGIN` blocks) can change how the rest of the code is parsed.

Dynamic parsing is possible, though, if you inspect the op-tree after compilation. That's the basic idea behind some newer tools, like the perlnavigator.

See also `perl -c` in `perlrun`, or B::Xref.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
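The effect of prototypes on parsing can be seen with a commonly cited one-liner. The sketch below is not from the thread; the sub name `whatever`, the helper `outcome`, and the `Demo` package names are made up for illustration. It compiles the same source text twice with string `eval`, once with `whatever` declared with an empty prototype and once without any prototype.

```perl
use strict;
use warnings;

# One piece of source text, compiled twice under different declarations.
my $snippet = 'whatever / 25 ; # / ; die "regex\n";';

my $counter = 0;

# Hypothetical helper: compile and run $snippet after $declaration and
# report which way perl read the slash.
sub outcome {
    my ($declaration) = @_;
    my $package = 'Demo' . ++$counter;   # fresh package per run
    local $_ = '';                       # the regex reading matches against $_
    my $value = eval "package $package; $declaration\n$snippet";
    return "division, result $value" if defined $value;
    return $@ eq "regex\n" ? 'regex match, then die' : "unexpected error: $@";
}

# Empty prototype: perl expects an operator next, so "/" is division
# and everything after "#" is a comment.
print outcome('sub whatever() { 100 }'), "\n";

# No prototype: perl expects an argument next, so "/ 25 ; # /" is a
# regex match and the die() that follows is live code.
print outcome('sub whatever { 100 }'), "\n";
```

A static parser that does not know how `whatever` was declared (for example because it is imported from a module at compile time) cannot tell which of the two readings applies.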
Re^3: On Parsing Perl (Once upon a time) by haukex (Archbishop) on Jul 11, 2022 at 14:06 UTC
> Static parsing is only reliable if you rule out or control all imported subs, because prototypes change the way Perl is parsed.

I think, though, that prototypes aren't the only reason Perl isn't statically parseable. There are quite a few heuristics that the parser uses that aren't all too well documented, and I'm not sure if a static parser would be able to reimplement all of them. And then there is `no strict` code, which I think gets even trickier.

At some point I was considering researching and making a list of all of the reasons, but I unfortunately never got around to it.
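One example of such a heuristic, and one that happens to be documented (see `map` in `perlfunc`), is how perl guesses whether a `{` after `map` opens a block or an anonymous hash. The sketch below is an illustration added here, not necessarily one of the cases haukex had in mind; the helper name `try_compile` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet with string eval and
# report whether perl accepted it.
sub try_compile {
    my ($code) = @_;
    my $count = eval "my \@words = qw/Foo BAR/; $code";
    return defined $count ? "ok, $count keys" : 'syntax error';
}

# "{" followed by a string and "=>" looks like an anonymous hash to perl,
# so this is read as map EXPR with a missing comma and fails to compile.
print try_compile(q{my %seen = map { "\L$_" => 1 } @words; scalar keys %seen}), "\n";

# A leading "+" or ";" tips the guess towards a block.
print try_compile(q{my %seen = map { +"\L$_" => 1 } @words; scalar keys %seen}), "\n";
print try_compile(q{my %seen = map {; "\L$_" => 1 } @words; scalar keys %seen}), "\n";
```

A static parser would have to reproduce this kind of lookahead guess exactly, including the cases where the guess is wrong.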
Re^4: On Parsing Perl (Once upon a time) by LanX (Saint) on Jul 11, 2022 at 15:05 UTC
> There are quite a few heuristics that the parser uses that aren't all too well documented,

I agree, it's messy. But that's a matter of research.

For instance, I was bitten when operator overloading of `<` and `>` became unreliable, because the parser thought that `<` was the start of a `<>` iterator.

I also remember a time when this was legal syntax:

`for qw/a b c/ { }`

And BTW, this is a list multiplication, even without surrounding parens:

`qw/a b c/ x 3`

> At some point I was considering researching and making a list of all of the reasons, but I unfortunately never got around to it.

I think this could be solved by an automatic approach: training an external parser against automatically created code and snippets harvested from CPAN. A test would pass if (a patched) B::Deparse created the same syntax tree after compilation.

Many people have already invested time in parsers (I could name at least half a dozen projects); those tests could be used too.

As a side product, P5P would get a proper test suite too.

Cheers Rolf
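Both observations about `<` and `x` can be reproduced with a short script. This is a sketch under assumptions: the sub name `answer`, the helper `compile_result`, and the package names are hypothetical, and the overloading scenario from the post is not reproduced. The script only shows the underlying `<` ambiguity after a sub call without parentheses, which is a guess at what was meant.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet, returning its value
# or the error text.
sub compile_result {
    my ($code) = @_;
    my $value = eval $code;
    return defined $value ? $value : "error: $@";
}

# Without a prototype, perl expects an argument after "answer", so "<"
# is taken as the opening of a <...> readline/glob operator.
print compile_result(q{
    package LessThan1;
    sub answer { 42 }
    answer < 50 ? 'less' : 'not less';
});

# With an empty prototype, perl expects an operator, so "<" is a comparison.
print compile_result(q{
    package LessThan2;
    sub answer() { 42 }
    answer < 50 ? 'less' : 'not less';
}), "\n";

# List repetition: qw// on the left of "x" is treated like a parenthesized list.
my @repeated = qw/a b c/ x 3;
print scalar(@repeated), " elements: @repeated\n";

# For contrast, a plain string on the left is repeated as a string.
my $string = 'abc' x 3;
print "$string\n";
```

The `for qw/a b c/ { }` form is left out on purpose: as far as I recall, using `qw` in place of parentheses was deprecated and then removed (around perl 5.18), so it would be a syntax error on a current perl.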


	PerlMonks