Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a Perl script to match only Perl's keywords, i.e. the ones recognized by the Perl_keyword() function in toke.c, the lexer source file of the interpreter.

I'm only a beginner in Perl, so my problem is this: how do I match all the real keywords while excluding keyword-like words inside strings? Perl defines strings as
"x", 'x', `x`, /x/, m/x/, s/x/x/, y/x/x/, tr/x/x/, q(x), qq(x), qx(x), etc. There are quite a few forms listed in the comments of the scan_str() function used in the lexer.
I also need to exclude keywords inside comments.

I also find that I can't simply match strings and exclude them from my searches, because matching strings is a complex task in itself: for the ending quote (or whatever the closing symbol is), I have to check whether it has been escaped. It is escaped only if the number of backslashes before it is odd, as in "ab c \\\" "
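
The odd-backslash rule above can be sketched with a single regex for the double-quoted case only. This is a minimal illustration, not a general solution: the helper name blank_double_quoted is made up for this example, and the other quote forms (q(), m//, heredocs and so on) are not handled.

```perl
use strict;
use warnings;

# A double-quoted string: an opening quote, then any run of either
#   (a) a character that is neither a quote nor a backslash, or
#   (b) a backslash followed by any one character,
# then a closing quote. Because every backslash is consumed together with
# the character after it, a quote preceded by an odd number of backslashes
# is swallowed as escaped, and one preceded by an even number ends the string.
my $dq_string = qr/"(?:[^"\\]|\\.)*"/s;

# Hypothetical helper: replace every double-quoted string with "".
sub blank_double_quoted {
    my ($text) = @_;
    $text =~ s/$dq_string/""/g;
    return $text;
}

# A <<'END' heredoc keeps the backslashes exactly as typed.
my $line = <<'END';
print "ab c \\\" if" if $ok;
END
print blank_double_quoted($line);
```

After blanking, only the trailing statement modifier `if` is left for a keyword search to find. The same idea does not extend cleanly to the other quote-like forms, which is why this stays a sketch.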

I'd really appreciate any help anyone can give. Thanks.

Re: Matching Perl Keywords
by tachyon (Chancellor) on Jan 18, 2003 at 02:10 UTC

    It is often said that "only Perl can parse Perl". The reason this is said (and it is true) is that Perl's eclectic and rich syntax makes it impossible to write simple and robust parsing solutions.

    The bottom line is quite simple: without completely parsing a Perl script you can't match keywords reliably. The best pure-Perl source code you will find that probably does what you want is perltidy. Perltidy is reasonably robust and mature. It offers syntax highlighting (if that was what you had in mind) and, obviously, a solid parsing engine.

    Just for laughs, have a look at this code and pick out the comments. Bonus points for a correct answer on snippet 2.

    print <<'drom"edary',<<"f#f",<<'=pod',<< ''; #comment
    Just another Perl Hacker
    drom"edary
    #foo
    f#f
    #bar
    =pod
    #foo
    #too
    #you

    # is there a comment in the snippet of code below or not??? Blame Abigail 0.0
    BEGIN {
        if ($ARGV [0]) {eval 'sub foo () {print}'}
        else {eval 'sub foo ($) {print}'}
    }
    $_ = "Just another Perl Hacker\n";
    foo /#/; 1;

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Matching Perl Keywords
by TheDamian (Vicar) on Jan 18, 2003 at 17:50 UTC
    You might also want to look at the source of the Text::Balanced module. You may be able to adapt it to your needs.

    The source code of Filter::Simple would also be instructive, since it has to provide a similar "ignore the stuff in quotelikes" functionality. I guess it's no surprise that it uses Text::Balanced to do that.
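
    As a rough sketch of that approach, extract_multiple and extract_quotelike from Text::Balanced can split source text into quote-like pieces, comments, and everything else, so that keywords are searched for only in the "everything else". The keyword list is a tiny illustrative subset, and the name find_keywords and the Comment/Quote labels are made up for this example. It will be fooled by things like $#array, division operators, a variable named $s or $y, and POD, so treat it as a starting point only.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_quotelike);

# Illustrative subset only; the real list lives in Perl_keyword() in toke.c.
my %keyword = map { $_ => 1 } qw(if else elsif while for foreach print return);

# Hypothetical helper: return keywords found outside quote-likes and comments.
sub find_keywords {
    my ($source) = @_;

    # Each hash ref maps a label to an extractor. Pieces matched by an
    # extractor come back as scalar refs blessed into that label; text that
    # no extractor claimed comes back as plain strings.
    my @pieces = extract_multiple(
        $source,
        [
            { Comment => qr/#[^\n]*/ },
            { Quote   => sub { extract_quotelike($_[0], '') } },
        ],
    );

    my @found;
    for my $piece (@pieces) {
        next if ref $piece;    # skip comments and quote-likes
        push @found, grep { $keyword{$_} } $piece =~ /\b([a-z]+)\b/g;
    }
    return @found;
}

my $sample = <<'END';
print "if this"; # while here
return 'else';
END
print join(' ', find_keywords($sample)), "\n";
```

    On the sample above this should report only print and return, skipping the if, while and else that sit inside the strings and the comment.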