Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've written a pure-Perl Perl source code parser (or at least a lexer). The purpose is to lex source so that it can be syntax-highlighted, converted to different forms, reformatted to a given code layout, obfuscated, or analysed.

More details at http://ali.as/PSP/

My question is, where should it go in the namespace? For development purposes I'm using Perl::, which is obviously wrong.

Do you have any suggestions?

ADAMK

Re: Appropriate CPAN namespace for perl parser
by gmax (Abbot) on Feb 12, 2002 at 17:23 UTC
    [OT]
    I feel sort of responsible for this node. :)
    This morning I got a message about this issue in module-authors@perl.org, and I decided to do some PerlMonks advocacy. I contacted you and I advised you to join our site and ask about your trouble.
    Apparently you did (Welcome!), even though the request is not as detailed as I had hoped.

    How about some more insight on your project?
    And besides, if you log in as yourself, the general public will appreciate it.

    [/OT]

    Now on topic at least. I think that your module should go under Parse::, maybe as Parse::Perl or Parse::Universal. But I would need more details to give you better advice.

    good luck.

     _  _ _  _  
    (_|| | |(_|><
     _|   
    
Re: Appropriate CPAN namespace for perl parser
by tachyon (Chancellor) on Feb 12, 2002 at 17:38 UTC

    Wiser souls than I will counsel you that only perl can parse Perl. Once upon a time I embarked on a somewhat similar project. Here is some code that you may like to use in testing. Much of it came from insults and taunts from the on-again, off-again monk Abigail.

    print <<'drom"edary',<<"f#f",<<'=pod',<< ''; #comment Just another Perl Hacker drom"edary #foo f#f #bar =pod #foo #too #you $_ = "Just another Python# Hacker\n"; s {Python#} [Perl]; print; $_=<<'=pod'; Just another Perl Hacker =pod print; s[foo#] #comment [bar#]; #comment m{ \#[#] # comment \w+: # comment [#] # comment \# # comment \} # comment }x; # $_ = "Just another Perl Hacker\n"; { print if m\foo\#comment } $_ = "Just another Perl Hacker # No comment, no comment! # Yes, really! # I am really a Perl Hacker! ";print; (1) ? print "foo#\n" : print "bar#\n"; # foobar print "\"Goodnight I'm tired!\" # No comment here\n"; # but one here sleep 1; # Because tired print "Goodbye!\n"; BEGIN {if ($ARGV [0]) {eval 'sub foo () {print}'} else {eval 'sub foo ($) {print}'} } $_ = "Just another Perl Hacker\n"; foo /#/; 1;

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      I haven't listed it anywhere yet, but if anyone would like to play with a CGI frontend to the modules, you can do so at

      http://doug.idleplay.net/cgi-bin/CPAN/applications/psp/psp.pl

      As for the code above, it is mostly OK. The quote engine is written directly from the POD on quote and quote-like operators, and is quite accurate. The only one it gets wrong (which is a fixable bug) is

      print if m\foo\#comment
      The rest should be OK, I feel; try it for yourself. One quick note on that: each quote carries data about where within the quote the actual content is, so comments are recognised as comments, but they will not (yet) appear that way in the syntax highlighting. I also have requirements that Perl itself doesn't have (I can't just discard comments or POD), so naturally the parsing process is a bit different.
Re: Appropriate CPAN namespace for perl parser
by jmcnamara (Monsignor) on Feb 12, 2002 at 17:14 UTC

    Parse::Perl:: might be appropriate. It would fit in with Parse::RecDescent and Parse::Yapp.

    It looks like an interesting project. Would you care to write a little bit about it?

    --
    John.

      I apologise for the lack of depth, as I don't have huge amounts of time on weekdays for posting.

      When I started this, I decided from the very beginning that I would not be able to write a "parser" capable of actually executing code. As I was told on more than one occasion, "that way lies madness". But executing code is not the only thing that gets done with code. Syntax highlighting, perltidy, source code analysis and a few other functions can all benefit from the ability to lex source properly, even if the lexer can't execute it. The parser is written with that limit in mind. If you don't have to execute the code, you can be a bit more lax with your standards. As long as you parse the source code correctly in 99.9% of cases, you can get value from it.

      Second, to touch on the code structure: this is a char-by-char parser broken into two pieces. The first part, Perl::Tokenizer::, goes over the code char by char, separates it into tokens, and returns a Perl::Document. Perl::Lexer:: then takes the document, builds the object tree, classifies blocks, and tags in some extra information about comments.

      This leaves you with a Perl::Lexer::Tree. From there, other modules, such as Perl::Transform::Tidy, operate on trees. In the case of Tidy, it walks down the tree, removes all whitespace, and then inserts replacement whitespace. Once the tree is flattened back into a Perl::Document and serialised into plain text again, the output should come out the way you want it (the Tidy code is currently nowhere near good enough).
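      The round trip described above (tokenize, then flatten and serialise back to plain text) only works if the tokenizer is lossless: whitespace, comments and POD have to survive as tokens rather than being thrown away. The following is a toy sketch of that idea, not PSP's code; `toy_tokenize` and `serialise` are hypothetical names, and its naive rule that `#` always starts a comment will misread `"foo#"` inside a string, which is exactly the kind of input tachyon's test code is built to catch.

```perl
use strict;
use warnings;

# Toy, hypothetical tokenizer (not PSP's Perl::Tokenizer). Every character of
# the input ends up in exactly one classified token, so nothing is discarded.
sub toy_tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G(\s+)/gc)         { push @tokens, [ whitespace => $1 ] }
        elsif ($src =~ /\G(#[^\n]*)/gc)     { push @tokens, [ comment    => $1 ] }  # naive: ignores strings
        elsif ($src =~ /\G([\$\@%]?\w+)/gc) { push @tokens, [ word       => $1 ] }
        else {
            # Anything else is taken one character at a time.
            $src =~ /\G(.)/gcs;
            push @tokens, [ operator => $1 ];
        }
    }
    return @tokens;
}

# Flatten the token list back into plain text.
sub serialise { return join '', map { $_->[1] } @_ }
```

      Because each token carries a type, a highlighter can map types to colours, and a tidy pass can replace only the `whitespace` tokens before calling `serialise`.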

      Let me also note that this module is still considered very experimental, although the syntax stuff works nicely. I expect to change module names, terminology, and possibly method names before it is ready for CPAN.

      I'll try to attach further comments directly to the other comments.
(Ovid) Re: Appropriate CPAN namespace for perl parser
by Ovid (Cardinal) on Feb 12, 2002 at 18:00 UTC

    You've written a parser, so it should go in the Parse:: namespace. I believe that Parse::Perl is being worked on right now, but that does seem like an appropriate namespace.

    I have a couple of questions about your parser. Do you have tests for it? Do you have a distribution or documentation? I had a few things I wanted to throw at it which I was pretty sure would give me an idea of how robust it is, but all I could find was a bunch of HTML representations of the code. It wouldn't be too hard to take those and strip out the line numbers, but it would be nice to have something to play with and give you feedback on.

    Incidentally, in your regex lib, I noticed that you appear to define a package as having the symbols separated by a double colon, when a single quote mark is also allowed (for Perl4 compatibility - see the D'oh module). You also have the Unix and Mac newlines reversed.
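    Both points can be made concrete. A package name may join its parts with either `::` or the old Perl 4 `'` separator, which is also why `"$name's"` inside a double-quoted string is read as the variable `$name::s`. The patterns below are a hypothetical sketch, not taken from PSP's regex lib; `$IDENT`, `$PACKAGE` and `%NEWLINE` are made-up names.

```perl
use strict;
use warnings;

# Hypothetical patterns, not PSP's regex lib.
# An identifier part, then any number of further parts joined by either the
# modern '::' separator or the old Perl 4 "'" separator.
my $IDENT   = qr/[A-Za-z_]\w*/;
my $PACKAGE = qr/$IDENT(?:(?:::|')$IDENT)*/;

# Line endings by platform: Unix uses LF, classic Mac OS uses CR,
# DOS/Windows uses CR followed by LF.
my %NEWLINE = (
    unix => "\x0A",
    mac  => "\x0D",
    dos  => "\x0D\x0A",
);

for my $name ("Foo::Bar", "D'oh", "Foo::") {
    printf "%-10s %s\n", $name,
        $name =~ /\A$PACKAGE\z/ ? 'package name' : 'not a package name';
}
```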

    Cheers,
    Ovid


      Thanks, I'll look into it.
Re: Appropriate CPAN namespace for perl parser
by merlyn (Sage) on Feb 12, 2002 at 18:16 UTC
      Sorry about the lack of reply, Randal; I have a nasty work email situation, but I should get to it shortly. To answer your question, though: handling / is currently the most "fuzzy" part. It's the one character I had real problems with.

      Currently, it's written to handle the most common cases only.

      Line 144 of http://ali.as/PSP/source/Perl/Tokenizer/Classes.html in the browsable source code is the relevant section.

      Since I don't have exposure to the relevant sections of the perl C source, it was fairly difficult, but I'm sure there's a method that covers the 99.9% standard.

      With the difficulties of POD, __END__ and similar tags, quote parsing, and the rest mostly solved, I wouldn't want to cancel the whole thing just because of a single character :)
        But that's only one example. It's not just / (divide or regex). It's also dot (concatenate or decimal point), less-than (less than or filehandle read), two less-thans (left shift or here-doc), star (glob or multiply), percent (hash or modulus), ampersand (subroutine or bit-wise and), and question mark (regex or question-colon).

        If you aren't handling all of those, you aren't parsing Perl!

        Put another way, you cannot tokenize Perl without at all times knowing whether you are expecting a value or an operator, because all of the ones I just listed have double duty, depending on context. And yet, to know that, you also need to know if you have a prototyped function to the left that takes args or not. What a mess!
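        The value-or-operator rule can be sketched as a heuristic on the previous significant token. This is an illustrative sketch, not how perl's own tokenizer works: `expecting`, `classify`, `%MEANING` and the partial `%TAKES_ARGS` word list are all hypothetical. Note that it gives up on an unknown bareword or a closing `}`, which is exactly the prototype problem described above.

```perl
use strict;
use warnings;

# What each double-duty character means in each state (from the list above).
my %MEANING = (
    '/'  => { operator => 'divide',         value => 'regex' },
    '.'  => { operator => 'concatenate',    value => 'decimal point' },
    '<'  => { operator => 'less-than',      value => 'filehandle read' },
    '<<' => { operator => 'left shift',     value => 'here-doc' },
    '*'  => { operator => 'multiply',       value => 'glob' },
    '%'  => { operator => 'modulus',        value => 'hash' },
    '&'  => { operator => 'bit-wise and',   value => 'subroutine' },
    '?'  => { operator => 'question-colon', value => 'regex' },
);

# Illustrative, incomplete list of words after which perl expects a value.
my %TAKES_ARGS = map { $_ => 1 }
    qw(split grep map if unless while until and or not return print);

# Heuristic: after the previous significant token, do we expect an operator
# or a value? Returns undef when we cannot know (unknown bareword, or '}'
# which may close either a block or a hash subscript).
sub expecting {
    my ($prev) = @_;
    return 'value' unless defined $prev;                  # start of input
    return 'value' if $TAKES_ARGS{$prev};
    return 'operator' if $prev =~ /^[\$\@%][\w:]+\z/      # variable
                      || $prev =~ /^\d/                  # number
                      || $prev eq ')' || $prev eq ']';   # closing bracket
    return undef if $prev eq '}' || $prev =~ /^\w+\z/;   # needs prototype info
    return 'value';                                      # operator, '(', ',', ';'
}

sub classify {
    my ($char, $prev) = @_;
    my $state = expecting($prev);
    return 'ambiguous' unless defined $state;
    return $MEANING{$char}{$state};
}
```

        For example, `classify('/', '$x')` yields `divide` while `classify('/', 'split')` yields `regex`; `classify('/', 'foo')` stays `ambiguous`, because only perl, having compiled `foo`, knows its prototype.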

        -- Randal L. Schwartz, Perl hacker

Re: Appropriate CPAN namespace for perl parser
by jmcnamara (Monsignor) on Feb 13, 2002 at 10:50 UTC

    Several people have pointed out that "only perl can parse Perl".

    This is an epigram, a play on words, a joke. It isn't an axiom unless we treat it as such.

    Perltidy parses Perl well enough for most people's requirements. Also, Damian Conway has stated many times that he wants to parse Perl in Perl. As far as I've seen, he doesn't get this type of reaction.

    Adam's goal of parsing 99.9% of Perl, or at least the Perl on CPAN, is worthy and possibly attainable. He appears to be aware of the difficulties. As such, we should lend support rather than prejudge it. A tool such as this would be extremely useful. The CPANTS project, for instance, is probably doomed without something like this.

    --
    John.

      I'm glad you mentioned Perltidy. In many respects perltidy already contains a tokenizer that handles the majority of Perl constructs, as well as mechanisms for identifying some of the more pain-in-the-ass errors (such as a missing {, ( or [). There are also tools such as B::Deparse that would probably help with resolving (semi-)ambiguous statements (on the other hand, that might not be the best plan for security reasons... :-)

      Much respect to adamk for trying a hard task, but I would hope he at least has a look at the existing codebase and would prefer that he joins forces with those authors. At the very least it would make a very popular tool like Perltidy even better...

      my $.02;

      Yves / DeMerphq

Re: Appropriate CPAN namespace for perl parser
by gellyfish (Monsignor) on Feb 12, 2002 at 21:37 UTC

    Although you have had some sensible answers here you might also want to ask in modules@perl.org and also in the newsgroup comp.lang.perl.modules about this if you want to reach the widest consensus ;-}

    /J\

Re: Appropriate CPAN namespace for perl parser
by adamk (Chaplain) on Feb 12, 2002 at 23:40 UTC
    Let me just also add that my standard for accuracy is CPAN. If the module can successfully parse 99.9% of the code in CPAN, I consider it accurate enough, although improvements to accuracy are welcome.

    With that in mind I would love it if someone could suggest a way of getting hold of all of the Perl code in CPAN without having to install/compile etc...

    It's a big ask, but I'm not the sort to back down from a challenge.
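    One low-tech approach, sketched under the assumption that the distribution tarballs from a CPAN mirror have already been unpacked (for example with `tar xzf`) under a single directory: walk that tree with the core File::Find module and collect the Perl source files, without building or installing anything. `perl_sources` is a hypothetical helper, and the extension list is a guess at what counts as "Perl code"; Archive::Tar could be used instead to read files straight out of the tarballs.

```perl
use strict;
use warnings;
use File::Find;

# Assumption: distributions are already unpacked under $root. Nothing is
# built or installed; we only collect source files to feed to a parser.
sub perl_sources {
    my ($root) = @_;
    my @files;
    find(
        {
            no_chdir => 1,    # $_ is the full path, same as $File::Find::name
            wanted   => sub {
                return unless -f $_;
                # Modules, scripts, tests, and Makefile.PL-style files.
                push @files, $File::Find::name if /\.(?:pm|pl|t|PL)\z/;
            },
        },
        $root,
    );
    return sort @files;
}
```

    Running each returned file through the parser and counting failures would give a direct measure against the 99.9% target.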