Appropriate CPAN namespace for perl parser

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Appropriate CPAN namespace for perl parser by gmax (Abbot) on Feb 12, 2002 at 17:23 UTC
`[OT]` I feel sort of responsible for this node. :) This morning I got a message about this issue on module-authors@perl.org, and I decided to do some PerlMonks advocacy. I contacted you and advised you to join our site and ask about your problem. Apparently you did (Welcome!), even though the request is not as detailed as I had hoped. How about some more insight into your project? Besides, if you log in as yourself, the general public will appreciate it. `[/OT]`

Now, on topic at least. I think that your module should go under Parser::, maybe as Parser::Perl or Parse::Universal. But I would need more details to give you better advice.

Good luck.
Re: Appropriate CPAN namespace for perl parser by tachyon (Chancellor) on Feb 12, 2002 at 17:38 UTC
Wiser souls than I will counsel you that only perl can parse Perl. Once upon a time I embarked on a somewhat similar project. Here is some code that you may like to use in testing. Much of it came from insults and taunts from the on-again, off-again monk Abigail:

```perl
print <<'drom"edary',<<"f#f",<<'=pod',<< ''; #comment
Just another Perl Hacker
drom"edary
#foo
f#f
#bar
=pod
#foo
#too
#you

$_ = "Just another Python# Hacker\n"; s {Python#} [Perl]; print;
$_=<<'=pod';
Just another Perl Hacker
=pod
print;
s[foo#] #comment
 [bar#]; #comment
m{ \#[#]  # comment
   \w+:   # comment
   [#]    # comment
   \#     # comment
   \}     # comment
}x;
# $_ = "Just another Perl Hacker\n";
{ print if m\foo\#comment
}
$_ = "Just another Perl Hacker
# No comment, no comment!
# Yes, really!
# I am really a Perl Hacker!
";print;
(1) ? print "foo#\n" : print "bar#\n"; # foobar
print "\"Goodnight I'm tired!\" # No comment here\n"; # but one here
sleep 1; # Because tired
print "Goodbye!\n";
BEGIN {if ($ARGV [0]) {eval 'sub foo () {print}'}
       else           {eval 'sub foo ($) {print}'}
}
$_ = "Just another Perl Hacker\n";
foo /#/;
1;
```

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Re: Appropriate CPAN namespace for perl parser by adamk (Chaplain) on Feb 12, 2002 at 23:22 UTC
I haven't listed it anywhere yet, but if anyone would like to have a play with a CGI frontend to the modules, you can at http://doug.idleplay.net/cgi-bin/CPAN/applications/psp/psp.pl

As for the code above, it mostly works. The quote engine is written directly from the POD on quote and quote-like operators, and is quite accurate. The only one it gets wrong (which is a fixable bug) is `print if m\foo\#comment`. The rest I believe should be OK. Try it for yourself.

One quick note on that: each quote carries data recording where inside the quote the actual content is, so comments are recognised as comments, but they will not appear that way in the syntax highlighting (yet). I have some requirements that Perl itself doesn't have: I can't just discard comments or Pod, so naturally the parsing process is a bit different.
Re: Appropriate CPAN namespace for perl parser by jmcnamara (Monsignor) on Feb 12, 2002 at 17:14 UTC
`Parse::Perl::` might be appropriate. It would fit in with Parse::RecDescent and Parse::Yapp. It looks like an interesting project. Would you care to write a little bit about it? -- John.
Re: Re: Appropriate CPAN namespace for perl parser by adamk (Chaplain) on Feb 12, 2002 at 22:44 UTC
I apologise for the lack of depth, as I don't have huge amounts of time on weekdays for posting.

When I started this, I decided from the very beginning that I would not be able to write a "parser" capable of actually executing code. As I was told on more than one occasion, "that way lies madness". But executing code is not the only thing that gets done with code. Syntax highlighting, perltidy, source code analysis and a few other functions can all benefit from the ability to lex source properly, even if it can't execute it. The parser is written with that limit in mind. If you don't have to execute the code, you can be a bit more lax with your standards: as long as you parse the source code correctly in 99.9% of cases, you can get value from it.

Second, to touch on the code structure, this is a char-by-char parser broken into two pieces. The first part, Perl::Tokenizer::, goes over the code char by char, separates it into pieces and returns a Perl::Document. Perl::Lexer:: then takes the document, builds the object tree, classifies blocks, and tags in some extra information about comments. This leaves you with a Perl::Lexer::Tree. From there, other modules such as Perl::Transform::Tidy operate on trees. In the case of Tidy, it hunts down the tree, removes all whitespace, and then inserts replacement whitespace. Once the tree is flattened back into a Perl::Document and serialised into plain text again, the layout should come out the way you want it (the Tidy code is currently not nearly good enough).

Let me also note that this module is still considered very experimental, although the syntax stuff works nicely. I expect to change module names, terminology, and possibly method names before it is ready for CPAN.

I'll try to attach further comments directly to the other replies.
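To make the tokenize, transform, serialise round trip concrete, here is a toy sketch in Perl. It is not the Perl::Tokenizer or Perl::Transform::Tidy code: the sub names `tokenize`, `serialize` and `tidy` and the token classes are hypothetical, and quotes, regexes, here-docs and Pod (the genuinely hard parts) are ignored. What it does show is the lossless property the design depends on: comments and whitespace are kept as tokens, so serialising an unmodified token list reproduces the source exactly, and a Tidy-style pass only has to swap whitespace tokens.

```perl
use strict;
use warnings;

# Toy lossless tokenizer (hypothetical, not Perl::Tokenizer): every
# character lands in exactly one token, so joining the token texts gives
# back the source, comments and whitespace included. Quotes, regexes,
# here-docs and Pod are deliberately NOT handled.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G(\s+)/gc)         { push @tokens, [ whitespace => $1 ] }
        elsif ($src =~ /\G(#[^\n]*)/gc)     { push @tokens, [ comment    => $1 ] }
        elsif ($src =~ /\G([\$\@%]?\w+)/gc) { push @tokens, [ word       => $1 ] }
        elsif ($src =~ /\G(.)/gcs)          { push @tokens, [ operator   => $1 ] }
    }
    return \@tokens;
}

# Serialise a token list back into plain text.
sub serialize {
    my ($tokens) = @_;
    return join '', map { $_->[1] } @$tokens;
}

# Toy "Tidy" transform: drop the original whitespace and insert
# replacement whitespace (a newline if the run had one, else one space).
# Keeping the newline stops a comment from swallowing the next line.
sub tidy {
    my ($tokens) = @_;
    my @out;
    for my $t (@$tokens) {
        if ($t->[0] eq 'whitespace') {
            push @out, [ whitespace => ($t->[1] =~ /\n/ ? "\n" : ' ') ];
        }
        else {
            push @out, $t;
        }
    }
    return \@out;
}

my $src    = "my \$x  =  1;   # set x\nprint   \$x;\n";
my $tokens = tokenize($src);
print serialize($tokens) eq $src ? "round trip ok\n" : "lossy!\n";
print serialize(tidy($tokens));
```

Run as-is, this prints `round trip ok` followed by `my $x = 1; # set x` and `print $x;`.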
(Ovid) Re: Appropriate CPAN namespace for perl parser by Ovid (Cardinal) on Feb 12, 2002 at 18:00 UTC
You've written a parser, so it should go in the `Parse::` namespace. I believe that `Parse::Perl` is being worked on right now, but that does seem like an appropriate namespace.

I have a couple of questions about your parser. Do you have tests for it? Do you have a distribution or documentation? I had a few things I wanted to throw at it which I was pretty sure would give me an idea of how robust it is, but all I could find was a bunch of HTML representations of the code. It wouldn't be too hard to take those and strip out the line numbers, but it would be nice to have something to play with and give you feedback on.

Incidentally, in your regex lib, I noticed that you appear to define a package name as having its components separated by a double colon, when a single quote mark is also allowed (for Perl 4 compatibility; see the D'oh module). You also have the Unix and Mac newlines reversed.

Cheers, Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
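A minimal sketch of the two fixes pointed out above, assuming a simplified identifier rule (the pattern and the sub name `split_lines` are illustrative, not taken from the regex lib): package names may use `'` as well as `::` as the separator, and the newline conventions are LF for Unix, CR for classic Mac OS and CR LF for DOS/Windows.

```perl
use strict;
use warnings;

# Simplified package-name pattern: identifier components joined by '::'
# or by the old Perl 4 separator "'" (so D'oh means D::oh). Illustrative
# only; it does not try to cover every corner of Perl's identifier rules.
my $package_name = qr/\A[A-Za-z_]\w*(?:(?:::|')\w+)*\z/;

# Line endings: Unix LF, classic Mac OS CR, DOS/Windows CR LF.
my %newline = (
    unix => "\x0A",
    mac  => "\x0D",
    dos  => "\x0D\x0A",
);

# Split text on any of the three conventions. CR LF is tried first so a
# DOS line ending is not counted as two line breaks.
sub split_lines {
    my ($text) = @_;
    return split /\x0D\x0A|\x0A|\x0D/, $text;
}

for my $name ('Foo::Bar', "D'oh", 'Foo:Bar') {
    printf "%-10s %s\n", $name, $name =~ $package_name ? 'package' : 'not a package';
}

my $mixed = join '', 'one', $newline{unix}, 'two', $newline{mac},
                     'three', $newline{dos}, 'four';
print join(',', split_lines($mixed)), "\n";   # one,two,three,four
```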
Re: (Ovid) Re: Appropriate CPAN namespace for perl parser by adamk (Chaplain) on Feb 12, 2002 at 23:34 UTC
Thanks, I'll look into it.
Re: Appropriate CPAN namespace for perl parser by merlyn (Sage) on Feb 12, 2002 at 18:16 UTC
As I asked you in private email in reply to your module-author mailing, I'll ask again: How do you plan on handling the issues raised in my post On Parsing Perl? -- Randal L. Schwartz, Perl hacker
Re: Re: Appropriate CPAN namespace for perl parser by adamk (Chaplain) on Feb 12, 2002 at 23:32 UTC
Sorry about the lack of reply, Randal... I have a nasty work email situation, so I should get to it shortly.

But to answer your question, handling `/` is currently the most "fuzzy" part. It's the one character I had real problems with. Currently, it's written to handle the most common cases only. Line 144 of http://ali.as/PSP/source/Perl/Tokenizer/Classes.html in the browsable source code is the relevant section. Since I don't have exposure to the relevant sections of the perl C source, it was fairly difficult, but I'm sure there's a method that meets the 99.9% standard. With the difficulties of POD, `__END__` and similar tags, quote parsing, and the rest mostly solved, I wouldn't want to cancel the whole thing just because of a single character :)
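One common way to guess the meaning of `/` is to look at the previous significant token: if it could end a value, `/` is division; otherwise a value is expected and `/` starts a regex. The sketch below is a hypothetical illustration of that heuristic, not the code at line 144 of Classes.html, and its keyword list is deliberately short. The `foo` case shows where it breaks down: without knowing whether a user sub takes arguments, a bareword is only a guess.

```perl
use strict;
use warnings;

# Keywords after which perl expects a term, so '/' begins a regex.
# A short, hypothetical list for illustration, not an exhaustive one.
my %expects_term_after = map { $_ => 1 }
    qw(split grep map if unless while until and or not return);

# Guess whether '/' starts a regex (true) or is division (false), given
# the previous significant token. Heuristic only: it cannot know about
# user subs or their prototypes, which is exactly where it fails.
sub slash_starts_regex {
    my ($prev) = @_;
    return 1 unless defined $prev;            # start of the document
    return 1 if $expects_term_after{$prev};   # e.g. split /,/
    return 0 if $prev =~ /\A[\$\@%]\w/;       # variable: $x / 2
    return 0 if $prev =~ /\A\d/;              # number:   10 / 2
    return 0 if $prev =~ /\A[)\]}]\z/;        # closer:   ($x) / 2
    return 0 if $prev =~ /\A\w+\z/;           # other bareword: guess a value
    return 1;                                 # operator or opener: ( /x/
}

for my $prev (undef, 'split', '$total', ')', '=', 'foo') {
    printf "%-8s -> %s\n", defined $prev ? $prev : '(start)',
        slash_starts_regex($prev) ? 'regex' : 'divide';
}
```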
Re: Re: Re: Appropriate CPAN namespace for perl parser by merlyn (Sage) on Feb 13, 2002 at 06:33 UTC
But that's only one example. It's not just `/` (divide or regex). It's also dot (concatenate or decimal point), less-than (less than or filehandle read), two less-thans (left shift or here-doc), star (glob or multiply), percent (hash or modulus), ampersand (subroutine or bitwise and), and question mark (regex or question-colon). If you aren't handling all of those, you aren't parsing Perl!

Put another way, you cannot tokenize Perl without knowing at all times whether you are expecting a value or an operator, because every one of the characters I just listed does double duty, depending on context. And yet, to know that, you also need to know whether there is a prototyped function to the left that takes args or not. What a mess! -- Randal L. Schwartz, Perl hacker
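The list above can be written down as a lookup table keyed on the tokenizer's current expectation. This is a sketch with hypothetical names (`%dual_duty`, `meaning_of`); it only encodes the double meanings and assumes the caller already knows whether a term or an operator is expected, which, as the post says, is the hard part.

```perl
use strict;
use warnings;

# Each dual-duty character: [ meaning when a term (value) is expected,
#                             meaning when an operator is expected ].
my %dual_duty = (
    '/'  => [ 'regex match',                 'division' ],
    '.'  => [ 'decimal point, as in .5',     'concatenation' ],
    '<'  => [ 'filehandle read, as in <FH>', 'less than' ],
    '<<' => [ 'here-doc',                    'left shift' ],
    '*'  => [ 'typeglob, as in *STDOUT',     'multiplication' ],
    '%'  => [ 'hash, as in %ENV',            'modulus' ],
    '&'  => [ 'subroutine, as in &foo',      'bitwise and' ],
    '?'  => [ 'regex match, as in ?pat?',    'the ?: conditional' ],
);

# Look up a character's meaning given the current expectation; returns
# undef for characters not in the table. Working out $expect_term (which
# can depend on prototypes of subs seen earlier) is NOT attempted here.
sub meaning_of {
    my ($char, $expect_term) = @_;
    my $pair = $dual_duty{$char};
    return defined $pair ? $pair->[ $expect_term ? 0 : 1 ] : undef;
}

for my $char (sort keys %dual_duty) {
    printf "%-3s term: %-28s operator: %s\n",
        $char, meaning_of($char, 1), meaning_of($char, 0);
}
```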
Re (tilly) 4: Appropriate CPAN namespace for perl parser by tilly (Archbishop) on Feb 13, 2002 at 06:46 UTC
Re: Re (tilly) 4: Appropriate CPAN namespace for perl parser by adamk (Chaplain) on Feb 13, 2002 at 15:05 UTC
Re: Re: Re: Re: Appropriate CPAN namespace for perl parser by Anonymous Monk on Feb 13, 2002 at 09:20 UTC
But perl is not guessing! by merlyn (Sage) on Feb 13, 2002 at 13:04 UTC
Re: Appropriate CPAN namespace for perl parser by jmcnamara (Monsignor) on Feb 13, 2002 at 10:50 UTC
Several people have pointed out that "only perl can parse Perl". This is an epigram, a play on words, a joke. It isn't an axiom unless we treat it as such. Perltidy parses Perl well enough for most people's requirements. Also, Damian Conway has stated many times that he wants to parse Perl in Perl, and as far as I've seen he doesn't get this type of reaction.

Adam's goal of parsing 99.9% of Perl, or at least the Perl on CPAN, is worthy and possibly obtainable. He appears to be aware of the difficulties, so we should lend support rather than prejudge it. A tool such as this would be extremely useful. The CPANTS project, for instance, is probably doomed without something like this. -- John.
Re: Re: Appropriate CPAN namespace for perl parser by demerphq (Chancellor) on Feb 13, 2002 at 13:27 UTC
I'm glad you mentioned Perltidy. In many respects perltidy already contains a tokenizer that handles the majority of Perl constructs, as well as mechanisms for identifying some of the more pain-in-the-ass errors (such as a missing `{`, `(` or `[`). There are also tools such as B::Deparse that would probably help with the task of resolving (semi-)ambiguous statements (on the other hand, that might not be the best plan for security reasons... :-)

Much respect to adamk for trying a hard task, but I would hope he at least has a look at the existing codebase, and I would prefer that he join forces with those authors. At the very least it would make a very popular tool like Perltidy even better...

my $.02; Yves / DeMerphq -- When to use Prototypes?
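As a sketch of the B::Deparse idea: B::Deparse ships with perl and turns compiled code back into source, and its `-p` option adds explicit parentheses, so it shows how perl itself resolved a construct. The example below uses two hypothetical subs, `no_args` and `one_arg`, to show the prototype problem raised earlier in the thread: a `/` right after each sub is compiled as division in one case and as the start of a regex match in the other. It only works on code perl has already compiled, and compiling runs `BEGIN` blocks, which is the security concern mentioned above.

```perl
use strict;
use warnings;
use B::Deparse;

# Two subs whose prototypes change how the text after them is parsed.
sub no_args ()  { time }     # empty prototype: takes no arguments
sub one_arg ($) { $_[0] }    # takes exactly one scalar argument

my $deparse = B::Deparse->new('-p');   # -p: show perl's parenthesization

# After no_args, perl expects an operator, so '/' means division.
my $as_divide = $deparse->coderef2text(sub { no_args / 2 });

# After one_arg, perl expects a term, so '/ 2 /' is a match against $_.
my $as_regex  = $deparse->coderef2text(sub { one_arg / 2 / });

print "$as_divide\n$as_regex\n";
```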
Re: Appropriate CPAN namespace for perl parser by gellyfish (Monsignor) on Feb 12, 2002 at 21:37 UTC
Although you have had some sensible answers here you might also want to ask in modules@perl.org and also in the newsgroup comp.lang.perl.modules about this if you want to reach the widest consensus ;-} /J\
Re: Appropriate CPAN namespace for perl parser by adamk (Chaplain) on Feb 12, 2002 at 23:40 UTC
Let me just also add that my standard for accuracy is CPAN. If the module can successfully parse 99.9% of the code in CPAN, I consider it accurate enough, although improvements to accuracy are welcome. With that in mind I would love it if someone could suggest a way of getting hold of all of the Perl code in CPAN without having to install/compile etc... It's a big ask, but I'm not the sort to back down from a challenge.
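A rough sketch of how the 99.9% target could be measured, assuming the distributions have already been unpacked into a local directory (how to obtain them is exactly the open question, so it is not answered here). `perl_files_under` and `parse_rate` are hypothetical helpers built on the core File::Find module, and the parser under test is passed in as a code ref.

```perl
use strict;
use warnings;
use File::Find;

# Collect .pm, .pl and .t files under $root, e.g. a directory where CPAN
# distributions have already been unpacked.
sub perl_files_under {
    my ($root) = @_;
    my @files;
    find(sub { push @files, $File::Find::name if -f $_ && /\.(?:pm|pl|t)\z/ }, $root);
    @files = sort @files;
    return @files;
}

# Run $parser (a code ref returning true on success; it may also die on
# failure) over each file and return the fraction that parsed. Files
# that cannot be opened count as failures.
sub parse_rate {
    my ($parser, @files) = @_;
    return 0 unless @files;
    my $ok = 0;
    for my $file (@files) {
        open my $fh, '<', $file or next;
        my $src = do { local $/; <$fh> };
        close $fh;
        $src = '' unless defined $src;
        $ok++ if eval { $parser->($src) };
    }
    return $ok / @files;
}

# Example run: perl measure.pl /path/to/unpacked/cpan
# The stand-in "parser" only checks that a file is non-empty; a real
# tokenizer/lexer entry point would be plugged in here instead.
if (@ARGV) {
    my @files = perl_files_under($ARGV[0]);
    my $rate  = parse_rate(sub { $_[0] =~ /\S/ }, @files);
    printf "%d files, %.2f%% parsed (target: 99.9%%)\n", scalar @files, 100 * $rate;
}
```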