Re: Re: Re: Re: Appropriate CPAN namespace for perl parser

But if I understand some of the docs correctly, even perl itself doesn't really know what everything is; it guesses based on heuristics, "Do What I Think" and so on. For example, in deciding what D'oh or s'e'f'g is (the first evaluates as 'D::oh', the second is equivalent to $_ =~ s/e/f/g).
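
To make the second of those examples concrete, here is a minimal sketch showing that the apostrophe-delimited form behaves like the slash-delimited one on a plain string. The helper names `with_quotes` and `with_slashes` are made up for illustration, and the sketch deliberately leaves the D'oh case out.

```perl
use strict;
use warnings;

# Hypothetical helpers: run the same substitution with two delimiter styles.
sub with_quotes  { local $_ = shift; s'e'f'g;        return $_ }
sub with_slashes { local $_ = shift; $_ =~ s/e/f/g;  return $_ }

my $text = 'see the bees';
print with_quotes($text),  "\n";
print with_slashes($text), "\n";
```

Both calls should print the same line, which is the point: a tokenizer has to recognise the apostrophes after `s` as delimiters rather than as the start of a string.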

If Perl itself has to take educated guesses, can I allow myself the same luxury? As it currently stands, I take guesses in certain situations which, while not as accurate as Perl's, do the job in a percentage of cases, hopefully a large one.

As the module evolves, I would hope that the guesses get better and better. I personally believe that that is good enough.

And should the need arise, I'll merge the tokenizer and lexer into a single unit, add prototype checking and context tracking, or whatever else is required ( goddammit :) ). I don't plan to be perfect. And given the number of man years spent on perl itself, it's probably a lost cause trying to get all the way to perfect. But that's no reason not to have something that provides value in other ways.

BTW, thanks for the SLUG visit, I certainly enjoyed it, if only for the 'use base' alone. ( I asked the icky symbol table question )

Adam

But perl is not guessing! by merlyn (Sage) on Feb 13, 2002 at 13:04 UTC
"But if I understand some of the docs correctly, even perl itself doesn't really know what everything is, it guesses based on heuristics"

But it doesn't guess! It knows precisely. And for you to properly parse Perl, you must emulate those rules precisely.

There's very little statistical guessing in `/usr/bin/perl`. About the only two cases are the hash-element vs. scalar-followed-by-char-class thingy in a regex, and the "is this a block or a hashref" in certain places that might have either. Everything else is deterministic. Your code must do it right, or it's not parsing Perl as `perl` would.

Put another way, I know that since `sin` takes an argument, if I use a slash following it, it's a regex-start. It's never a divide. I can tell that without running it or debugging it. And if I put a double-less-than, it's a here doc. It's never a left shift. But if I replace `sin` with `time`, the exact opposite choices are taken.

You cannot guess. To parse Perl, you must know at all times whether you are in a place expecting a value or a place expecting an operator. And to do that, you have to know the prototype of all the built-ins, and how to get the prototype of all the user-defined functions. Which also means you have to step along with the code, executing all the `BEGIN` blocks, including those spelled `u`-`s`-`e`.

This is not a simple task. Larry admits it. Damian was going to spend the better part of this year working on `Parse::Perl` as a YAS-funded project. If you are taking it on, but not aware of the things I've posted in this thread, it's a bit like saying "I can fly that plane", but just getting in, without realizing there are clouds and bad weather and other planes, and that landing can be a real pain sometimes, and what happens when the engines go out.

-- Randal L. Schwartz, Perl hacker
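
The `sin` versus `time` contrast can be checked with a small experiment. This is only a sketch: the helper name `snippet_dies` and the two test snippets are assumptions made for illustration, and the snippets are run through string `eval` so that each is parsed by perl itself. After `sin`, the slash should open a regex, which leaves the `die` as live code; after `time`, the slash should be a divide, which leaves the `die` inside a comment.

```perl
use strict;
use warnings;

# Hypothetical helper: string-eval a snippet and report whether it died.
sub snippet_dies {
    my ($code) = @_;
    local $_ = '';   # give any pattern match a defined string to test
    local $@;
    no warnings;     # the snippets are deliberately odd (void context etc.)
    eval $code;
    return $@ ne '' ? 1 : 0;
}

my @snippets = (
    'sin / 25 ; # / ; die "reached\n";',
    'time / 25 ; # / ; die "reached\n";',
);

for my $snippet (@snippets) {
    printf "%-40s => %s\n", $snippet,
        snippet_dies($snippet) ? 'die was executed' : 'die was not executed';
}
```

The two snippets differ only in the leading word, so any tool that classifies the slash without knowing how many arguments `sin` and `time` take will get one of them wrong.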
Re: But perl is not guessing! by adamk (Chaplain) on Feb 13, 2002 at 13:41 UTC
Fair enough, you know Perl's behaviour a lot better than I do. I can see ways of modifying the code towards, and possibly all the way to, the behaviour that you would like to see. I wouldn't like to do so at this point, though, as I have the level of functionality out of the current codebase that I wanted as a minimum ( excluding bugs ), not to mention that it would slow things down a lot, since I'm not going to write any C into the back end.

But let me add this. Even if the source isn't parsed "properly" in some cases, it may not need to be, depending on what you want to do with it. I have no illusions of taking perl source code and executing it ( "flying a plane" ). But as I have pointed out, there are other uses for perl source that may not need the same level of rigour.

Allow me to add some questions to this. Does the perl interpreter record all whitespace and comments, the content after the __END__ signal, and other stuff in its model of the code? All this extra padding is required so that once you have a complete model of the source, you can serialize it all the way back down to text, and come out with what you started with. I would like to be able to have a piece of code walk into the code tree, modify the code within, and then be able to write that code back out to a file.

Is it EVER possible to write a proper and full perl parser without using perl itself... The B modules demonstrate this. Even if you could get a proper parser, what about things like CGI.pm? Can you tell what methods the CGI module will have? At some point, you either have to draw the line, or you end up with perl itself. I choose to draw the line at reading MOST source code. My standards will probably go up over time, as code is added around the edges that NEED the extra depth.

Finally, two last questions.
1. What were you planning on doing with a perl parser?
2. How slow are you prepared for it to be?
Re: Re: But perl is not guessing! by merlyn (Sage) on Feb 13, 2002 at 13:46 UTC
Just addressing this one point:

"Is it EVER possible to write a proper and full perl parser without using perl itself... The B modules demonstrate this."

The B modules are using "perl itself" to do the parsing. That's why they can get it right.

-- Randal L. Schwartz, Perl hacker
Re: Re: Re: But perl is not guessing! by adamk (Chaplain) on Feb 13, 2002 at 13:48 UTC
Re: Re: Re: Re: But perl is not guessing! by demerphq (Chancellor) on Feb 14, 2002 at 16:39 UTC
Re: But perl is not guessing! by Smylers (Pilgrim) on Feb 13, 2002 at 13:39 UTC
Maybe a name such as `Parse::Perl::Guessy` or `Parse::Perl::Naive` or `Parse::Perl::Approx` or `Parse::Perl::Ish` or something would be appropriate for your module, Adam?

Randal is obviously right that you aren't really parsing Perl without getting everything right in a deterministic fashion. I don't think it'd be good to use up the Parse::Perl name for something that isn't doing this (especially if Damian is on the way to writing something that does).

But I'd certainly like to see a module that makes a decent job of parsing much Perl, for things like syntax highlighting, even if the scope is limited and it can be tricked. For example, I always put spaces around arithmetic operators. I'd be happy enough for a parser to assume that a "` / `" is always divide, and know that if I ever want a space at the start of a regexp I should use "`m/ `" or "`/\s`" or something.

Smylers
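
The whitespace convention described above could be sketched roughly as follows. This is an illustration only, not code from any existing module: the function name `guess_slashes` is made up, and the rule it applies (a slash with whitespace on both sides is a divide, any other slash opens a regex that runs to the next unescaped slash) is exactly the kind of guess that can be tricked.

```perl
use strict;
use warnings;

# Hypothetical helper: find the '/' tokens in a string of source and guess
# what each one is, using only the whitespace around it.
sub guess_slashes {
    my ($src) = @_;
    my @found;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if ($src =~ m{\G(?<=\s)/(?=\s)}gc) {
            # whitespace on both sides: assume division
            push @found, [ divide => '/' ];
        }
        elsif ($src =~ m{\G(/(?:\\.|[^/\\])*/[a-z]*)}gc) {
            # otherwise assume a regex and scan to the next unescaped slash
            push @found, [ regex => $1 ];
        }
        else {
            # skip text without slashes, or a lone slash we cannot classify
            $src =~ m{\G(?:[^/]+|/)}gc or last;
        }
    }
    return @found;
}

my $line = 'my $half = $total / 2; print if /\d+ apples/i;';
print join(' ', @$_), "\n" for guess_slashes($line);
```

A pattern that really is a single space, such as the one in `split / /, $line`, would be reported as a divide here, which is why the convention asks for `m/ /` or `/\s/` in that situation.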