Re: Re: Re: Appropriate CPAN namespace for perl parser
by merlyn (Sage) on Feb 13, 2002 at 06:33 UTC
But that's only one example. It's not just / (divide or regex). It's also dot (concatenate or decimal point), less-than (less than or filehandle read), two less-thans (left shift or
here-doc), star (glob or multiply), percent (hash or modulus), ampersand (subroutine or bit-wise and), and question mark (regex or question-colon).
If you aren't handling all of those, you aren't parsing Perl!
Put another way, you cannot tokenize Perl without at all times knowing whether
you are expecting a value or an operator, because all of the ones I just listed
have double duty, depending on context. And yet, to know that, you also need to know
if you have a prototyped function to the left that takes args or not. What a mess!
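To make the double duty concrete, here is a minimal sketch that uses each listed character once where an operator is expected and once where a value is expected. The variable names, the `five` sub, and the in-memory filehandle are illustrative assumptions, not anything from the thread. The bare `?PATTERN?` form is left out because, as far as I know, recent perls only accept it with a leading `m`.

```perl
use strict;
use warnings;

my ($x, $y) = (8, 2);
my %h = (a => 1);
sub five { 5 }                   # hypothetical sub, only here for the & example
my $text = "first line\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

# Same character, first where an operator is expected, then where a value is.
my $quot  = $x / $y;             # slash: divide
my $match = "a/b" =~ /\//;       # slash: regex start
my $cat   = $x . $y;             # dot: concatenate
my $dec   = .5 + $x;             # dot: decimal point
my $less  = $x < $y;             # less-than: comparison
my $line  = <$fh>;               # less-than: filehandle read
my $shift = $x << $y;            # two less-thans: left shift
my $doc   = <<"EOT";             # two less-thans: here-doc
hello
EOT
my $prod  = $x * $y;             # star: multiply
my $glob  = *STDOUT;             # star: glob
my $mod   = $x % $y;             # percent: modulus
my $count = keys %h;             # percent: hash
my $and   = $x & $y;             # ampersand: bit-wise and
my $call  = &five();             # ampersand: subroutine

print "$quot $cat $dec $shift $prod $mod $and $call\n";
```

In every pair, nothing but the preceding token tells the tokenizer which reading applies.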
-- Randal L. Schwartz, Perl hacker
I agree.
And this is my primary reason for thinking that a "guessy" perl parser would be enough. You eventually reach the point where you are loading a perl interpreter, firing up B:: to do a bunch of analysis, and then start a perl parsing process. I'm not convinced it can EVER be fully done.
And I'm more convinced of it the longer this thread goes on...
It also raises questions like how to parse Win32:: code on a Unix box. How could you parse any arbitrary piece of code without having everything it lists as a requirement installed, or at least present on the machine?
What if you are using some form of run-time module loading?
But if I understand some of the docs correctly, even perl itself doesn't really know what everything is, it guesses based on heuristics etc., "Do What I Think"... For example, in deciding what D'oh or s'e'f'g is (the first evaluates as 'D::oh', the second is equivalent to $_ =~ s/e/f/g;).
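The second case can be checked directly. This is a small sketch with made-up variable names showing that the apostrophe-delimited substitution gives the same result as the slash form on a plain string. The D'oh case is left out because the status of the apostrophe as a package separator differs between perl versions.

```perl
use strict;
use warnings;

# Apostrophes as delimiters: s, then pattern 'e', replacement 'f', flag g.
$_ = "reef";
s'e'f'g;
my $with_quotes = $_;

# The conventional spelling of the same substitution.
$_ = "reef";
$_ =~ s/e/f/g;
my $with_slashes = $_;

print "$with_quotes $with_slashes\n";
```

The two forms are not identical in general: with apostrophe delimiters the pattern and replacement are not interpolated.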
If Perl itself has to take educated guesses, can I allow myself the same luxury? As it currently stands, I take guesses in certain situations which, while not as accurate as Perl's, do the job in a percentage of cases, hopefully a large one.
As the module evolves, I would hope that the guesses get better and better. I personally believe that that is good enough.
And should the need arise, I'll merge the tokenizer and lexer into a single unit, add prototype checking and context tracking, or whatever else is required ( goddammit :) ). I don't plan to be perfect. And given the number of man years spent on perl itself, it's probably a lost cause trying to get all the way to perfect. But that's no reason not to have something that provides value in other ways.
BTW, thanks for the SLUG visit, I certainly enjoyed it, if only for the 'use base' alone. ( I asked the icky symbol table question )
Adam
"But if I understand some of the docs correctly, even perl itself doesn't really know what everything is, it guesses based on heuristics"
But it doesn't guess! It knows precisely. And for you to properly parse Perl, you must emulate those precisely.
There's very little statistical guessing in /usr/bin/perl. About the only
two cases are the array-element vs. scalar-followed-by-char-class thingy in a regex,
and the "is this a block or a hashref" decision in certain places that might have either.
Everything else is deterministic. Your code must do it right, or it's not parsing
Perl as perl would.
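The block-or-hashref case is easiest to see with map. The sketch below, with illustrative data and names, sidesteps the guess using two documented disambiguators: a leading `{;` to force a block and a leading `+{` to force an anonymous hash.

```perl
use strict;
use warnings;

my @words = qw(Foo Bar);

# "{;" can only be the start of a block, so map runs it once per element
# and the key/value pairs are flattened into one hash.
my %seen = map {; lc($_) => 1 } @words;

# "+{" can only be an anonymous hash constructor, so this is the
# "map EXPR, LIST" form and yields one hash reference per element.
my @hashes = map +{ lc($_) => 1 }, @words;

print join(",", sort keys %seen), " ", scalar(@hashes), "\n";
```

A parser that wants to match perl on the undecorated `map { ...` form has to reproduce perl's own look-ahead at the tokens after the brace.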
Put another way, I know that since sin takes an argument, that if I use
slash following it, it's a regex-start. It's never a divide. I can tell that
without running it or debugging it. And if I put a double-less-than, it's a
here doc. It's never a left shift. But if I replace sin with time, the exact opposite choices are taken.
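A runnable sketch of the sin/time contrast for the slash follows. The `@log` array and the messages are illustrative assumptions. On each line, the text after the first slash is either a regex that swallows everything up to the second slash, or a division followed by a comment.

```perl
use strict;
use warnings;

my @log;
$_ = "";    # the regex below is matched against $_

# sin expects an argument, so the slash starts the regex / 25 ; # /
# and the push after it is live code.
my $s = sin  / 25 ; # / ; push @log, "sin: slash started a regex";

# time takes no argument, so the slash is a divide, the statement ends
# at the semicolon, and the rest of the line is a comment.
my $t = time / 25 ; # / ; push @log, "time: slash started a regex";

print "$_\n" for @log;
```

Only the sin line records an entry, although the two lines differ by a single identifier.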
You cannot guess. To parse Perl, you must know at all times whether you are
in a place expecting a value or a place expecting an operator. And to do that, you
have to know the prototype of all the built-ins, and how to get the prototype of all
the user-defined functions. Which also means you have to step along with the
code, executing all the BEGIN blocks, including those spelled u-s-e.
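The same effect shows up with user-defined functions, which is why their prototypes have to be known before the following tokens can be read. In this sketch the subs `no_args` and `one_arg` and the `@log` array are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

sub no_args()  { 10 }                   # empty prototype: takes no arguments
sub one_arg($) { "got an argument" }    # one-scalar prototype: acts like a unary operator

my @log;
$_ = "";    # the regex below is matched against $_

# After no_args an operator is expected: divide, then a comment.
my $p = no_args / 5 ; # / ; push @log, "no_args: slash started a regex";

# After one_arg a value is expected: the slash starts the regex / 5 ; # /
my $q = one_arg / 5 ; # / ; push @log, "one_arg: slash started a regex";

print "$p | $q | @log\n";
```

If those two declarations lived in a module pulled in by use, a parser would have to load that module at compile time, as perl does, to read these two lines correctly.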
This is not a simple task. Larry admits it. Damian was going to spend the
better part of this year working on Parse::Perl as a YAS-funded project.
If you are taking it on, but not aware of the things I've posted in this thread, it's
a bit like saying "I can fly that plane", but just getting in, without realizing
there are clouds and bad weather and other planes, and that landing can be
a real pain sometimes, and what happens when the engines go out.
-- Randal L. Schwartz, Perl hacker