Sorry about the lack of reply, Randal... I'm in the middle of a nasty work email situation, but I should get to it shortly. To answer your question, though: handling / is currently the most "fuzzy" part. It's the one character I had real problems with.
Currently, it's written to handle the most common cases only.
Line 144 of http://ali.as/PSP/source/Perl/Tokenizer/Classes.html in the browsable source code is the relevant section.
Since I don't have exposure to the relevant sections of the perl C source, it was fairly difficult, but I'm sure there's an approach that covers the standard 99.9% of cases.
With the difficulties of handling POD, __END__ and similar tags, quote parsing, and the rest mostly solved, I wouldn't want to abandon the whole thing just because of a single character :)
But that's only one example. It's not just / (divide or regex). It's also dot (concatenate or decimal point), less-than (less than or filehandle read), two less-thans (left shift or here-doc), star (glob or multiply), percent (hash or modulus), ampersand (subroutine or bit-wise and), and question mark (regex or question-colon).
If you aren't handling all of those, you aren't parsing Perl!
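To make that concrete, here's a minimal sketch (throwaway names, not taken from any real module) showing a few of those characters doing double duty:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @nums = (1, 2, 3);

    my $half = @nums / 2;           # '/' is division: an operator was expected
    my @twos = grep { /2/ } @nums;  # '/' opens a regex: a term was expected

    my $pi    = 3.14;               # '.' is a decimal point
    my $greet = "foo" . "bar";      # '.' is string concatenation

    print "smaller\n" if 4 < 5;     # '<' is numeric less-than
    # my $line = <STDIN>;           # '<' here would be a readline, not less-than

    my %hash = ( a => 1 );          # '%' is the hash sigil
    my $rem  = 7 % 2;               # '%' is the modulus operator

    my $prod = 3 * 4;               # '*' is multiplication
    my $out  = *STDOUT;             # '*' is a glob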
Put another way, you cannot tokenize Perl without knowing at all times whether you are expecting a value or an operator, because every one of the characters I just listed does double duty depending on context. And yet, to know that, you also need to know whether a prototyped function to the left takes args or not. What a mess!
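For example, something like this (the sub names are made up) shows the same '/' parsing two different ways, purely because of a prototype declared earlier:

    #!/usr/bin/perl
    use strict;
    use warnings;

    sub nullary () { 10 }      # '()' prototype: perl knows it takes no arguments
    sub listy      { "@_" }    # no prototype: perl assumes it may take a list

    my $x = nullary / 2;       # '/' is division: nullary() / 2, i.e. 5
    print "$x\n";

    # my $y = listy / 2;       # here '/' opens a regex instead, and perl keeps
    #                          # scanning for a closing '/', typically producing
    #                          # a syntax error much further down the file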
-- Randal L. Schwartz, Perl hacker
But if I understand some of the docs correctly, even perl itself doesn't really know what everything is; it guesses based on heuristics and the like, "Do What I Think"-style. For example, in deciding what D'oh or s'e'f'g is (the first evaluates as 'D::oh', the second is equivalent to $_ =~ s/e/f/g).
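A throwaway sketch of those two cases, for the perls of that era (the ' package separator has since been deprecated on newer perls):

    #!/usr/bin/perl
    use strict;
    use warnings;

    package D;
    sub oh { return "inside D::oh\n" }

    package main;
    my $msg = D'oh;     # the ' is the archaic package separator: calls D::oh()
    print $msg;

    $_ = "see";
    s'e'f'g;            # ' as the s/// delimiter: same as s/e/f/g, no interpolation
    print "$_\n";       # prints "sff"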
If Perl itself has to take educated guesses, can I allow myself the same luxury? As it currently stands, it takes guesses in certain situations which, while not as accurate as perl's, do the job in a percentage of cases, hopefully a large one.
As the module evolves, I would hope the guesses get better and better. I personally believe that is good enough.
And should the need arise, I'll merge the tokenizer and lexer into a single unit, add prototype checking and context tracking, or whatever else is required ( goddammit :) ). I don't plan to be perfect, and given the number of man-years spent on perl itself, trying to get all the way to perfect is probably a lost cause. But that's no reason not to have something that provides value in other ways.
BTW, thanks for the SLUG visit; I certainly enjoyed it, if only for the 'use base' bit alone. (I asked the icky symbol table question.)
Adam