>>>>> "Damian" == Damian Conway <damian@cs.monash.edu.au> writes:
Damian> Take a look at Text::Balanced -- in particular how the
Damian> extract_quotelike, extract_variable, and extract_codeblock
Damian> subroutines conspire to do a fair imitation of parsing
Damian> Perl. It would be trivial* to adapt these three to compress
Damian> Perl code by extracting comments and non-quoted multiple
Damian> whitespace.

Damian> * ...for sufficiently non-trivial values of "trivial"...
Even with your disclaimer, I think not.
As my repeatedly quoted example goes...
$n = time / 2 ; # / ; # first hash starts comment
$n = sin / 2 ; # / ; # second hash starts comment
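One way to check which hash really starts the comment on each of those two lines is to put a statement after the second semicolon and see whether it runs. This is only a sketch: the `@ran` array and the `push` statements are illustrative additions, not part of the original example.

```perl
use strict;
use warnings;

my @ran;      # records which trailing statements actually executed
$_ = '';      # the regex on the "sin" line matches against $_
my ($t, $s);

# "time" takes no arguments, so "/" is division and the first "#"
# starts a comment: the push below is never compiled.
$t = time / 2 ; # / ; push @ran, 'time';

# "sin" wants an argument, so "/ 2 ; # /" is a regex match and the
# statement after the second ";" is real code.
$s = sin / 2 ; # / ; push @ran, 'sin';

print "ran: @ran\n";   # prints "ran: sin"
```

A comment stripper that cut each line at the first `#` would silently delete the `push` on the `sin` line.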
You can't strip comments until you can recognize that time is proto () while sin is proto ($). And there
are seven other characters like "/" that have a "looking for term" vs "looking for operator" feature. And
here we go with the non-deterministic part:
use Faith;
$n = wazzat / 2 ; # / ; # what is wazzat prototyped as? :)
Yup. You gotta suck in everything that Faith.pm reads as well, thanks to user prototypes. And then
we get the odd case:
BEGIN { eval(time % 2 ? 'sub wazzat ();' : 'sub wazzat ($);') }
$n = wazzat / 2 ; # / ; # what is wazzat prototyped as? :)
Yup. Non-deterministic parsing. :)
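The dependence on the prototype can be made visible by compiling the same line of source twice, once with each declaration of wazzat. This is a rough sketch under stated assumptions: the helper `second_statement_runs`, the `Demo` package names, and the `$reached` flag are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

my $counter = 0;

# Compile the same source line in a fresh package where wazzat has the
# given prototype, and report whether the code after the second ";" ran.
sub second_statement_runs {
    my ($proto) = @_;
    my $pkg     = 'Demo' . $counter++;   # fresh package per call
    my $reached = 0;
    my $n;
    local $_ = '';                       # target of the regex, if any

    my $code = join "\n",
        "package $pkg;",
        "sub wazzat ($proto) { 1 }",
        'no warnings;',
        '$n = wazzat / 2 ; # / ; $reached = 1;',
        '1;';
    eval $code or die $@;
    return $reached;
}

print "proto ():  ", second_statement_runs(''),  "\n";   # prints 0
print "proto (\$): ", second_statement_runs('$'), "\n";   # prints 1
```

With the empty prototype the rest of the line is a comment; with `($)` it is a regex followed by a live assignment. Which of the two applies cannot be read off the line itself.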
And while these are extreme examples, real life proggies get tripped up by the same things.
Thus, only "perl" can parse "Perl" continues to be true.
I would write a verse
of haiku to show the point,
but I'm not that good. :-)
print "Just another Perl hacker,"
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!