in reply to RE: Wrapping long code tags
in thread Wrapping long code tags

After a bit of searching I found the post merlyn referred to:
>>>>> "Damian" == Damian Conway <damian@cs.monash.edu.au> writes:

Damian> Take a look at Text::Balanced -- in particular how the
Damian> extract_quotelike, extract_variable, and extract_codeblock
Damian> subroutines conspire to do a fair imitation of parsing
Damian> Perl. It would be trivial* to adapt these three to compress
Damian> Perl code by extracting comments and non-quoted multiple
Damian> whitespace.

Damian> * ...for sufficiently non-trivial values of "trivial"...

Even with your disclaimer, I think not.

As my repeatedly quoted example goes...

    $n = time / 2 ; # / ; # first hash starts comment
    $n = sin / 2 ; # / ; # second hash starts comment

You can't strip comments until you can recognize that time is proto () while sin is proto ($). And there are seven other characters like "/" that have a "looking for term" vs "looking for operator" feature. And here we go with the non-deterministic part:

    use Faith;
    $n = wazzat / 2 ; # / ; # what is wazzat prototyped as? :)

Yup. You gotta suck in everything that Faith.pm reads as well, thanks to user prototypes. And then we get the odd case:

    BEGIN { eval time % 2 ? 'sub wazzat ();' : 'sub wazzat ($);' }
    $n = wazzat / 2 ; # / ; # what is wazzat prototyped as? :)

Yup. Non-deterministic parsing. :)
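
For anyone who wants to watch the two readings of "/" happen, here is a minimal sketch. The subs no_args and one_arg are made-up stand-ins (they are not in the post) for time and sin, declared with the prototypes () and ($) discussed above, and $_ is set by hand so the regex reading has something to match.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the builtins in the post:
sub no_args ()  { 42 }                  # like time: no arguments, so an operator comes next
sub one_arg ($) { "matched=$_[0]" }     # like sin: one argument, so a term comes next

$_ = "x 2 ; # y";                       # contains " 2 ; # ", the text between the slashes

my $x = no_args / 2 ; # / ; # the first hash starts the comment
my $y = one_arg / 2 ; # / ; # only the second hash starts a comment

print "$x\n";   # 21: "/" was division
print "$y\n";   # matched=1: "/ 2 ; # /" was a regex applied to $_
```

The two assignment lines differ only in the sub name, yet the comment starts in a different place on each, which is why a comment stripper has to know the prototype first.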

And while these are extreme examples, real life proggies get tripped up by the same things.

Thus, only "perl" can parse "Perl" continues to be true.
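
A rough way to check that claim on a single line of source is to compile the same text twice with a different prototype in force each time. The helper parse_under and the package names below are hypothetical, invented for this sketch; it uses string eval, so the prototype is chosen at run time much like the BEGIN trick above, only deterministically.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): compile one fixed line of source
# in a fresh package where wazzat has the given prototype.
sub parse_under {
    my ($pkg, $proto) = @_;
    local $_ = "x 2 ; # y";    # something for a regex to match, if one appears
    my $code = join "\n",
        "package $pkg;",
        "sub wazzat $proto { \@_ ? 'called with a match result' : 10 }",
        'my $n = wazzat / 2 ; # / ; # what is wazzat prototyped as?',
        '$n;';
    my $result = eval $code;
    die $@ if $@;
    return $result;
}

my $as_division = parse_under('Demo::NoArgs', '()');    # parsed as wazzat() / 2
my $as_regex    = parse_under('Demo::OneArg', '($)');   # parsed as wazzat(m/ 2 ; # /)

print "$as_division\n";   # 5
print "$as_regex\n";      # called with a match result
```

The third line of the generated source is byte-for-byte identical in both runs; only the declaration before it changes what the parser does with it.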

    I would write a verse
     of haiku to show the point,
     but I'm not that good. :-)
print "Just another Perl hacker,"
 -- 
 Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
 See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


vroom | Tim Vroom | vroom@cs.hope.edu