After a bit of searching I found the post merlyn referred to:
>>>>> "Damian" == Damian Conway <damian@cs.monash.edu.au> writes:

Damian> Take a look at Text::Balanced -- in particular how the
Damian> extract_quotelike, extract_variable, and extract_codeblock
Damian> subroutines conspire to do a fair imitation of parsing
Damian> Perl. It would be trivial* to adapt these three to compress
Damian> Perl code by extracting comments and non-quoted multiple
Damian> whitespace.

Damian> * ...for sufficiently non-trivial values of "trivial"...

Even with your disclaimer, I think not.

As my repeatedly quoted example goes...

    $n = time / 2 ; # / ; # first hash starts comment
    $n = sin  / 2 ; # / ; # second hash starts comment
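For anyone who wants to watch perl itself make that decision, here is a minimal sketch. The two sub names and the `$reached` flag are hypothetical, added only for illustration; the statements are the two lines above, with an assignment placed between the two hashes so that it runs only when perl does not treat the first hash as the start of a comment.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original post.

# After "time" perl expects an operator, so "/" is division and
# everything from the first "#" onward is a comment.
sub time_comment_starts_at_first_hash {
    my $reached = 0;
    my $n = time / 2 ; # / ; $reached = 1;
    return $reached ? 0 : 1;
}

# After "sin" perl expects a term, so "/ 2 ; # /" is a pattern match
# against $_, and the assignment after it is live code.
sub sin_comment_starts_at_second_hash {
    my $reached = 0;
    local $_ = "";    # the pattern match needs a defined $_
    my $n = sin / 2 ; # / ; $reached = 1;
    return $reached;
}

print "time: comment starts at first hash\n"
    if time_comment_starts_at_first_hash();
print "sin: comment starts at second hash\n"
    if sin_comment_starts_at_second_hash();
```

A stripper that cuts at the first `#` would silently delete the `$reached = 1` statement in the `sin` case.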

You can't strip comments until you can recognize that time is proto () while sin is proto ($). And there are seven other characters like "/" that have a "looking for term" vs "looking for operator" feature. And here we go with the non-deterministic part:

    use Faith;
    $n = wazzat / 2 ; # / ; # what is wazzat prototyped as? :)

Yup. You gotta suck in everything that Faith.pm reads as well, thanks to user prototypes. And then we get the odd case:

    BEGIN { eval time % 2 ? 'sub wazzat ();' : 'sub wazzat ($);' }
    $n = wazzat / 2 ; # / ; # what is wazzat prototyped as? :)

Yup. Non-deterministic parsing. :)
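The user-prototype case can be checked the same way. This is only a sketch: `hash_that_starts_comment`, the `Demo` package names, and the `$reached` flag are hypothetical, and the prototype is passed in explicitly rather than chosen from `time % 2`, so that the result is repeatable. The same line of source is compiled twice with string `eval`, once under each prototype.

```perl
use strict;
use warnings;

my $counter = 0;    # gives every compile its own package

# Hypothetical helper: compile the "wazzat" line under the given
# prototype and report which hash perl treated as the comment start.
sub hash_that_starts_comment {
    my ($proto) = @_;
    my $code = join "\n",
        'package Demo' . $counter++ . ';',
        "sub wazzat $proto { 1 }",
        'my $reached = 0;',
        'local $_ = "";',
        'my $n = wazzat / 2 ; # / ; $reached = 1;',
        '$reached;';
    my $reached = eval $code;
    die $@ if $@;
    return $reached ? 'second' : 'first';
}

for my $proto ('()', '($)') {
    printf "sub wazzat %-3s -> comment starts at %s hash\n",
        $proto, hash_that_starts_comment($proto);
}
```

The text of the statement is identical in both compiles; only the declaration seen earlier changes how it is tokenized, which is why a stripper working on the text alone has nothing to go on.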

And while these are extreme examples, real life proggies get tripped up by the same things.

Thus, only "perl" can parse "Perl" continues to be true.

    I would write a verse
     of haiku to show the point,
     but I'm not that good. :-)
print "Just another Perl hacker,"
 -- 
 Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
 See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


vroom | Tim Vroom | vroom@cs.hope.edu

In reply to RE: RE: Wrapping long code tags by vroom
in thread Wrapping long code tags by BBQ
