diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

I've been using regular expressions on LZW compressed data recently and discovered that while use re 'debug' is insanely useful, it doesn't play well with binary data. Is there a way to do something vis-like to the output? The following sample code produces some very unfriendly output for my PuTTY console. I found myself having to use reset(1) frequently to "fix" my console, and occasionally I had to just close it and open a new one.

use re 'debug';
$_ = join '', map chr, 0 .. 31, 128 .. 255;
1 while m{.}sg;

Re: vis-like re 'debug'
by graff (Chancellor) on May 09, 2003 at 03:22 UTC
    I've been using regular expressions on LZW compressed data recently

    Interesting concept... Anyway, if I were trying to do something like this, I'd just be sure to redirect the output of re 'debug' (which goes to STDERR) to a file, and inspect that whenever I want with something other than a tty ("od", maybe, or some other hex-mode capable viewer/editor).
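    If you would rather read the trace on the terminal, a small vis(1)-style escaper is another option. This is only a sketch: the `vis` name is made up here, not a library function, and it assumes that printable ASCII plus newlines are safe to show while every other character should become a `\xNN` escape.

```perl
use strict;
use warnings;

# Minimal vis(1)-style escaper (hypothetical helper): printable ASCII
# (0x20..0x7E) and newlines pass through unchanged; every other character
# is rewritten as \xNN, so no raw control codes reach the terminal.
sub vis {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E\n])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

print vis("ESC\e[2J\x9D\n");    # prints: ESC\x1B[2J\x9D
```

    The same substitution works as a one-line filter on the debug trace (script name hypothetical): `perl your-script.pl 2>&1 | perl -pe 's/([^\x20-\x7E\n])/sprintf(q(\x%02X), ord $1)/ge'`.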

    hv's idea is pretty cool, but ASCII control codes (\x00 - \x1F, \x7F) are still control codes even when they're "utf8", because every ASCII code point is encoded as the same single byte in UTF-8, by definition. (Perl's internal distinction between a "string of octets" and a "string of utf8" is just that: perl's internal distinction; once printed to some output, there is no difference for this range of code points.)

    And \x{0080} - \x{009F} (the C1 control range) aren't properly displayable either... who knows what you'll see when you feed these to a console, even one that's Unicode-savvy?
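    A quick way to check both points is to UTF-8-encode single code points and look at the resulting bytes. This sketch uses the core `utf8::encode` function; `utf8_bytes` is a hypothetical helper name. ASCII controls come out as the identical single byte, while the C1 range becomes 0xC2 followed by the original byte, which still decodes to a control character.

```perl
use strict;
use warnings;

# Hypothetical helper: the UTF-8 octets of one code point, as byte values.
sub utf8_bytes {
    my ($cp) = @_;
    my $s = chr $cp;
    utf8::encode($s);    # in place: characters -> UTF-8 octets
    return map { ord } split //, $s;
}

# Every ASCII code point (0x00..0x7F) encodes to itself as one byte, so
# control codes such as \x1B (ESC) reach the terminal exactly as before.
for my $cp (0x00 .. 0x7F) {
    my @b = utf8_bytes($cp);
    die sprintf("U+%04X changed", $cp) unless @b == 1 && $b[0] == $cp;
}

# C1 controls (0x80..0x9F) become two bytes: 0xC2, then the original byte.
printf "U+009B -> %s\n", join ' ', map { sprintf '%02X', $_ } utf8_bytes(0x9B);
# prints: U+009B -> C2 9B
```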

      It's not as interesting as you think. 99% of the compressed data is skipped right over. I just have an odd database file format which interleaves record header information and LZW compressed data. The only *actual* LZW thing I match is the two-byte magic string "\037\235"; the rest is all about the header components. The thing is, re 'debug' ends up passing the LZW data right through to the terminal, which is what screws things up.
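      Locating the magic number itself is simple enough to do without tracing. A minimal sketch, assuming only that each compressed block starts with the compress(1) magic bytes "\037\235" (0x1F 0x9D); the sample buffer and the `lzw_magic_offsets` name are hypothetical, not the real record layout:

```perl
use strict;
use warnings;

# Hypothetical helper: byte offsets of every "\037\235" (0x1F 0x9D) magic
# number in a binary buffer.
sub lzw_magic_offsets {
    my ($buf) = @_;
    my @offsets;
    while ($buf =~ /\037\235/g) {
        push @offsets, $-[0];    # $-[0] = offset where the last match started
    }
    return @offsets;
}

# Made-up buffer: 4-byte header, magic, 3 payload bytes, magic, 1 byte.
my $buf = "HDR1" . "\037\235" . "\x90\x00\x41" . "\037\235" . "\xff";
print join(',', lzw_magic_offsets($buf)), "\n";    # prints: 4,9
```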

Re: vis-like re 'debug'
by hv (Prior) on May 09, 2003 at 02:59 UTC

    Hmm, some effort has gone into cleaning up re 'debug' output recently, but (checking) it appears that this was added specifically for UTF8 strings; indeed, if you have perl-5.8.0, you should find that forcing the string to be UTF8-encoded gives you the output you're looking for:

    use re 'debug';
    $_ = join '', map chr, 0 .. 31, 128 .. 255;
    chop($_ .= chr(256));    # same string, but now upgraded to UTF8
    1 while m{.}sg;
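    For comparison, the same upgrade can be requested explicitly with the core `utf8::upgrade` function; that is an alternative, not what hv used. This sketch leaves `re 'debug'` out so it runs quietly, and checks that both approaches set perl's internal UTF8 flag without changing the string's contents. It assumes a reasonably modern perl, since `utf8::is_utf8` is not available in every 5.8.x release.

```perl
use strict;
use warnings;

my $bytes = join '', map chr, 0 .. 31, 128 .. 255;

# hv's trick: appending chr(256) forces an upgrade to UTF8; chop removes it.
my $via_chop = $bytes;
chop($via_chop .= chr(256));

# Alternative: ask for the upgrade directly.
my $via_upgrade = $bytes;
utf8::upgrade($via_upgrade);

# Both copies carry the UTF8 flag and still compare equal, character for
# character, to the original byte string.
for my $s ($via_chop, $via_upgrade) {
    die "not upgraded"     unless utf8::is_utf8($s);
    die "contents changed" unless $s eq $bytes;
}
print "both copies are UTF8-flagged and equal to the original\n";
```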

    Certainly I agree that the debug output should be sanitised for such characters and I feel it should be possible to fix that for 5.10.0, though I don't know if it is likely to make it into 5.8.1.

    Hugo