in reply to vis-like re 'debug'

I've been using regular expressions on LZW compressed data recently

Interesting concept... Anyway, if I were trying to do something like this, I'd just be sure to redirect the output of re 'debug' (it goes to STDERR) to a file, and inspect that file whenever I want with something other than a tty ("od", maybe, or some other hex-mode capable viewer/editor).
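A minimal sketch of that redirect done inside the script itself, assuming the trace goes to STDERR (the default for re 'debug'). The log file name and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Point STDERR at a file before any regex gets compiled; the BEGIN
# block runs at compile time, ahead of the pattern below.
# 're-debug.log' is a hypothetical file name.
BEGIN {
    open STDERR, '>', 're-debug.log'
        or die "Cannot redirect STDERR: $!";
}

use re 'debug';

# Made-up sample: header text, the two magic bytes, then binary junk
# that would upset a terminal if it were echoed raw.
my $data = "header\037\235\x00\x01\x1b[2Jpayload";
print "found magic\n" if $data =~ /\037\235/;
```

Afterwards, something like `od -c re-debug.log | less` shows the trace without any raw control bytes reaching the terminal.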

hv's idea is pretty cool, but ASCII control codes (\x00 - \x1F, \x7F) are still control codes, even when they're "utf8" -- because all ASCII code points are already utf8 anyway, by definition. (Perl's internal distinction between a "string of octets" and a "string of utf8" is just that: Perl's internal distinction; once printed to some output, there is no difference for this range of code points.)

And \x{0080} - \x{009F} (the C1 control codes) aren't properly displayable either... who knows what you'll see when you feed these to a console -- even one that's Unicode-savvy?
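For what it's worth, here is a rough sketch of a vis-like filter covering exactly those ranges. The vis() helper is hypothetical (it is not part of the re module or of hv's suggestion), and it leaves tab and newline alone so the trace keeps its layout:

```perl
use strict;
use warnings;

# Hypothetical helper: replace C0 controls (except tab and newline),
# DEL, and the C1 range \x80-\x9F with visible \x{..} escapes.
sub vis {
    my ($text) = @_;
    $text =~ s/([\x00-\x08\x0B-\x1F\x7F-\x9F])/sprintf('\\x{%02X}', ord $1)/ge;
    return $text;
}

# Made-up sample containing the LZW magic bytes and an escape sequence.
print vis("magic \037\235 then \x1b[2J\n");
```

Running captured trace text through something like this before printing keeps the terminal from interpreting any of it, at the cost of making the escaped bytes look different from what re 'debug' actually emitted.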

Re: Re: vis-like re 'debug'
by diotalevi (Canon) on May 09, 2003 at 13:46 UTC

    It's not as interesting as you think. 99% of the compressed data is skipped right over. I just have an odd database file format which interleaves record header information with LZW compressed data. The only *actual* LZW thing I match is the two-byte magic string "\037\235"; the rest of the pattern is about the header components. The thing is, re 'debug' ends up passing the LZW data right through to the terminal, which is what screws things up.