Hi all, another question related to innotop. Innotop displays currently running queries in MySQL to a terminal, and one user wrote me with troubles caused by gzipped data in queries:

update tbl set col='@@@@', col="@@@@" ...

The @@@@ is gzipped cuss-words that freak his terminal out. The way we'd like this to be handled is to simply replace the field's contents with three question marks:

update tbl set col='???', col="???" ...

Now for the fun part: fields can be delimited with either single or double quotes. A single-quoted field can contain double quotes, and vice versa. A single-quoted field can also contain more single quotes, as long as each is preceded by a backslash. And finally, no substitution should happen unless the field has non-printable characters in it.

A friend and I tried for a while to write a regex to accomplish this, but it gets really confusing. I have a nagging feeling this is a solved problem, but I can't turn it up in web searches.

Here's where we've gotten so far:

$text =~ s/
   (?<!\\)'                       # A quote, NOT after a backslash
   (?:[^'\p{IsC}]*?(?:\\')?)*?    # This is hard to explain...
   \p{IsC}                        # At least one non-printable char
   (?:[^']*?(?:\\')?)*?           # More stuff...
   (?<!\\)'                       # Another single quote
   /'???'/gx;

This sorta works for single-quoted fields only. But I have a feeling this is entirely the wrong approach, and I'd like to do it in one pass, not twice (one for single quotes, one for double). Among other reasons, if there's a single-quoted string embedded within double quotes or vice versa, one pass that handles both cases is much preferable.
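For reference, here is a minimal sketch of what a one-pass version might look like. It is an illustration rather than a tested solution. The sub name mask_unprintable_fields is made up for this example and is not part of innotop. It assumes a backslash escapes the next character inside either kind of quote, and it reuses \p{IsC} as the definition of "non-printable" (which also covers tabs and newlines). The idea is to capture the opening quote, walk the field body one escaped pair or one non-quote character at a time, close on the same quote via a backreference, and decide in the replacement code whether to mask.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of innotop: mask every quoted field
# whose body contains a character in Unicode category C.
sub mask_unprintable_fields {
    my ($text) = @_;
    $text =~ s{
        (['"])               # opening quote, single or double
        (                    # field body
            (?: \\.          #   a backslash plus whatever it escapes
              | (?!\1)[^\\]  #   or any char that is not the opening quote
            )*
        )
        \1                   # closing quote of the same kind
    }{
        my ($quote, $body) = ($1, $2);
        # Substitute only when the body holds a non-printable character.
        $body =~ /\p{IsC}/ ? "$quote???$quote" : "$quote$body$quote";
    }gexs;
    return $text;
}

my $query = "update tbl set col='\x1f\x8b\x08', other=\"plain\"";
print mask_unprintable_fields($query), "\n";
# prints: update tbl set col='???', other="plain"
```

Because the scan consumes each quoted field whole, a single-quoted string embedded in a double-quoted one (or vice versa) is treated as part of the outer field and is never matched separately. SQL-style doubled quotes ('') are not handled as escapes in this sketch.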

Where's the a-ha moment? Or is there one?

Thanks in advance!


In reply to How to replace non-printable characters within delimited field by xaprb
