Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to replace some metachars (? => # and * => ?), but only if they are unescaped, and at the same time unescape all escaped chars.
I first tried many variations with a couple of substitutions but none of them worked. At last I found something that works but looks somewhat odd:
my $string = 't?e\\\\xt\\\\* with escapes\\*';
my $result = '';
while (not $string =~ m/\G\z/gc) {
    if ($string =~ /\G([^\\*?]+)/gc) {    # normal chars
        $result .= $1;
    }
    elsif ($string =~ /\G\\(.)/gc) {      # escaped
        $result .= $1;
    }
    elsif ($string =~ /\G\*/gc) {         # unescaped *
        $result .= '?';
    }
    elsif ($string =~ /\G\?/gc) {         # unescaped ?
        $result .= '#';
    }
}
$string = $result;
print $string;    # prints 't#e\xt\? with escapes*'
As I said, I can live with it, but I wonder if there is also a solution based on a couple of s///g. It is somewhat embarrassing not to find a solution for such a seemingly simple problem, so I hope for your wisdom to learn and improve my regex arsenal.

Thanks
Michael

Re: Replace only unescaped metachars
by ikegami (Patriarch) on Feb 22, 2007 at 17:33 UTC

    You really do need a parser, so while a pair of substitutions is possible, it's not the most appropriate approach.

    my %conv = (
        '*' => '?',
        '?' => '#',
    );
    my $conv = quotemeta(join('', keys(%conv)));
    my $string = 't?e\\\\xt\\\\* with escapes\\*';
    my $result = $string;
    for ($result) {
        s/(?<!\\)((?:\\{2})*)([$conv])/$1$conv{$2}/g;
        s/\\(.)/$1/sg;
    }
    print($result, "\n");

    Your code can be simplified (visually):

    my $string = 't?e\\\\xt\\\\* with escapes\\*';
    my $result = '';
    for ($string) {
        /\G \\(.) /xgcs && do { $result .= $1;  redo; };
        /\G \*    /xgcs && do { $result .= '?'; redo; };
        /\G \?    /xgcs && do { $result .= '#'; redo; };
        /\G (.)   /xgcs && do { $result .= $1;  redo; };
    }
    print($result, "\n");

    An optimization of the above:

    my $string = 't?e\\\\xt\\\\* with escapes\\*';
    my $result = '';
    for ($string) {
        $result .= $1 if /\G ([^\\*?]+) /xgcs;
        /\G \\(.) /xgcs && do { $result .= $1;  redo; };
        /\G \*    /xgcs && do { $result .= '?'; redo; };
        /\G \?    /xgcs && do { $result .= '#'; redo; };
    }
    print($result, "\n");
      Ah, yes, this is the clever part I was missing:
      (?<!\\) ((?:\\{2})*)
      It makes sure that only unescaped metachars are converted (the number of preceding backslashes is even).
      Thanks!
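
      A small sketch (not from the thread) of how that pattern behaves on backslash runs of different lengths. The sub name convert_unescaped and the sample strings are made up for illustration; only the regex itself comes from the reply above.

```perl
use strict;
use warnings;

# Conversion table from the original question.
my %conv = ( '*' => '?', '?' => '#' );

# Hypothetical helper: convert only metachars preceded by an even
# number of backslashes (0, 2, 4, ...), leaving the backslashes alone.
sub convert_unescaped {
    my ($s) = @_;
    # (?<!\\)       the run of backslashes starts here, not earlier
    # ((?:\\{2})*)  zero or more complete backslash pairs
    # ([*?])        the metachar, which is therefore unescaped
    $s =~ s/(?<!\\)((?:\\{2})*)([*?])/$1$conv{$2}/g;
    return $s;
}

# Zero, one, two and three backslashes in front of the star.
for my $in ('a*', 'a\\*', 'a\\\\*', 'a\\\\\\*') {
    printf "%-6s => %s\n", $in, convert_unescaped($in);
}
```

      Only the inputs with zero or two backslashes should change; with one or three, the last backslash escapes the star, so the string is left alone for the later unescaping step.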

      But why is the following an optimization (against my code)?:
      for ($string) {
          $result .= $1 if /\G ([^\\*?]+) /xgcs;
          /\G \\(.) /xgcs && do { $result .= $1;  redo; };
          /\G \*    /xgcs && do { $result .= '?'; redo; };
          /\G \?    /xgcs && do { $result .= '#'; redo; };
      }
      If I see it right, both versions try to keep the common case at the beginning and have to walk through the other regexes until the first match. (Besides the 'redo' irritated me until I read in the docs that it doesn't redo do-blocks) But still thank you very much for this too, it is always good to have variations at hand.
      -Michael

        But why is the following an optimization (against my code)?:

        It's an optimization against *my* code. You had a similar optimization already. (Your first elsif would have to be an if to be the same.)

        Besides the 'redo' irritated me until I read in the docs that it doesn't redo do-blocks

        redo, last and next only work on the various for/foreach blocks, while blocks and bare blocks. They don't work on non-loop blocks such as if, do, eval and sub blocks.
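
        A minimal sketch of that rule, assuming nothing beyond core Perl. The sub name count_passes and its $limit parameter are invented for the demo; it counts how often the body of a single-element for loop runs when redo is issued from inside a do-block.

```perl
use strict;
use warnings;

# Hypothetical demo: redo inside a do-block acts on the enclosing for loop.
sub count_passes {
    my ($limit) = @_;
    my $passes = 0;
    for my $item ('only element') {
        $passes++;
        # The do-block is not a loop, so redo restarts the for body
        # for the same element without advancing to the next one.
        $passes < $limit && do { redo; };
    }
    return $passes;
}

print count_passes(3), "\n";
```

        This is the same mechanism the tokenizer loops above rely on: for ($string) iterates once, and each redo re-enters the body so the next \G match can be tried.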

Re: Replace only unescaped metachars
by ambrus (Abbot) on Feb 23, 2007 at 09:21 UTC

    So, applying the idea I linked to above, we get this.

    $_ = "t?e\\\\xt\\\\* with escapes\\*\n";
    s/\\\\/\\s/g,
    s/(?<!\\)\?/#/g,
    s/(?<!\\)\*/?/g,
    s/\\(\W)/$1/g,
    s/\\s/\\/g;
    print;
    This outputs
    t#e\xt\? with escapes*

    Update: without lookarounds (this is what you'd do if all you had was sed):

    $_ = "t?e\\\\xt\\\\* with escapes\\*\n";
    s/\\\\/\\s/g,
    s/\\\?/\\S/g,
    s/\\\*/\\T/g,
    s/\?/#/g,
    s/\*/?/g,
    s/\\(\W)/$1/g,
    s/\\S/?/g,
    s/\\T/\*/g,
    s/\\s/\\/g;
    print;
      Oh, but I would consider this to be cheating ;-)
      I mean the temporary replacements. Your solution reduces the complexity instead of mastering it.
      Of course I agree that this often is a good strategy but I was curious if it is possible to handle the three tasks (unescape, conversion, different treatment of escaped and unescaped metachars) at the same time, only allowing a little pre- and/or postprocessing perhaps.

      But still, thanks for joining in!
      -Michael
Re: Replace only unescaped metachars
by Anno (Deacon) on Feb 22, 2007 at 17:33 UTC
    It can be done (at least approximated) in a somewhat more traditional s///g manner, but I'd use two substitutions to deal first with metacharacters, then unescaping.
    my $escape    = qr/(?<!\\)\\/;
    my $meta_char = qr/(?<!$escape)[?*]/;
    no warnings 'qw';
    my %meta = qw( ? # * ? );
    my $result = my $string = 't?e\\\\xt\\\\* with e\\scapes\\*';
    $result =~ s/($meta_char)/$meta{ $1}/g;
    $result =~ s/$escape(.)/$1/gs or die;
    print "$string\n";
    print "$result\n";
    It transforms the example string in the same way as your code. I wouldn't swear that it does so for all input strings.

    Anno

      Close, but no cigar
      my $string = '\\\\\\a';
      # Unprocessed: \\\a
      # Should be:   \a
      # Result:      \\a

      my $string = '\\\\\\?';
      # Unprocessed: \\\?
      # Should be:   \?
      # Result:      \\#

      See my top-level reply for a solution.

        Yes, I knew it had limitations which are hard to fix in an s/// approach.

        The recognition of escaped escapes is a naturally recursive problem. Regular expressions, even Perl's, are notoriously bad at that. Your remark about wanting a parser is spot on.

        Anno

Re: Replace only unescaped metachars
by Roy Johnson (Monsignor) on Feb 23, 2007 at 03:21 UTC
    A slightly different way does it with one s///, plus some helpers.
    my $str = 't?e\\\\xt\\\\* with escapes\\*';
    my %swap;
    @swap{'?','*'} = ('#','?');
    $str=~s/((?:\\)(.)|\?|\*)/$swap{$1}||$2/ge;
    print ">>$str<<\n";
    Ikegami points out that the (?:\\) can just be \\.
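
    For illustration, a sketch of the same single-substitution idea with that simplification applied. This is a variation rather than the code posted above: the sub name unescape_and_swap is hypothetical, the captures are rearranged so the /e expression can test which branch matched, and /s is added (an assumption on my part) so that an escaped newline is unescaped too.

```perl
use strict;
use warnings;

# Conversion table from the original question.
my %swap = ( '?' => '#', '*' => '?' );

# Hypothetical wrapper around the single-substitution idea.
sub unescape_and_swap {
    my ($str) = @_;
    # Alternation order matters: a backslash and the char after it are
    # consumed together first, so an escaped metachar never reaches the
    # second branch.
    $str =~ s/\\(.)|([?*])/defined($1) ? $1 : $swap{$2}/gse;
    return $str;
}

print unescape_and_swap('t?e\\\\xt\\\\* with escapes\\*'), "\n";
```

    Because each match consumes a backslash together with the character it escapes, runs of backslashes pair up from left to right without any lookbehind. A trailing lone backslash matches neither branch and is left as it is.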

    Caution: Contents may have been coded under pressure.
      Oh, this one is clever _and_ readable. The /e might not be the most efficient, but that is more than outweighed by the clarity of the solution -- just one not too long regex!

      Thank you all for your help, I really learnt a lot with this little problem!

      Michael
        You might be thinking of eval EXPR (/ee) when you mentioned efficiency. /e doesn't cause any code to be compiled at run-time.
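
        A short sketch of the difference, using made-up sub names with_e and with_ee. Both produce the same result here; the point is only where the replacement code comes from.

```perl
use strict;
use warnings;

# /e: the replacement is ordinary Perl code, compiled together with
# the rest of the program and then run for every match.
sub with_e {
    my ($s) = @_;
    $s =~ s/([?*])/$1 eq '?' ? '#' : '?'/ge;
    return $s;
}

# /ee: the replacement is first evaluated to a string, and that string
# is then compiled and run via eval for every match.
sub with_ee {
    my ($s) = @_;
    my $code = q{$1 eq '?' ? '#' : '?'};    # source text, compiled at run time
    $s =~ s/([?*])/$code/gee;
    return $s;
}

print with_e('a?b*'), "\n";
print with_ee('a?b*'), "\n";
```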
Re: Replace only unescaped metachars
by ambrus (Abbot) on Feb 23, 2007 at 08:23 UTC