Replace only unescaped metachars

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Replace only unescaped metachars by ikegami (Patriarch) on Feb 22, 2007 at 17:33 UTC
You really do need a parser, so while a pair of substitutions is possible, it's not the most appropriate tool. `my %conv = ( '*' => '?', '?' => '#' ); my $conv = quotemeta(join('', keys(%conv))); my $string = 't?e\\\\xt\\\\* with escapes\\'; my $result = $string; for ($result) { s/(?<!\\)((?:\\{2})*)([$conv])/$1$conv{$2}/g; s/\\(.)/$1/sg; } print($result, "\n");` Your code can be simplified (visually): `my $string = 't?e\\\\xt\\\\* with escapes\\'; my $result = ''; for ($string) { /\G \\(.) /xgcs && do { $result .= $1; redo; }; /\G \* /xgcs && do { $result .= '?'; redo; }; /\G \? /xgcs && do { $result .= '#'; redo; }; /\G (.) /xgcs && do { $result .= $1; redo; }; } print($result, "\n");` An optimization of the above: `my $string = 't?e\\\\xt\\\\* with escapes\\'; my $result = ''; for ($string) { $result .= $1 if /\G ([^\\*?]+) /xgcs; /\G \\(.) /xgcs && do { $result .= $1; redo; }; /\G \* /xgcs && do { $result .= '?'; redo; }; /\G \? /xgcs && do { $result .= '#'; redo; }; } print($result, "\n");`
Re^2: Replace only unescaped metachars by Anonymous Monk on Feb 23, 2007 at 07:50 UTC
Ah, yes, this is the clever part I was missing: `(?<!\\) ((?:\\{2})*)` It makes sure that only unescaped metachars are converted (the number of preceding backslashes is even). Thanks! But why is the following an optimization (against my code)? `for ($string) { $result .= $1 if /\G ([^\\*?]+) /xgcs; /\G \\(.) /xgcs && do { $result .= $1; redo; }; /\G \* /xgcs && do { $result .= '?'; redo; }; /\G \? /xgcs && do { $result .= '#'; redo; }; }` If I see it right, both versions try to keep the common case at the beginning and have to walk through the other regexes until the first match. (Besides, the 'redo' irritated me until I read in the docs that it doesn't redo do-blocks.) But still, thank you very much for this too; it is always good to have variations at hand. -Michael
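A minimal sketch of why that prefix works, assuming the `*` and `?` metacharacters discussed in this thread. The helper name `has_unescaped_meta` and the sample inputs are made up for illustration; they are not from any post here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains a ? or * that is
# preceded by an even number of backslashes (zero counts as even).
sub has_unescaped_meta {
    my ($s) = @_;
    # (?<!\\) pins the match to the start of a backslash run, and
    # (?:\\{2})* consumes only whole pairs, so [?*] is reached only
    # when the run has even length.
    return $s =~ /(?<!\\)(?:\\{2})*[?*]/ ? 1 : 0;
}

# Single-quoted literals: each \\ below is one backslash in the string.
for my $s ('?', '\\?', '\\\\?', '\\\\\\?') {
    printf "%-4s => %s\n", $s, has_unescaped_meta($s) ? 'unescaped' : 'escaped';
}
```

With zero or two backslashes the metacharacter is reported as unescaped; with one or three it is reported as escaped, because the lookbehind rejects every start position inside the run.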
Re^3: Replace only unescaped metachars by ikegami (Patriarch) on Feb 23, 2007 at 17:42 UTC
"But why is the following an optimization (against my code)?" It's an optimization against my code. You had a similar optimization already. (Your first `elsif` would have to be an `if` to be the same.) "Besides the 'redo' irritated me until I read in the docs that it doesn't redo do-blocks." `redo`, `last` and `next` only work on the various `for`/`foreach` blocks, `while` blocks and bare blocks. They don't work on non-loop blocks such as `if`, `do`, `eval` and `sub` blocks.
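A small illustration of that rule, as a sketch only: the sub name `odd_only` is hypothetical. The `next` sits inside a `do` block, yet it acts on the enclosing `for` loop, which is the same mechanism the `&& do { ...; redo; }` lines rely on.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the odd numbers from its arguments.
sub odd_only {
    my @seen;
    for my $n (@_) {
        # The do-block is not a loop, so "next" skips to the next
        # iteration of the surrounding for loop.
        $n % 2 == 0 && do { next; };
        push @seen, $n;
    }
    return @seen;
}

print join(' ', odd_only(1 .. 5)), "\n";   # prints "1 3 5"
```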
Re: Replace only unescaped metachars by ambrus (Abbot) on Feb 23, 2007 at 09:21 UTC
So, applying the idea I linked to above, we get this. `$_ = "t?e\\\\xt\\\\* with escapes\\\n"; s/\\\\/\\s/g, s/(?<!\\)\?/#/g, s/(?<!\\)\*/?/g, s/\\(\W)/$1/g, s/\\s/\\/g; print;` This outputs `t#e\xt\? with escapes` Update: without lookarounds (this is what you'd do if all you had was sed): `$_ = "t?e\\\\xt\\\\* with escapes\\\n"; s/\\\\/\\s/g, s/\\\?/\\S/g, s/\\\*/\\T/g, s/\?/#/g, s/\*/?/g, s/\\(\W)/$1/g, s/\\S/?/g, s/\\T/*/g, s/\\s/\\/g; print;`
Re^2: Replace only unescaped metachars by Anonymous Monk on Feb 23, 2007 at 11:01 UTC
Oh, but I would consider this to be cheating ;-) I mean the temporary replacements. Your solution reduces the complexity instead of mastering it. Of course I agree that this is often a good strategy, but I was curious whether it is possible to handle the three tasks (unescaping, conversion, different treatment of escaped and unescaped metachars) at the same time, perhaps allowing only a little pre- and/or postprocessing. But still, thanks for joining in! -Michael
Re: Replace only unescaped metachars by Anno (Deacon) on Feb 22, 2007 at 17:33 UTC
It can be done (at least approximated) in a somewhat more traditional s///g manner, but I'd use two substitutions to deal first with metacharacters, then unescaping. `my $escape = qr/(?<!\\)\\/; my $meta_char = qr/(?<!$escape)[?*]/; no warnings 'qw'; my %meta = qw( ? # * ? ); my $result = my $string = 't?e\\\\xt\\\\* with e\\scapes\\*'; $result =~ s/($meta_char)/$meta{ $1}/g; $result =~ s/$escape(.)/$1/gs or die; print "$string\n"; print "$result\n";` It transforms the example string in the same way as your code. I wouldn't swear that it does so for all input strings. Anno
Re^2: Replace only unescaped metachars by ikegami (Patriarch) on Feb 22, 2007 at 17:53 UTC
Close, but no cigar. `my $string = '\\\\\\a'; # Unprocessed: \\\a # Should be: \a # Result: \\a` `my $string = '\\\\\\?'; # Unprocessed: \\\? # Should be: \? # Result: \\#` See my top-level reply for a solution.
Re^3: Replace only unescaped metachars by Anno (Deacon) on Feb 22, 2007 at 18:17 UTC
Yes, I knew it had limitations which are hard to fix in an `s///` approach. The recognition of escaped escapes is a naturally recursive problem. Regular expressions, even Perl's, are notoriously bad at that. Your remark about wanting a parser is spot on. Anno
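For reference, a sketch of the parser approach wrapped in a sub so the counterexamples above can be checked directly. It follows the `\G` tokenizer idea from the top-level reply but is not that code; the name `convert` and the `while` loop are assumptions made for this example.

```perl
use strict;
use warnings;

my %conv = ('*' => '?', '?' => '#');

# Hypothetical single-pass converter: reads one token at a time, so an
# escaped backslash can never be mistaken for the start of a new escape.
sub convert {
    my ($in) = @_;
    my $out = '';
    pos($in) = 0;
    while (pos($in) < length $in) {
        if    ($in =~ /\G \\ (.)       /xgcs) { $out .= $1 }         # escaped char: drop the backslash
        elsif ($in =~ /\G ([*?])       /xgc)  { $out .= $conv{$1} }  # unescaped metachar: convert
        elsif ($in =~ /\G ([^\\*?]+)   /xgc)  { $out .= $1 }         # run of ordinary text
        else  { $in =~ /\G (.) /xgcs; $out .= $1 }                   # lone trailing backslash: keep
    }
    return $out;
}

print convert('\\\\\\a'), "\n";   # raw input \\\a
print convert('\\\\\\?'), "\n";   # raw input \\\?
```

Both inputs come out as the "Should be" values listed in the counterexamples (`\a` and `\?`), since the escaped backslash is consumed as one token before the following escape is examined.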
Re^4: Replace only unescaped metachars by ikegami (Patriarch) on Feb 22, 2007 at 18:53 UTC
Re^5: Replace only unescaped metachars by Anno (Deacon) on Feb 22, 2007 at 19:16 UTC
Re: Replace only unescaped metachars by Roy Johnson (Monsignor) on Feb 23, 2007 at 03:21 UTC
A slightly different way does it with one s///, plus some helpers. `my $str = 't?e\\\\xt\\\\* with escapes\\'; my %swap; @swap{'?','*'} = ('#','?'); $str=~s/((?:\\)(.)|\?|\*)/$swap{$1}||$2/ge; print ">>$str<<\n";` Ikegami points out that the `(?:\\)` can just be `\\`. Caution: Contents may have been coded under pressure.
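A variant of the same single-substitution idea, offered as a sketch rather than as the code above: it uses two separate captures and a `defined` test instead of the hash-miss fallback. The sub name `convert_once` is made up for illustration.

```perl
use strict;
use warnings;

my %swap = ('?' => '#', '*' => '?');

# Hypothetical one-pass substitution: each match is either a backslash
# plus the character it escapes, or a bare metacharacter.
sub convert_once {
    my ($str) = @_;
    # Scanning left to right, \\(.) always eats an escape pair first,
    # so ([?*]) can only ever match an unescaped metacharacter.
    $str =~ s/\\(.)|([?*])/defined $1 ? $1 : $swap{$2}/ges;
    return $str;
}

print '>>', convert_once('t?e\\\\xt\\\\* with escapes\\*'), "<<\n";
```

The `/s` flag is an addition here so that a backslash followed by a newline is also unescaped; drop it to match the behaviour of `.` without `/s`.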
Re^2: Replace only unescaped metachars by Anonymous Monk on Feb 23, 2007 at 08:02 UTC
Oh, this one is clever _and_ readable. The /e might not be the most efficient, but that is more than outweighed by the clarity of the solution: just one not-too-long regex! Thank you all for your help, I really learnt a lot from this little problem! Michael
Re^3: Replace only unescaped metachars by ikegami (Patriarch) on Feb 23, 2007 at 17:47 UTC
You might be thinking of `eval EXPR` (`/ee`) when you mentioned efficiency. `/e` doesn't cause any code to be compiled at run-time.
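To make the difference concrete, a minimal sketch; the sub names and the `width=3` input are invented for the example. With `/e` the replacement is ordinary Perl code compiled along with the rest of the program. With `/ee` the result of that code is itself treated as a string of Perl source and evaluated at run time.

```perl
use strict;
use warnings;

# Hypothetical helper using /e: the replacement expression is fixed
# in the source and compiled once with the surrounding code.
sub double_with_e {
    my ($text) = @_;
    $text =~ s/(\d+)/$1 * 2/e;
    return $text;
}

# Hypothetical helper using /ee: $expr is evaluated to get a string,
# and that string is then eval'ed as Perl code on every match.
sub double_with_ee {
    my ($text, $expr) = @_;
    $text =~ s/(\d+)/$expr/ee;
    return $text;
}

print double_with_e('width=3'), "\n";              # width=6
print double_with_ee('width=3', '$1 * 2'), "\n";   # width=6
```

Since `/ee` runs whatever source text it is handed, it should only ever see expressions from a trusted origin.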
Re: Replace only unescaped metachars by ambrus (Abbot) on Feb 23, 2007 at 08:23 UTC
You may want to read the thread "checking interpreted string for escapes versus literal backslashes?".