Re: Replace only unescaped metachars
by ikegami (Patriarch) on Feb 22, 2007 at 17:33 UTC
You really do need a parser, so while a pair of substitutions is possible, it's not the most appropriate approach.
my %conv = ( '*' => '?', '?' => '#' );
my $conv = quotemeta(join('', keys(%conv)));
my $string = 't?e\\\\xt\\\\* with escapes\\*';
my $result = $string;
for ($result) {
    s/(?<!\\)((?:\\{2})*)([$conv])/$1$conv{$2}/g;
    s/\\(.)/$1/sg;
}
print($result, "\n");
Your code can be simplified (visually):
my $string = 't?e\\\\xt\\\\* with escapes\\*';
my $result = '';
for ($string) {
    /\G \\(.) /xgcs && do { $result .= $1;  redo; };
    /\G \*    /xgcs && do { $result .= '?'; redo; };
    /\G \?    /xgcs && do { $result .= '#'; redo; };
    /\G (.)   /xgcs && do { $result .= $1;  redo; };
}
print($result, "\n");
An optimization of the above:
my $string = 't?e\\\\xt\\\\* with escapes\\*';
my $result = '';
for ($string) {
    $result .= $1 if /\G ([^\\*?]+) /xgcs;
    /\G \\(.) /xgcs && do { $result .= $1;  redo; };
    /\G \*    /xgcs && do { $result .= '?'; redo; };
    /\G \?    /xgcs && do { $result .= '#'; redo; };
}
print($result, "\n");
Ah, yes, this is the clever part I was missing:
(?<!\\)
((?:\\{2})*)
It makes sure that only unescaped metachars are converted
(all preceding escapes are even).
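A small sketch of how that prefix behaves for runs of zero to three backslashes. The sub name convert_meta and the sample inputs are made up for illustration; only the regex comes from the reply above, and the unescaping step is left out on purpose so the effect of the prefix stays visible.

```perl
use strict;
use warnings;

# Hypothetical helper: applies only the metachar conversion, no unescaping.
my %conv = ( '*' => '?', '?' => '#' );

sub convert_meta {
    my ($s) = @_;
    # (?<!\\)       the run of backslashes must start at this position
    # ((?:\\{2})*)  zero or more backslash pairs, i.e. an even count
    # ([*?])        the metachar, reachable only after an even run
    $s =~ s/(?<!\\)((?:\\{2})*)([*?])/$1$conv{$2}/g;
    return $s;
}

# Single-quoted, so each \\ below is one literal backslash.
for my $in ( 'a*', 'a\\*', 'a\\\\*', 'a\\\\\\*' ) {
    printf "%-6s -> %s\n", $in, convert_meta($in);
}
```

Only the inputs with an even number of backslashes (zero or two) get their * converted; with one or three the * is left alone.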
Thanks!
But why is the following an optimization (against my code)?:
for ($string) {
    $result .= $1 if /\G ([^\\*?]+) /xgcs;
    /\G \\(.) /xgcs && do { $result .= $1;  redo; };
    /\G \*    /xgcs && do { $result .= '?'; redo; };
    /\G \?    /xgcs && do { $result .= '#'; redo; };
}
If I see it right, both versions try to keep the common
case at the beginning and have to walk through the other
regexes until the first match. (Besides the 'redo' irritated
me until I read in the docs that it doesn't redo do-blocks)
But still thank you very much for this too, it is always
good to have variations at hand.
-Michael
But why is the following an optimization (against my code)?:
It's an optimization against *my* code. You had a similar optimization already. (Your first elsif would have to be an if to be the same.)
Besides the 'redo' irritated me until I read in the docs that it doesn't redo do-blocks
redo, last and next only work on the various for/foreach blocks, while blocks and bare blocks. They don't work on non-loop blocks such as if, do, eval and sub blocks.
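A minimal sketch of that behaviour, with made-up variable names: the redo sits inside a do-block, yet it restarts the body of the enclosing for loop, without moving on to the next item.

```perl
use strict;
use warnings;

my @log;
my $tries = 0;
for my $item ('x') {
    $tries++;
    push @log, "pass $tries";
    # The do-block is not a loop, so redo looks past it and restarts
    # the body of the enclosing for loop.
    $tries < 3 && do { redo; };
    push @log, "done";
}
print join(', ', @log), "\n";   # pass 1, pass 2, pass 3, done
```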
Re: Replace only unescaped metachars
by ambrus (Abbot) on Feb 23, 2007 at 09:21 UTC
So, applying the idea I linked to above, we get this.
$_ = "t?e\\\\xt\\\\* with escapes\\*\n";
s/\\\\/\\s/g, s/(?<!\\)\?/#/g, s/(?<!\\)\*/?/g, s/\\(\W)/$1/g, s/\\s/\\/g;
print;
This outputs
t#e\xt\? with escapes*
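To see what each substitution in that chain contributes, it can be traced one step at a time. This is only a sketch: the @trace array and the list of closures are scaffolding added for illustration, the five substitutions are the ones from the snippet above, and the trailing newline is dropped from the input.

```perl
use strict;
use warnings;

my $s = "t?e\\\\xt\\\\* with escapes\\*";
my @trace = ($s);

for my $step (
    sub { s/\\\\/\\s/g },       # park each escaped backslash as \s
    sub { s/(?<!\\)\?/#/g },    # convert unescaped ?
    sub { s/(?<!\\)\*/?/g },    # convert unescaped *
    sub { s/\\(\W)/$1/g },      # unescape the remaining non-word characters
    sub { s/\\s/\\/g },         # turn the parked \s back into a backslash
) {
    $step->() for $s;           # alias $_ to $s so the s/// acts on it
    push @trace, $s;
}

print "$_\n" for @trace;
```

The trace also shows the price of the placeholder: an input that already contains a backslash followed by s would be indistinguishable from a parked backslash in the last step.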
Update: without lookarounds (this is what you'd do if all you had was sed):
$_ = "t?e\\\\xt\\\\* with escapes\\*\n";
s/\\\\/\\s/g, s/\\\?/\\S/g, s/\\\*/\\T/g, s/\?/#/g, s/\*/?/g, s/\\(\W)/$1/g, s/\\S/?/g, s/\\T/\*/g, s/\\s/\\/g;
print;
Oh, but I would consider this to be cheating ;-)
I mean the temporary replacements. Your solution reduces the complexity instead of mastering it.
Of course I agree that this often is a good strategy but I was curious if it is possible to handle the three tasks (unescape, conversion, different treatment of escaped and unescaped metachars) at the same time, only allowing a little pre- and/or postprocessing perhaps.
But still, thanks for joining in!
-Michael
Re: Replace only unescaped metachars
by Anno (Deacon) on Feb 22, 2007 at 17:33 UTC
It can be done (at least approximated) in a somewhat more traditional s///g manner, but I'd use two substitutions to deal first with metacharacters, then unescaping.
my $escape = qr/(?<!\\)\\/;
my $meta_char = qr/(?<!$escape)[?*]/;
no warnings 'qw';
my %meta = qw(
? #
* ?
);
my $result = my $string = 't?e\\\\xt\\\\* with e\\scapes\\*';
$result =~ s/($meta_char)/$meta{$1}/g;
$result =~ s/$escape(.)/$1/gs or die;
print "$string\n";
print "$result\n";
It transforms the example string in the same way as your code. I wouldn't swear that it does so for all input strings.
Anno
my $string = '\\\\\\a';
# Unprocessed: \\\a
# Should be: \a
# Result: \\a
my $string = '\\\\\\?';
# Unprocessed: \\\?
# Should be: \?
# Result: \\#
See my top-level reply for a solution.
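For reference, a sketch that runs the two substitutions from the top-level reply against these two inputs. The sub name convert and the @cases table are invented for this check, and the character class is written out as [*?] instead of being built with quotemeta.

```perl
use strict;
use warnings;

my %conv = ( '*' => '?', '?' => '#' );

# Hypothetical wrapper around the two substitutions from the top-level reply.
sub convert {
    my ($s) = @_;
    $s =~ s/(?<!\\)((?:\\{2})*)([*?])/$1$conv{$2}/g;   # convert unescaped metachars
    $s =~ s/\\(.)/$1/sg;                                # then strip one escape level
    return $s;
}

# [ input, expected ] pairs; single-quoted, so \\ is one literal backslash.
my @cases = (
    [ '\\\\\\a', '\\a' ],
    [ '\\\\\\?', '\\?' ],
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = convert($in);
    printf "%-5s -> %-3s %s\n", $in, $got,
        $got eq $want ? 'ok' : "NOT ok (want $want)";
}
```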
Re: Replace only unescaped metachars
by Roy Johnson (Monsignor) on Feb 23, 2007 at 03:21 UTC
A little different way does it with one s///, but some other helpers.
my $str = 't?e\\\\xt\\\\* with escapes\\*';
my %swap;
@swap{'?','*'} = ('#','?');
$str=~s/((?:\\)(.)|\?|\*)/$swap{$1}||$2/ge;
print ">>$str<<\n";
Ikegami points out that the (?:\\) can just be \\.
Caution: Contents may have been coded under pressure.
Oh, this one is clever _and_ readable. The /e might not be the most efficient, but that is more than outweighed by the
clarity of the solution: just one regex, and not too long!
Thank you all for your help, I really learnt a lot with
this little problem!
Michael
You might be thinking of eval EXPR (/ee) when you mentioned efficiency. /e doesn't cause any code to be compiled at run-time.
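A short sketch of the difference, using a made-up input string: with /e the replacement is ordinary Perl code that yields a string, while /ee takes that string and runs it through a string eval.

```perl
use strict;
use warnings;

my $text = '2+3';

# /e: the replacement is Perl code compiled along with the rest of the
# program. It evaluates to the string held in $1; that string is not executed.
(my $once = $text) =~ s/(.+)/$1/e;

# /ee: the replacement is evaluated as above, and the resulting string is
# then passed through a string eval at run time.
(my $twice = $text) =~ s/(.+)/$1/ee;

print "/e:  $once\n";    # /e:  2+3
print "/ee: $twice\n";   # /ee: 5
```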
Re: Replace only unescaped metachars
by ambrus (Abbot) on Feb 23, 2007 at 08:23 UTC