Re: RE - match from right
by davido (Cardinal) on Oct 06, 2004 at 15:43 UTC
Anchor to the right with the $ metacharacter. By anchoring to the right and using a negated character class, you minimize the work needed to find the last occurrence, even if it isn't the last thing in the string.
s/b([^b]*)$/x$1/
Update:
And the following should yield better performance by using a positive lookahead assertion (eliminating the need for capturing parens), and still anchoring to the right.
s/b(?=[^b]*$)/x/
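A quick side-by-side check of both substitutions above: one string whose last 'b' is not the final character, one where it is, and one with no 'b' at all. The helper names `last_b_to_x` and `last_b_to_x_capture` and the sample inputs are illustrative, not from the thread.

```perl
use strict;
use warnings;

# Illustrative helper: the lookahead version, applied to a copy of the input.
sub last_b_to_x {
    my ($s) = @_;
    $s =~ s/b(?=[^b]*$)/x/;    # a 'b' followed only by non-'b' characters up to the end
    return $s;
}

# Illustrative helper: the capturing version from the original post.
sub last_b_to_x_capture {
    my ($s) = @_;
    $s =~ s/b([^b]*)$/x$1/;    # capture the trailing non-'b' run and put it back
    return $s;
}

for my $input ('abababa', 'abab', 'no match here') {
    printf "%-15s %-15s %s\n", $input, last_b_to_x($input), last_b_to_x_capture($input);
}
```

Both forms leave a string without any 'b' untouched, because the substitution simply fails to match.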
I was curious whether the anchoring really reduced the work, so I benchmarked the various suggested methods. Some of the results surprised me: xeger was only about 8% slower than the anchored match, and anchoring the two-step solution (2part-A) actually slowed it down by about 6%.
use strict;
use warnings;
use Benchmark 'cmpthese';
my @strings = ( 'There was a bright man from Nantucket'
               ,'Abba babbles about bubble blowers'
               ,'Enough with the excess bs already'
               ,'None at all in this one');
my @c;
my %methods = (
  'anch'    => sub { s/b(?=[^b]*$)/x/ for @c=@strings },
  'xeger'   => sub { $_ = reverse, s/b/x/, $_ = reverse for @c=@strings },
  'plain'   => sub { s/(.*)b/$1x/ for @c=@strings },
  '2part'   => sub { /.*(?=b)/g and s/\Gb/x/ for @c=@strings },
  '2part-A' => sub { /b(?=[^b]*$)/g and s/\Gb/x/ for @c=@strings },
  # NB: rindex returns -1 when there is no 'b', so on such a string this overwrites its last character
  'substr'  => sub { substr($_, rindex($_, 'b'), 1) = 'x' for @c=@strings },
  'neglook' => sub { s/b(?!.*b)/x/ for @c=@strings }
);
cmpthese(-3, \%methods );
Results:
            Rate   plain 2part-A   2part neglook   xeger    anch  substr
plain    3208/s      --    -31%    -35%    -44%    -47%    -51%    -64%
2part-A  4630/s     44%      --     -6%    -20%    -23%    -29%    -48%
2part    4948/s     54%      7%      --    -14%    -18%    -24%    -45%
neglook  5767/s     80%     25%     17%      --     -4%    -12%    -36%
xeger    6029/s     88%     30%     22%      5%      --     -8%    -33%
anch     6536/s    104%     41%     32%     13%      8%      --    -27%
substr   8985/s    180%     94%     82%     56%     49%     37%      --
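Timings only mean something if the methods all produce the same result. A minimal cross-check over the same four strings might look like this; each method is adapted (not the benchmarked code itself) to return its result instead of modifying @c in place. On 'None at all in this one', which has no 'b', the substr/rindex method is the odd one out: rindex returns -1, and substr with offset -1 replaces the last character.

```perl
use strict;
use warnings;

my @strings = ('There was a bright man from Nantucket',
               'Abba babbles about bubble blowers',
               'Enough with the excess bs already',
               'None at all in this one');

# Each method takes one string and returns a modified copy,
# so the outputs can be compared instead of discarded.
my %methods = (
    'anch'    => sub { my $s = shift; $s =~ s/b(?=[^b]*$)/x/; $s },
    'xeger'   => sub { my $s = scalar reverse shift; $s =~ s/b/x/; scalar reverse $s },
    'plain'   => sub { my $s = shift; $s =~ s/(.*)b/$1x/; $s },
    '2part'   => sub { my $s = shift; $s =~ /.*(?=b)/g and $s =~ s/\Gb/x/; $s },
    'neglook' => sub { my $s = shift; $s =~ s/b(?!.*b)/x/; $s },
    'substr'  => sub { my $s = shift; substr($s, rindex($s, 'b'), 1) = 'x'; $s },
);

my %results;
for my $name (keys %methods) {
    $results{$name} = [ map { $methods{$name}->($_) } @strings ];
}

# Report every output that differs from the anchored-lookahead version.
for my $name (sort keys %results) {
    for my $i (0 .. $#strings) {
        next if $results{$name}[$i] eq $results{'anch'}[$i];
        print "$name: '$strings[$i]' -> '$results{$name}[$i]'\n";
    }
}
```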
Caution: Contents may have been coded under pressure.
Re: RE - match from right
by TheEnigma (Pilgrim) on Oct 06, 2004 at 15:41 UTC
Without reversing the string:
$string =~ s/(.*)b/$1x/;
I don't know if that would be faster or slower than reversing it first.
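One caveat with the greedy `(.*)b` form (and with the `(?!.*b)` variants elsewhere in this thread): `.` does not match a newline unless the `/s` modifier is given, so on multi-line input the substitution can hit the last 'b' of the first line that contains one, rather than the last 'b' of the whole string. A small sketch with a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "abc\nabc";

(my $without_s = $text) =~ s/(.*)b/$1x/;     # .* stops at the newline
(my $with_s    = $text) =~ s/(.*)b/$1x/s;    # /s lets .* run across lines

print "without /s: ", join('|', split /\n/, $without_s), "\n";
print "with /s:    ", join('|', split /\n/, $with_s), "\n";
```

Without `/s` the first line's 'b' is replaced; with `/s` the last 'b' of the whole string is.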
Re: RE - match from right
by davis (Vicar) on Oct 06, 2004 at 15:36 UTC
reverse() your string, then run the regex as if you wanted the first match, then reverse() again.
davis
It wasn't easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
Re: RE - match from right
by Roy Johnson (Monsignor) on Oct 06, 2004 at 15:41 UTC
That's a good description: b not followed by a string containing a b. You could also do it with "sexeger" (using a regex on the reverse of the string):
$string = reverse $string;
$string =~ s/b/x/;
$string = reverse $string;
though that seems a little excessive for this case. The other usual way to do it is:
$string =~ s/(.*)b/$1x/;
or (at least as clunky as your original):
$string =~ m/.*(?=b)/g and $string =~ s/\Gb/x/;
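The two-step version works because a successful `m//g` in scalar context records where the match ended in `pos($string)`, and `\G` in the following `s///` anchors the substitution there. Greedy `.*` followed by the zero-width `(?=b)` leaves that position just before the last 'b'. The sketch below makes the intermediate position visible; the variable `$found_at` is illustrative.

```perl
use strict;
use warnings;

my $string = 'abababa';
my $found_at;

if ($string =~ m/.*(?=b)/g) {
    $found_at = pos($string);    # offset just before the last 'b'
    $string =~ s/\Gb/x/;         # \G anchors the substitution at pos($string)
}
print "pos was $found_at, result $string\n";
```

Here the recorded position equals `rindex($string, 'b')` on the original string.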
Finally, you can do it without using a regex at all:
substr($string, rindex($string, 'b'), 1) = 'x';
Re: RE - match from right
by PodMaster (Abbot) on Oct 06, 2004 at 15:46 UTC
You want to add the $ anchor in that pattern: =~ s/(b)(?!.*b)$/x/;
Another approach is to use a sexeger:
my $string = "abababa";
$string = reverse $string;
$string =~ s/b/x/;
$string = reverse $string;
print $string,$/;
or, if you're dealing with exact strings, substr/rindex:
my $string = "abababa";
my $ri = rindex( $string, 'b' );
substr( $string, $ri, 1) = 'x';
print $string,$/;
I'm not sure which is better (that may vary between perl versions and/or string sizes), so what's left is to benchmark.
Update: And here is my benchmark. sexeger comes out on top in two different versions of perl.
Update: And of course, I hadn't taken my own advice.
Here is the updated benchmark:
MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.
I wonder if it might be even faster without _capturing_ parentheses?
Changing
$string =~ s/(b)(?!.*b)$/x/;
to this
$string =~ s/b(?!.*b)$/x/;
Both of those regexps fail when 'b' isn't the last character in the string, because the negative lookahead assertion is zero-width, and yet you're anchoring to the RHS of the string.
Look at the following code and you'll see:
use strict;
use warnings;
my $orig_string = 'abababa';
my @tests = ( qr/(b)(?!.*b)$/ ,
qr/b(?!.*b)$/ ,
qr/b(?!.*b.*$)/ ,
qr/b(?=[^b]*$)/ );
foreach my $test_re ( @tests ) {
my $scratch = $orig_string;
print "$test_re\tdid",
$scratch =~s/$test_re/x/ ? ' ' : "n't " ,
"match '$orig_string', yielding '$scratch'\n";
}
The problem with your regexps is that they require 'b' to be the last thing in the string. This is because the lookahead assertion, positive or negative, is zero-width. If you move the anchor inside the assertion, it would improve, but it becomes kludgy, and IMO, not as clearly defined if you insist on using negative lookahead (see the third test regexp). In the fourth regexp, you can see the use of a positive lookahead and a negated character class, which, to me, is clearer, harder to break, etc.
Looking at the benchmarks above, I notice that perl 5.6 is faster than perl 5.8 for this kind of operation. I remember hearing something about that, by the way.
Is that true more generally? Is perl 5.6 faster than perl 5.8? Do any of you hesitate to use perl 5.8 in real life when high performance is needed?
Just curious.
Re: RE - match from right
by borisz (Canon) on Oct 06, 2004 at 15:36 UTC
You can reverse the string.
local $_ = reverse $string;   # local keeps the caller's $_ intact
s/b/x/;
$string = reverse $_;
Re: RE - match from right
by TedPride (Priest) on Oct 07, 2004 at 08:46 UTC
Using rindex / substr is far better than messing around with regex. For instance:
my $string = "abababa";
my $from = 'b'; my $to = 'x';
print "was [$string]\n";
substr($string, rindex($string, $from), length($from)) = $to;
print "now [$string]\n";
$string, $from, and $to can be pretty much any length you want.
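One catch with the rindex/substr approach: `rindex` returns -1 when `$from` does not occur, and `substr` treats a negative offset as counting back from the end, so the one-liner would silently overwrite the tail of the string. A guarded variant, wrapped in a hypothetical helper named `replace_last` (the name is illustrative), might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the last occurrence of $from in $string with $to.
# Returns the new string, unchanged if $from does not occur.
sub replace_last {
    my ($string, $from, $to) = @_;
    my $pos = rindex($string, $from);
    substr($string, $pos, length $from) = $to if $pos >= 0;   # -1 means "not found"
    return $string;
}

print replace_last('abababa', 'b', 'x'), "\n";
print replace_last('abababa', 'ab', '[AB]'), "\n";
print replace_last('no such char', 'b', 'x'), "\n";
```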
Re: RE - match from right
by Anonymous Monk on Oct 07, 2004 at 10:48 UTC
It depends. I assume that your real string and real pattern are something other than 'abababa' and 'b', something that might not be easily reversible. What if your string is 'abbabba' and your pattern is 'b+', to be replaced with 'x'? Should that result in 'abbaxa' or 'abbabxa'? Simply reversing the string and pattern, as some have suggested, will give you 'abbaxa', but the rightmost match of 'b+' starts at the last b in the string.
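To make the ambiguity concrete, the sketch below runs three of the thread's approaches on 'abbabba' with the pattern 'b+'. Reversing the string (a palindrome here) and the lookahead form both replace the last *run* of b's, while the greedy `(.*)b+` form replaces the match that *starts* furthest right. Which is "correct" depends on what you mean by the last match.

```perl
use strict;
use warnings;

my $orig = 'abbabba';

# sexeger: reverse, replace the first run, reverse back
my $sexeger = reverse $orig;
$sexeger =~ s/b+/x/;
$sexeger = reverse $sexeger;

# lookahead: a run of b's followed only by non-b characters to the end
(my $lookahead = $orig) =~ s/b+(?=[^b]*$)/x/;

# greedy prefix: (.*) grabs as much as it can, so b+ starts as far right as possible
(my $greedy = $orig) =~ s/(.*)b+/$1x/;

print "sexeger:   $sexeger\n";
print "lookahead: $lookahead\n";
print "greedy:    $greedy\n";
```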