RE - match from right

bangers has asked for the wisdom of the Perl Monks concerning the following question:


Re: RE - match from right
by davido (Cardinal) on Oct 06, 2004 at 15:43 UTC

Anchor to the right with the $ metacharacter. By anchoring to the right, and using a negative character class, you minimize the amount of work needed to find the last occurrence even if it's not the last thing in the string.

s/b([^b]*)$/x$1/

Update:
And the following should yield better performance by using a positive lookahead assertion (eliminating the need for capturing parens), and still anchoring to the right.

s/b(?=[^b]*$)/x/
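As a quick check of the two anchored forms above, here is a minimal sketch that runs both on a few sample strings, including one where 'b' is not the last character and one with no 'b' at all. The sub names `last_b_capture` and `last_b_lookahead` are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two substitutions shown above.
sub last_b_capture {
    my ($s) = @_;
    $s =~ s/b([^b]*)$/x$1/;     # capture the b-free tail and put it back
    return $s;
}

sub last_b_lookahead {
    my ($s) = @_;
    $s =~ s/b(?=[^b]*$)/x/;     # assert the b-free tail without consuming it
    return $s;
}

for my $orig ('abababa', 'abab', 'no match here') {
    print "$orig -> ", last_b_capture($orig), " / ", last_b_lookahead($orig), "\n";
}
```

Both forms should agree on every input; a string without any 'b' is returned unchanged because the substitution simply fails to match.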

Dave


Re^2: RE - match from right

by Roy Johnson (Monsignor) on Oct 07, 2004 at 14:51 UTC

@strings = ( 'There was a bright man from Nantucket'
        ,'Abba babbles about bubble blowers'
        ,'Enough with the excess bs already'
        ,'None at all in this one');
my @c;

use Benchmark 'cmpthese';

%methods = (
  'anch'  => sub { s/b(?=[^b]*$)/x/ for @c=@strings},
  'xeger' => sub { $_ = reverse, s/b/x/, $_ = reverse for @c=@strings },
  'plain' => sub { s/(.*)b/$1x/ for @c=@strings},
  '2part' => sub { /.*(?=b)/g and s/\Gb/x/ for @c=@strings },
  '2part-A' => sub { /b(?=[^b]*$)/g and s/\Gb/x/ for @c=@strings },
  'substr' => sub { substr($_, rindex($_, 'b'), 1) = 'x' for @c=@strings},
  'neglook' => sub { s/b(?!.*b)/x/ for @c=@strings }
);

cmpthese(-3, \%methods );

          Rate   plain 2part-A   2part neglook   xeger    anch  substr
plain   3208/s      --    -31%    -35%    -44%    -47%    -51%    -64%
2part-A 4630/s     44%      --     -6%    -20%    -23%    -29%    -48%
2part   4948/s     54%      7%      --    -14%    -18%    -24%    -45%
neglook 5767/s     80%     25%     17%      --     -4%    -12%    -36%
xeger   6029/s     88%     30%     22%      5%      --     -8%    -33%
anch    6536/s    104%     41%     32%     13%      8%      --    -27%
substr  8985/s    180%     94%     82%     56%     49%     37%      --

Caution: Contents may have been coded under pressure.


Re: RE - match from right
by TheEnigma (Pilgrim) on Oct 06, 2004 at 15:41 UTC

$string =~ s/(.*)b/$1x/;

I don't know if that would be faster or slower than reversing it first.

TheEnigma


Re: RE - match from right
by davis (Vicar) on Oct 06, 2004 at 15:36 UTC

reverse() your string, run the regex as if you wanted the first match, then reverse() it again.

davis

It wasn't easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.


Re: RE - match from right
by Roy Johnson (Monsignor) on Oct 06, 2004 at 15:41 UTC

$string = reverse $string;
$string =~ s/b/x/;
$string = reverse $string;

$string =~ s/(.*)b/$1x/;

$string =~ m/.*(?=b)/g and $string =~ s/\Gb/x/;

substr($string, rindex($string, 'b'), 1) = 'x';
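The two-step m//g plus \G variant above is the least obvious of the four, so here is a small sketch of what it appears to rely on: a successful m//g in scalar context records the end of the match in pos(), and \G in the following substitution anchors at that position. The sample string is only an illustration.

```perl
use strict;
use warnings;

my $string = 'abababa';
my $pos_after_match;

# Scalar-context m//g leaves pos($string) at the end of the match.
# Because .* is greedy, that is the offset of the last 'b'.
if ($string =~ m/.*(?=b)/g) {
    $pos_after_match = pos($string);
    print "pos after first step: $pos_after_match\n";

    # \G pins the substitution to the remembered position.
    $string =~ s/\Gb/x/;
}

print "$string\n";
```

If the first match fails (no 'b' in the string), the substitution is never attempted and the string is left alone.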

Caution: Contents may have been coded under pressure.


Re: RE - match from right
by PodMaster (Abbot) on Oct 06, 2004 at 15:46 UTC

$string =~ s/(b)(?!.*b)$/x/;

sexeger (reverse the string, substitute, reverse it back):

my $string = "abababa";
$string = reverse $string;
$string =~ s/b/x/;
$string = reverse $string;
print $string,$/;

my $string = "abababa";
my $ri  = rindex( $string, 'b' );
substr( $string, $ri, 1) =  'x';
print $string,$/;

update: and here is my benchmark. sexeger comes out on top in two different versions of perl


MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.


Re^2: RE - match from right

by davido (Cardinal) on Oct 06, 2004 at 18:25 UTC

neganch => sub { my $string = "abababa"; $string =~ s/(b)(?!.*b)$/x/; return; }

That neganch test will fail if 'b' isn't the last character in the string, which it isn't.

Dave


Re^2: RE - match from right

by shenme (Priest) on Oct 06, 2004 at 16:41 UTC

Changing 
    $string =~ s/(b)(?!.*b)$/x/;
to this 
    $string =~ s/b(?!.*b)$/x/;

Re^3: RE - match from right

by davido (Cardinal) on Oct 06, 2004 at 18:17 UTC

Both of those regexps fail when 'b' isn't the last character in the string, because the negative lookahead assertion is zero-width, and yet you're anchoring to the RHS of the string.

Look at the following code and you'll see:

use strict;
use warnings;

my $orig_string = 'abababa';

my @tests = ( qr/(b)(?!.*b)$/ , 
              qr/b(?!.*b)$/   ,
              qr/b(?!.*b.*$)/ ,
              qr/b(?=[^b]*$)/   );

foreach my $test_re ( @tests ) {
    my $scratch = $orig_string;
    print "$test_re\tdid",
          $scratch =~s/$test_re/x/ ? '    ' : "n't " ,
          "match '$orig_string', yielding '$scratch'\n";
}

The problem with your regexps is that they require 'b' to be the last thing in the string. This is because the lookahead assertion, positive or negative, is zero-width. If you move the anchor inside the assertion, it would improve, but it becomes kludgy, and IMO, not as clearly defined if you insist on using negative lookahead (see the third test regexp). In the fourth regexp, you can see the use of a positive lookahead and a negated character class, which, to me, is clearer, harder to break, etc.

Dave


[OT] : perl 5.8 compared to perl 5.6

by zejames (Hermit) on Oct 08, 2004 at 12:09 UTC

--
zejames


Re: RE - match from right
by borisz (Canon) on Oct 06, 2004 at 15:36 UTC

$_ = reverse $string;
s/b/x/;
$string = reverse $_;

Boris


Re: RE - match from right
by TedPride (Priest) on Oct 07, 2004 at 08:46 UTC

my $string = "abababa";
my $from = 'b'; my $to = 'x';
print "was [$string]\n";
substr($string, rindex($string, $from), length($from)) = $to;
print "now [$string]\n";
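One caveat with the rindex/substr approach: rindex returns -1 when the substring is absent, and substr treats a negative offset as counting from the end of the string, so an unguarded assignment would overwrite the final character. A minimal guarded sketch follows; the helper name `replace_last` is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the last occurrence of $from in $string
# with $to, returning $string unchanged when $from does not occur.
sub replace_last {
    my ($string, $from, $to) = @_;
    my $at = rindex($string, $from);
    return $string if $at < 0;               # rindex gives -1 when not found
    substr($string, $at, length $from) = $to;
    return $string;
}

print replace_last('abababa', 'b', 'x'), "\n";
print replace_last('None at all in this one', 'b', 'x'), "\n";
```

This matters for inputs like 'None at all in this one', which contains no 'b': the regex-based versions leave such a string untouched, while the unguarded substr version does not.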

Re: RE - match from right
by Anonymous Monk on Oct 07, 2004 at 10:48 UTC

It depends. I assume that your real string and real pattern are something other than 'abababa' and 'b', something that might not be easily reversible. What if your string is 'abbabba' and your pattern is 'b+', to be replaced with 'x'? Should that result in 'abbaxa' or 'abbabxa'? Simply reversing the string and pattern, as suggested by some, will give you 'abbaxa', but the rightmost substring that matches 'b+' is the last b in the string.
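To make the ambiguity concrete, here is a small sketch comparing the reverse-based approach with a greedy-prefix substitution on that same string. The variable names are illustrative only, and which result is "right" depends on what the original poster actually wants.

```perl
use strict;
use warnings;

my $orig = 'abbabba';

# Reverse, substitute, reverse: replaces the whole rightmost run of b's.
my $reversed = reverse $orig;
$reversed =~ s/b+/x/;
my $via_reverse = reverse $reversed;

# Greedy prefix: .* takes as much as it can, leaving b+ only the final 'b'.
(my $via_greedy = $orig) =~ s/(.*)b+/${1}x/s;

print "reverse approach: $via_reverse\n";
print "greedy prefix:    $via_greedy\n";
```

The first yields 'abbaxa' and the second 'abbabxa', which are the two candidate answers described above.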
