
Perl is doing much more (but smarter) work when it finds an R in the string:

use strict;
use warnings;
use Benchmark qw/ cmpthese /;

my $R = 'R' x 10;
my $Q = 'Q' x 10;

# Trace how the regex engine compiles and runs each pattern below (written to STDERR)
use re 'debug';

sub foo {
    my $c = shift;

    pos $$c = 0;
    while ( 1 ) {
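        # Without /g, a successful match would not move pos() and the loop would
        # spin forever; it never succeeds on the all-R / all-Q strings used here
        next if $$c =~ /\G\d+R/;
        # Otherwise move pos() past the next word character; stop at the end of the string
        last unless $$c =~ /\w/g;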
    }
}

print "-" x 80, "\n";
foo( \$Q );
print "-" x 80, "\n";
foo( \$R );
print "-" x 80, "\n";

Matching REx "\G\d+R" against "QQQQQQQQQQ"
Intuit: trying to determine minimum start position...
  Did not find floating substr "R"...
Match rejected by optimizer
Matching REx "\w" against "QQQQQQQQQQ"
Matching stclass POSIXD[\w] against "QQQQQQQQQQ" (10 bytes)
   0 <> <QQQQQQQQQQ>         |  1:POSIXD[\w](2)
   1 <Q> <QQQQQQQQQ>         |  2:END(0)
Match successful!
...
Matching REx "\G\d+R" against "QQQQQQQQQQ"
Regex match can't succeed, so not even tried
Matching REx "\w" against "Q"
Matching stclass POSIXD[\w] against "Q" (1 bytes)
   9 <QQQQQQQQQ> <Q>         |  1:POSIXD[\w](2)
  10 <QQQQQQQQQQ> <>         |  2:END(0)
Match successful!
Matching REx "\G\d+R" against "QQQQQQQQQQ"
Regex match can't succeed, so not even tried
Matching REx "\w" against ""
Regex match can't succeed, so not even tried
--------------------------------------------------------------------------------

Matching REx "\G\d+R" against "RRRRRRRRRR"
Intuit: trying to determine minimum start position...
  Found floating substr "R" at offset 1...
  (multiline anchor test skipped)
  looking for class: start_shift: 1 check_at: 1 rx_origin: 0 endpos: 1
  This position contradicts STCLASS...
Match rejected by optimizer
Matching REx "\w" against "RRRRRRRRRR"
Matching stclass POSIXD[\w] against "RRRRRRRRRR" (10 bytes)
   0 <> <RRRRRRRRRR>         |  1:POSIXD[\w](2)
   1 <R> <RRRRRRRRR>         |  2:END(0)
Match successful!
Matching REx "\G\d+R" against "RRRRRRRRRR"
Intuit: trying to determine minimum start position...
  Found floating substr "R" at offset 1...
  (multiline anchor test skipped)
  looking for class: start_shift: 1 check_at: 2 rx_origin: 1 endpos: 2
  This position contradicts STCLASS...
Match rejected by optimizer
Matching REx "\w" against "RRRRRRRRR"
Matching stclass POSIXD[\w] against "RRRRRRRRR" (9 bytes)
   1 <R> <RRRRRRRRR>         |  1:POSIXD[\w](2)
   2 <RR> <RRRRRRRR>         |  2:END(0)
Match successful!
...
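In the trace, the optimizer goes looking for the floating substring "R" before the \G position is even considered. The core `re` module can report which fixed substrings the optimizer has extracted from a compiled pattern via `regmust()`. A short sketch for comparing the patterns used in this thread (output not shown, since the extracted strings may differ between Perl versions):

```perl
use strict;
use warnings;
use re qw(regmust);

# regmust() returns the longest fixed string the optimizer requires at a
# fixed offset ("anchored") and at a variable offset ("floating").
# A missing string comes back empty, so test length() rather than defined().
for my $rx ( qr/\G\d+R/, qr/\G[0-9]+/, qr/\GR/ ) {
    my ( $anchored, $floating ) = regmust($rx);
    printf "%-16s anchored=%-6s floating=%s\n", $rx,
        map { defined($_) && length($_) ? qq{"$_"} : 'none' } $anchored, $floating;
}
```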

I'm not sure how to make Perl skip the scan for the fixed substring R and instead check \G\d first, so that the match is rejected immediately at that position, no matter what follows.
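One way to get the "check \G\d first" behaviour without depending on the optimizer is to look at the character at pos() yourself and run the regex only when it is a digit. This is a minimal sketch, not benchmarked here; `foo_guarded` and its token count are hypothetical additions, and it uses [0-9] instead of \d so that the guard and the regex agree on what a digit is:

```perl
use strict;
use warnings;

# Hypothetical variant of foo(): run /\G[0-9]+R/ only when the character
# at pos() is an ASCII digit, so every other position is rejected by a
# plain string comparison before the regex engine is involved.
sub foo_guarded {
    my $c     = shift;
    my $found = 0;    # number of digits-then-R tokens consumed

    pos $$c = 0;
    while (1) {
        my $ch = substr( $$c, pos($$c), 1 );    # "" once pos() is at the end
        # /gc: on success pos() moves past the token, on failure it stays put
        if ( $ch ge '0' && $ch le '9' && $$c =~ /\G[0-9]+R/gc ) {
            $found++;
            next;
        }
        last unless $$c =~ /\w/g;
    }
    return $found;
}
```

Because /gc moves pos() past each token, this loop also terminates on strings that do contain a digit run followed by R.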

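By splitting the match into a \d+ step and an R step, I can make both strings run at the same rate (about 65/s, instead of 2/s for Q), while \G\d+R on the R string is still ~~slightly~~ 18% faster. I'm not sure the split version is correct, though: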

use strict;
use warnings;
use Benchmark qw/ cmpthese /;

my $R = 'R' x 42_000;
my $Q = 'Q' x 42_000;

sub foo {
    my $c = shift;

    pos $$c = 0;
    while ( 1 ) {
        next if $$c =~ /\G\d+R/;
        last unless $$c =~ /\w/g;
    }
}

sub foo_twostep {
    my $c = shift;

    pos $$c = 0;
    while ( 1 ) {
        # NB: no /g on either match, so both start at the same pos()
        next if ($$c =~ /\G[0-9]+/ and $$c =~ /\GR/);
        last unless $$c =~ /\w/g;
    }
}

# A negative count runs each case for at least 3 CPU seconds
cmpthese -3, {
    R => sub { foo( \$R )},
    Q => sub { foo( \$Q )},
    R_twostep => sub { foo_twostep( \$R )},
    Q_twostep => sub { foo_twostep( \$Q )},
}
__END__
            Rate         Q R_twostep Q_twostep         R
Q         2.01/s        --      -97%      -97%      -97%
R_twostep 65.3/s     3147%        --        0%      -15%
Q_twostep 65.3/s     3147%        0%        --      -15%
R         77.1/s     3732%       18%       18%        --
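As for whether the split is correct: neither match in foo_twostep uses /g, so the first match leaves pos() untouched and /\GR/ is tried at the same position as /\G[0-9]+/. Since no character is both a digit and R, that condition can never be true, so foo_twostep would never accept input such as 12R (the strings benchmarked above never match either way). A minimal sketch of a two-step version that really continues after the digits, assuming the /gc modifiers (advance pos() on success, keep it on failure); `foo_twostep_gc` and its token count are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical two-step variant in which the second match starts *after*
# the digits: /gc advances pos() on success and leaves it unchanged on failure.
sub foo_twostep_gc {
    my $c     = shift;
    my $found = 0;    # number of digits-then-R tokens consumed

    pos $$c = 0;
    while (1) {
        my $start = pos $$c;
        if ( $$c =~ /\G[0-9]+/gc ) {    # step 1: consume the digit run
            if ( $$c =~ /\GR/gc ) {     # step 2: R must follow the digits
                $found++;
                next;
            }
            pos($$c) = $start;          # no R: back up to the first digit
        }
        last unless $$c =~ /\w/g;
    }
    return $found;
}
```

Backing up to $start keeps the same behaviour as foo(): a failed attempt at one position is followed by /\w/g stepping forward one word character.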

Update: Using only ASCII digits ([0-9] instead of \d), the slowdown shrinks to about 12%:

use strict;
use warnings;
use Benchmark qw/ cmpthese /;

my $R = 'R' x 42_000;
my $Q = 'Q' x 42_000;

sub foo {
    my $c = shift;

    pos $$c = 0;
    while ( 1 ) {
        next if $$c =~ /\G\d+R/;
        last unless $$c =~ /\w/g;
    }
}

sub foo_twostep {
    my $c = shift;

    pos $$c = 0;
    while ( 1 ) {
        next if ($$c =~ /\G\d+/ and $$c =~ /\GR/);
        last unless $$c =~ /\w/g;
    }
}

sub foo_asciidigits {
    my $c = shift;

    pos $$c = 0;
    while ( 1 ) {
        next if ($$c =~ /\G[0-9]/ and $$c =~ /\GR/);
        last unless $$c =~ /\w/g;
    }
}

cmpthese -3, { 
    R => sub { foo( \$R )},
    Q => sub { foo( \$Q )},
    R_twostep => sub { foo_twostep( \$R )},
    Q_twostep => sub { foo_twostep( \$Q )},
    R_ascii => sub { foo_asciidigits( \$R )},
    Q_ascii => sub { foo_asciidigits( \$Q )},
}

__END__
            Rate         Q Q_twostep R_twostep   R_ascii   Q_ascii         R
Q         1.99/s        --      -97%      -97%      -97%      -97%      -97%
Q_twostep 62.9/s     3054%        --       -1%       -7%       -8%      -17%
R_twostep 63.8/s     3101%        1%        --       -5%       -6%      -16%
R_ascii   67.5/s     3287%        7%        6%        --       -1%      -11%
Q_ascii   68.1/s     3319%        8%        7%        1%        --      -10%
R         76.0/s     3715%       21%       19%       13%       12%        --
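Note that switching from \d to [0-9] also changes what is matched: \d matches any Unicode decimal digit, while [0-9] (or \d under the /a modifier, available since Perl 5.14) matches only the ten ASCII digits. A quick check, using U+0663 ARABIC-INDIC DIGIT THREE as the example character:

```perl
use strict;
use warnings;

my $digit = "\x{0663}";    # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

printf "%-6s %s\n", '\d',    ( $digit =~ /\d/    ? 'match' : 'no match' );    # match
printf "%-6s %s\n", '[0-9]', ( $digit =~ /[0-9]/ ? 'match' : 'no match' );    # no match
printf "%-6s %s\n", '\d/a',  ( $digit =~ /\d/a   ? 'match' : 'no match' );    # no match
```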

In reply to Re: Why is Perl suddenly slow in THIS case? by Corion
in thread Why is Perl suddenly slow in THIS case? by vr
