in reply to Re^2: Is foreach split Optimized?
in thread Is foreach split Optimized? (Update: No.)

Interesting:
5.020002
         Rate  index  regex  split split2
index  1.17/s     --   -23%   -35%   -43%
regex  1.51/s    29%     --   -16%   -26%
split  1.79/s    53%    18%     --   -12%
split2 2.03/s    74%    35%    14%     --
split2 uses tr/// instead of regex s///...
use warnings;
use strict;
use Data::Dump qw/dd pp/;
use Benchmark qw/cmpthese/;

# example output:
# 5.024001
#         Rate regex index split
# regex 9.64/s    --   -8%  -41%
# index 10.5/s    9%    --  -36%
# split 16.4/s   70%   56%    --

my $str = "\nFoo\n\nBar Quz\nBaz\nx" x 50000;
use constant TEST => 0;
my $expect = join "\0", split /\n/, $str;
$expect=~s/o/i/g;
#dd [split /\n/, $str], $expect;

dd $];
cmpthese(-5, {
    split => sub {
        my @lines;
        my @x = split /\n/, $str;
        #@x = map {$_} @x; # significant slowdown
        #for my $line (map {$_} split /\n/, $str) { # still fairly fast
        for my $line (@x) {
            $line=~s/o/i/g;
            push @lines, $line;
        }
        if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
    },
    split2 => sub {
        my @lines;
        my @x = split "\n", $str; ### couple of percent
        #@x = map {$_} @x; # significant slowdown
        #for my $line (map {$_} split /\n/, $str) { # still fairly fast
        for my $line (@x) {
            $line =~ tr/o/i/; ### big difference vs regex s///
            push @lines, $line;
        }
        if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
    },
    regex => sub {
        my @lines;
        pos($str)=0;
        #while ($str=~/^(.*)$/mgc) { # slower
        while ($str=~/\G(?|(.*?)\n|(.+)\z)/gc) {
            my $line = $1;
            $line=~s/o/i/g;
            push @lines, $line;
        }
        if (TEST) {
            die unless pos($str)==length($str);
            die pp(@lines) unless $expect eq join "\0", @lines;
        }
    },
    index => sub {
        my @lines;
        for ( my ($pos,$nextpos) = (0);
              $pos<length($str);
              $pos=$nextpos+1 ) {
            $nextpos = do { my $i=index($str,"\n",$pos);
                            $i<0?length($str):$i };
            my $line = substr $str, $pos, $nextpos-$pos;
            $line=~s/o/i/g;
            push @lines, $line
        }
        if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
    },
});
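
For a single-character replacement like the one in the loop body, tr/o/i/ and s/o/i/g give the same result, which is why split2 still passes the TEST check. A minimal sketch to confirm that on a few sample lines (the helper names replace_with_s and replace_with_tr are made up for this illustration and are not part of the benchmark):

```perl
use strict;
use warnings;

# Hypothetical helpers: each works on a copy and returns the changed line.
sub replace_with_s  { my ($line) = @_; $line =~ s/o/i/g; return $line }
sub replace_with_tr { my ($line) = @_; $line =~ tr/o/i/; return $line }

# Sample lines taken from the benchmark's test string, plus an empty line.
for my $line ("Foo", "Bar Quz", "Baz", "", "x") {
    my $via_s  = replace_with_s($line);
    my $via_tr = replace_with_tr($line);
    die "mismatch for '$line'" unless $via_s eq $via_tr;
    print "'$line' -> '$via_tr'\n";
}
```

The two only stay interchangeable as long as the pattern is a plain character-for-character mapping; anything needing a real regex has to stay with s///.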

Re^4: Is foreach split Optimized?
by haukex (Archbishop) on Jul 09, 2017 at 12:27 UTC

    I'm investigating the speed of iterating through the string, not how it gets processed afterwards, which is why the loop bodies are all the same. So to make the benchmark fair again, you'd have to make the same change in all the loops. I just added that bit of s/// code to provide a somewhat realistic loop body (Update: or rather, a placeholder for the actual loop body, which manipulates the $line a whole lot more and whose output is much more complex than one line of output per line of input), and as a (likely misguided) attempt to prevent for my $line (@x) { push @lines, $line; } from being optimized to @lines = @x;.
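
One way to keep the loop bodies identical is to put the body into a single sub that every variant calls, so that a change such as s/// to tr/// lands in all of them at once. This is only a sketch under assumptions: the names body, via_split and via_index are invented here, the input is smaller than in the original benchmark, and the extra sub call adds the same overhead to every variant, which shrinks the relative differences.

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

my $str = "\nFoo\n\nBar Quz\nBaz\nx" x 1000;   # smaller than the original input

# Hypothetical shared loop body: edit it once and every variant changes with it.
sub body { my ($line) = @_; $line =~ tr/o/i/; return $line }

# Iterate with split.
sub via_split {
    my @lines;
    push @lines, body($_) for split /\n/, $str;
    return \@lines;
}

# Iterate with index/substr.
sub via_index {
    my @lines;
    my $pos = 0;
    while ($pos < length $str) {
        my $i = index $str, "\n", $pos;
        $i = length $str if $i < 0;      # last line has no trailing newline
        push @lines, body(substr $str, $pos, $i - $pos);
        $pos = $i + 1;
    }
    return \@lines;
}

# Sanity check before timing: both strategies must yield the same lines.
die "variants disagree"
    unless join("\0", @{ via_split() }) eq join("\0", @{ via_index() });

cmpthese(-1, { split => \&via_split, index => \&via_index });
```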

      Fair enough. With that change for more "apples to apples"..
      5.020002
               Rate  index  regex split2
      index  1.27/s     --   -19%   -34%
      regex  1.57/s    23%     --   -19%
      split2 1.93/s    52%    23%     --