Re^3: Is foreach split Optimized?

Interesting:

5.020002
         Rate  index  regex  split split2
index  1.17/s     --   -23%   -35%   -43%
regex  1.51/s    29%     --   -16%   -26%
split  1.79/s    53%    18%     --   -12%
split2 2.03/s    74%    35%    14%     --

split2 uses tr/// instead of the regex s///, and passes split the string "\n" rather than /\n/ (see the ### comments):

use warnings;
use strict;
use Data::Dump qw/dd pp/;
use Benchmark qw/cmpthese/;

# example output:
# 5.024001
#         Rate regex index split
# regex 9.64/s    --   -8%  -41%
# index 10.5/s    9%    --  -36%
# split 16.4/s   70%   56%    --

my $str = "\nFoo\n\nBar Quz\nBaz\nx" x 50000;

use constant TEST => 0;
my $expect = join "\0", split /\n/, $str;
$expect=~s/o/i/g;
#dd [split /\n/, $str], $expect;

dd $];
cmpthese(-5, {
    split => sub {
        my @lines;
        my @x = split /\n/, $str;
        #@x = map {$_} @x; # significant slowdown
        #for my $line (map {$_} split /\n/, $str) { # still fairly fast
        for my $line (@x) {
            $line=~s/o/i/g;
            push @lines, $line;
        }
        if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
    },
    split2 => sub {
            my @lines;
            my @x = split "\n", $str; ### couple of percent
            #@x = map {$_} @x; # significant slowdown
            #for my $line (map {$_} split /\n/, $str) { # still fairly fast
            for my $line (@x) {
                $line =~ tr/o/i/; ### big difference vs regex s///
                push @lines, $line;
            }
            if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
    },
    regex => sub {
        my @lines;
        pos($str)=0;
        #while ($str=~/^(.*)$/mgc) { # slower
        while ($str=~/\G(?|(.*?)\n|(.+)\z)/gc) {
            my $line = $1;
            $line=~s/o/i/g;
            push @lines, $line;
        }
        if (TEST) {
            die unless pos($str)==length($str);
            die pp(@lines) unless $expect eq join "\0", @lines;
        }
    },
    index => sub {
        my @lines;
        for ( my ($pos,$nextpos) = (0); $pos<length($str);
            $pos=$nextpos+1 ) {
                $nextpos = do { my $i=index($str,"\n",$pos);
                    $i<0?length($str):$i };
                my $line = substr $str, $pos, $nextpos-$pos;
                $line=~s/o/i/g;
                push @lines, $line
        }
        if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
    },
});
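
As a quick sanity check that swapping s/o/i/g for tr/o/i/ (and /\n/ for "\n") does not change the results, here is a minimal sketch. The tiny sample string and the names @via_s, @via_tr and $same are made up for illustration and are not part of the benchmark above.

```perl
use strict;
use warnings;

# Hypothetical sample input: same shape as the benchmark string, but tiny.
my $str = "\nFoo\n\nBar Quz\nBaz\nx" x 3;

my (@via_s, @via_tr);
for my $line (split /\n/, $str) {
    (my $copy = $line) =~ s/o/i/g;   # regex substitution, as in "split"
    push @via_s, $copy;
}
for my $line (split "\n", $str) {
    (my $copy = $line) =~ tr/o/i/;   # transliteration, as in "split2"
    push @via_tr, $copy;
}

# Compare the two result lists the same way the benchmark's TEST branch does.
my $same = join("\0", @via_s) eq join("\0", @via_tr);
print $same ? "identical\n" : "different\n";
```

For a single-character replacement like this one the two forms should give the same lines, so any difference in the timings comes from speed, not from different output.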

Re^4: Is foreach split Optimized? by haukex (Archbishop) on Jul 09, 2017 at 12:27 UTC
I'm investigating the speed of iterating through the string, not how it gets processed afterwards, which is why the loop bodies are all the same. So to make the benchmark fair again, you'd have to make the same change in all the loops. I just added that bit of `s///` code to provide a somewhat realistic loop body (Update: or rather, a placeholder for the actual loop body, which manipulates the `$line` a whole lot more and whose output is much more complex than one line of output per line of input), and as a (likely misguided) attempt to prevent `for my $line (@x) { push @lines, $line; }` from being optimized to `@lines = @x;`.
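
One way to see how much of a difference comes from the loop body rather than from the iteration is to time the body on its own. The following is only a rough sketch under that assumption: the input is split once up front, and the helper names body_s and body_tr are hypothetical, not taken from the benchmark in the parent post.

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

# Split once, outside the timed code, so iteration cost is excluded.
# (Smaller input than the original benchmark, to keep the run short.)
my @x = split /\n/, "\nFoo\n\nBar Quz\nBaz\nx" x 5000;

# Each helper copies the lines and applies only the loop body to each copy.
sub body_s  { my @l = @x; s/o/i/g for @l; return \@l }
sub body_tr { my @l = @x; tr/o/i/ for @l; return \@l }

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, { 's///g' => \&body_s, 'tr///' => \&body_tr });
```

Whatever gap this reports belongs to the body alone, so it would need to be applied to every variant (or to none) before comparing split, regex and index against each other.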
Re^5: Is foreach split Optimized? by Marshall (Canon) on Jul 09, 2017 at 13:02 UTC
Fair enough. With that change, for a more "apples to apples" comparison:

5.020002
         Rate  index  regex split2
index  1.27/s     --   -19%   -34%
regex  1.57/s    23%     --   -19%
split2 1.93/s    52%    23%     --