haukex has asked for the wisdom of the Perl Monks concerning the following question:
I think I've found an interesting optimization, but even after some digging here on PerlMonks, the Perl docs, and looking through the Camel I haven't yet found it documented*. Of course, it's also possible that I made a mistake in my benchmark, or that this optimization is common knowledge, in which case I would be happy to be enlightened :-)
<update> * Because it isn't there :-) It appears that the answer is that while split is quite fast, it still splits the string into an array before iterating over that array (that's what the memory consumption seems to show). The filehandle method proposed by Laurent_R seems to be the best way to go about my task instead, assuming what you're splitting on is a fixed string. See all the replies below for more details. </update>
I had a multiline string and wanted to iterate through the lines, and became curious what the fastest way to do that was. I was pleasantly surprised that in my benchmark, on v5.24.1, for (split /\n/, $string) was fastest, despite my original worry that it might split the string into a long list before iterating over it.
I know that foreach (1..1000000) has been optimized to use an iterator instead of building a huge list since 5.005, and I found some discussion of the optimization of split // in this thread.
I also found several references to @x = split ... being optimized, which I think might be the reason that for (split ...) is so fast. I don't know how long this optimization has been present, but I found, for example, commit e4e95921cd0fd0, which seems to indicate it's been present since Perl 3. If anyone knows more and wants to set the record straight, please do so! :-)
    use warnings;
    use strict;
    use Data::Dump qw/dd pp/;
    use Benchmark qw/cmpthese/;

    # example output:
    # 5.024001
    #         Rate regex index split
    # regex 9.64/s    --   -8%  -41%
    # index 10.5/s    9%    --  -36%
    # split 16.4/s   70%   56%    --

    my $str = "\nFoo\n\nBar Quz\nBaz\nx" x 50000;
    use constant TEST => 0;
    my $expect = join "\0", split /\n/, $str;
    $expect=~s/o/i/g;
    #dd [split /\n/, $str], $expect;

    dd $];
    cmpthese(-2, {
        split => sub {
            my @lines;
            my @x = split /\n/, $str;
            #@x = map {$_} @x; # significant slowdown
            #for my $line (map {$_} split /\n/, $str) { # still fairly fast
            for my $line (@x) {
                $line=~s/o/i/g;
                push @lines, $line;
            }
            if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
        },
        regex => sub {
            my @lines;
            pos($str)=0;
            #while ($str=~/^(.*)$/mgc) { # slower
            while ($str=~/\G(?|(.*?)\n|(.+)\z)/gc) {
                my $line = $1;
                $line=~s/o/i/g;
                push @lines, $line;
            }
            if (TEST) {
                die unless pos($str)==length($str);
                die pp(@lines) unless $expect eq join "\0", @lines;
            }
        },
        index => sub {
            my @lines;
            for ( my ($pos,$nextpos) = (0);
                  $pos<length($str);
                  $pos=$nextpos+1 ) {
                $nextpos = do { my $i=index($str,"\n",$pos);
                    $i<0?length($str):$i };
                my $line = substr $str, $pos, $nextpos-$pos;
                $line=~s/o/i/g;
                push @lines, $line
            }
            if (TEST) { die pp(@lines) unless $expect eq join "\0", @lines }
        },
    });
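For comparison, here is a minimal sketch of the filehandle approach mentioned in the update: open a read-only in-memory filehandle on the string and read it line by line, so that no intermediate list of all lines is built. This is my own illustration and not Laurent_R's actual code; the sample string and variable names are made up for the example, and I have not benchmarked it here.

```perl
use warnings;
use strict;

# Hypothetical sample data, much smaller than the benchmark string.
my $str = "Foo\nBar Quz\nBaz\nx";

# Open a read-only, in-memory filehandle on the string (note the reference).
open my $fh, '<', \$str or die "Cannot open in-memory filehandle: $!";

my @lines;
while ( my $line = <$fh> ) {
    chomp $line;         # strip the trailing "\n" (the default $/)
    $line =~ s/o/i/g;    # same per-line work as in the benchmark
    push @lines, $line;
}
close $fh;

print join( "|", @lines ), "\n";    # prints "Fii|Bar Quz|Baz|x"
```

One difference to keep in mind: split /\n/ drops trailing empty fields by default, while reading from the filehandle returns every line, including empty ones at the end of the string, so the two are not exact drop-in replacements for every input.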
Re: Is foreach split Optimized?
by Laurent_R (Canon) on Jul 09, 2017 at 11:56 UTC
by haukex (Archbishop) on Jul 09, 2017 at 12:18 UTC
by Laurent_R (Canon) on Jul 09, 2017 at 14:17 UTC
by stevieb (Canon) on Jul 09, 2017 at 15:34 UTC
by haukex (Archbishop) on Jul 11, 2017 at 10:45 UTC

Re: Is foreach split Optimized?
by ikegami (Patriarch) on Jul 09, 2017 at 17:44 UTC
by haukex (Archbishop) on Jul 11, 2017 at 10:37 UTC

Re: Is foreach split Optimized?
by Marshall (Canon) on Jul 09, 2017 at 11:32 UTC
by haukex (Archbishop) on Jul 09, 2017 at 11:44 UTC
by Marshall (Canon) on Jul 09, 2017 at 11:57 UTC
by haukex (Archbishop) on Jul 09, 2017 at 12:27 UTC
by Marshall (Canon) on Jul 09, 2017 at 13:02 UTC

Re: Is foreach split Optimized?
by Anonymous Monk on Jul 09, 2017 at 20:05 UTC
by haukex (Archbishop) on Jul 11, 2017 at 10:28 UTC