in reply to Performance problems on splitting long strings

Just fyi:

use strict;
use warnings;
use Tie::CharArray;
use Benchmark qw/cmpthese/;

my $string = join '', 'A' .. 'Y';

sub _unpack {
    my @arr = unpack '(A5)*', $string;
}

sub _regex {
    my @arr = $string =~ /.{5}/g;
}

sub _split {
    my @arr = split /.{5}\K/, $string;
}

sub _substr {
    my @arr;
    for ( my $i = 0 ; $i < length $string ; $i += 5 ) {
        push @arr, substr $string, $i, 5;
    }
}

sub _open {
    my @arr;
    open my $sh, '<', \$string;
    while ( read $sh, my $chars, 5 ) {
        push @arr, $chars;
    }
}

cmpthese(
    -5,
    {
        _unpack => sub { _unpack() },
        _regex  => sub { _regex() },
        _split  => sub { _split() },
        _substr => sub { _substr() },
        _open   => sub { _open() }
    }
);

Output:

            Rate   _open  _regex _substr  _split _unpack
_open   265986/s      --    -53%    -55%    -57%    -70%
_regex  563780/s    112%      --     -5%     -8%    -36%
_substr 593788/s    123%      5%      --     -3%    -33%
_split  612001/s    130%      9%      3%      --    -31%
_unpack 881949/s    232%     56%     49%     44%      --

Re^2: Performance problems on splitting long strings
by SimonPratt (Friar) on Jan 31, 2014 at 15:39 UTC

    Borrowing heavily from Kenosis' code (thanks), regex seems to be faster than unpack (at least using substitution):

                 Rate _substr _unpack  _regex  _split
    _substr 2187335/s      --    -11%    -16%    -20%
    _unpack 2457294/s     12%      --     -6%    -10%
    _regex  2612321/s     19%      6%      --     -4%
    _split  2726283/s     25%     11%      4%      --

    Perl code:

    use strict;
    use warnings;
    use Benchmark qw/cmpthese/;

    my $string = join '', 'A' .. 'Y';

    sub _unpack {
        my @arr = unpack '(A5)*', $string;
    }

    sub _regex {
        my @arr;
        while ( length $string ) {
            $string =~ s/^(.{5})//;
            push @arr, $1;
        }
    }

    sub _split {
        my @arr = split /.{5}\K/, $string;
    }

    sub _substr {
        my @arr;
        for ( my $i = 0 ; $i < length $string ; $i += 5 ) {
            push @arr, substr $string, $i, 5;
        }
    }

    cmpthese(
        -5,
        {
            _unpack => sub { _unpack() },
            _split  => sub { _split() },
            _substr => sub { _substr() },
            _regex  => sub { _regex() }
        }
    );

      Your benchmark is totally broken.

      When your _regex() function runs the first time, it completely destroys $string; every time after that, the regex operates on an empty string and thus runs very quickly. Ditto every other test that runs after the first run of _regex().
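
      A minimal sketch of the problem and one possible fix, assuming the same 25-character $string as in the posted code. The names regex_destructive and regex_on_copy are made up for illustration; the first repeats the substitution loop as posted, the second runs the same loop on a private copy so that every call sees the full input:

      ```perl
      use strict;
      use warnings;

      my $string = join '', 'A' .. 'Y';    # 25 characters, as in the benchmark

      # Substitution loop as posted: it eats the shared $string.
      sub regex_destructive {
          my @arr;
          while ( length $string ) {
              $string =~ s/^(.{5})//;
              push @arr, $1;
          }
          return @arr;
      }

      # Hypothetical fix: same loop, but on a copy, so $string survives.
      # Like the original, this assumes the length is a multiple of 5.
      sub regex_on_copy {
          my $copy = $string;
          my @arr;
          while ( length $copy ) {
              $copy =~ s/^(.{5})//;
              push @arr, $1;
          }
          return @arr;
      }

      # Run the non-destructive version first, while $string is still intact.
      my @first  = regex_on_copy();
      my @second = regex_on_copy();
      print scalar(@first), " ", scalar(@second), "\n";    # prints "5 5"

      # The destructive version only does real work on its first call.
      my @third  = regex_destructive();
      my @fourth = regex_destructive();
      print scalar(@third), " ", scalar(@fourth), "\n";    # prints "5 0"
      ```

      The copy costs something on every call, so a fair comparison would have to accept that overhead for the substitution approach (or use a non-destructive match such as /.{5}/g instead).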

