in reply to Re^3: dice's coefficient
in thread dice's coefficient

unpack is faster still, even with the need to calculate the $n repeat count. (There's probably a way to get rid of this calculation, but I can't see it at the moment.)

>perl -wMstrict -le
"use Benchmark qw(cmpthese);
 use Test::More 'no_plan';
 ;;
 my $str = 'wwibblewibblewibblewibbleibblewibblewibblewibble';
 ;;
 cmpthese -1, {
   regex  => sub { () = $str =~ /(?=(..))/g },
   substr => sub { () = map { substr $str, $_, 2 } (0 .. length($str) - 2) },
   unpack => sub {
     my $n = length($str) ? length($str) - 1 : 0;
     () = unpack qq{(a2 X)$n}, $str;
   },
 };
 ;;
 sub bigrams {
   my $n = length($_[0]) ? length($_[0]) - 1 : 0;
   return unpack qq{(a2 X)$n}, $_[0];
 }
 ;;
 is_deeply [ bigrams('')      ], [];
 is_deeply [ bigrams('a')     ], [];
 is_deeply [ bigrams('ab')    ], [ qw(ab) ];
 is_deeply [ bigrams('abc')   ], [ qw(ab bc) ];
 is_deeply [ bigrams('abcd')  ], [ qw(ab bc cd) ];
 is_deeply [ bigrams('abcde') ], [ qw(ab bc cd de) ];
"
          Rate  regex substr unpack
regex  11934/s     --   -34%   -66%
substr 18066/s    51%     --   -48%
unpack 34816/s   192%    93%     --
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
1..6
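
To make the template easier to follow, here is a minimal standalone sketch of the same idea. The sub name bigrams_unpack is hypothetical; the logic mirrors the bigrams() sub in the one-liner above. Each pass through the group reads two characters (a2) and then steps back one (X), so the read position moves forward by one character per pass, and the group has to repeat length - 1 times.

```perl
use strict;
use warnings;

# Hypothetical helper name; same logic as bigrams() in the one-liner above.
sub bigrams_unpack {
    my ($s) = @_;

    # A string of length L has L - 1 overlapping bigrams; the guard
    # keeps the repeat count at 0 for the empty string.
    my $n = length($s) ? length($s) - 1 : 0;

    # a2 reads two characters, X backs up one, so every pass through
    # the group advances the read position by one character overall.
    return unpack "(a2 X)$n", $s;
}

print join(' ', bigrams_unpack('abcd')), "\n";   # prints "ab bc cd"
```

The explicit count is what stops the walk one character short of the end, which is why $n has to be computed up front here.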

Re^5: dice's coefficient
by Anonymous Monk on Jan 15, 2012 at 02:27 UTC
    It works well in that benchmark, but falls down here:
    use Benchmark qw(cmpthese);

    my $str = "wwibblewibblewibblewibbleibblewibblewibblewibble";

    cmpthese -1, {
        regex => sub {
            my %count;
            ++$count{$_} for $str =~ /(?=(..))/g;
        },
        substr => sub {
            my %count;
            ++$count{substr $str, $_, 2} for (0 .. length($str) - 2);
        },
        unpack => sub {
            my %count;
            my $n = length($str) - 1;
            ++$count{$_} for unpack qq{(a2 X)$n}, $str;
        },
    };

              Rate  regex unpack substr
    regex  15316/s     --   -43%   -74%
    unpack 26935/s    76%     --   -54%
    substr 58514/s   282%   117%     --
    substr slows down by about 50% if the string contains UTF-8 characters, but it's still significantly faster than unpack in this benchmark:
    my $str = "wwibblewibblewibblewibbleibblewibblewibblewibble\x{20ac}";

              Rate  regex unpack substr
    regex  14222/s     --   -35%   -51%
    unpack 21976/s    55%     --   -24%
    substr 29020/s   104%    32%     --
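
For context on why the counting variant matters, here is a rough sketch of how the per-bigram counts might feed a Dice's coefficient calculation. The thread's original coefficient code is not shown here, so this is an assumption: it uses the multiset form, 2 * shared / (total bigrams in both strings), and the sub names bigram_counts and dice are hypothetical. It builds the counts with substr, as in the benchmark above.

```perl
use strict;
use warnings;

# Hypothetical helper: count overlapping bigrams, as in the substr benchmark.
sub bigram_counts {
    my ($s) = @_;
    my %count;
    ++$count{ substr $s, $_, 2 } for 0 .. length($s) - 2;
    return \%count;
}

# Hypothetical helper: multiset Dice's coefficient over bigram counts.
# Assumed formula: 2 * |shared bigrams| / (|bigrams of x| + |bigrams of y|).
sub dice {
    my ($x, $y) = @_;
    my ($cx, $cy) = (bigram_counts($x), bigram_counts($y));

    my $total = 0;
    $total += $_ for values(%$cx), values(%$cy);
    return 0 unless $total;    # neither string has any bigrams

    my $shared = 0;
    for my $bg (keys %$cx) {
        next unless exists $cy->{$bg};
        # A bigram is shared as many times as it occurs in both strings.
        $shared += $cx->{$bg} < $cy->{$bg} ? $cx->{$bg} : $cy->{$bg};
    }
    return 2 * $shared / $total;
}

print dice('night', 'nacht'), "\n";   # prints 0.25 (one shared bigram, "ht", out of 8)
```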