Re^4: dice's coefficient

unpack is faster still, even with the overhead of calculating the repeat count $n. (There's probably a way to avoid this calculation, but I can't see it at the moment.)

>perl -wMstrict -le
"use Benchmark qw(cmpthese);
 use Test::More 'no_plan';
 ;;
 my $str = 'wwibblewibblewibblewibbleibblewibblewibblewibble';
 ;;
 cmpthese -1, {
   regex  => sub { () = $str =~ /(?=(..))/g },
   substr => sub {
     () = map { substr $str, $_, 2 } (0 .. length($str) - 2)
     },
   unpack => sub {
     my $n = length($str) ? length($str) - 1 : 0;
     () = unpack qq{(a2 X)$n}, $str;
     },
   };
 ;;
 sub bigrams {
   my $n = length($_[0]) ? length($_[0]) - 1 : 0;
   return unpack qq{(a2 X)$n}, $_[0];
   }
 ;;
 is_deeply [ bigrams('')      ], [];
 is_deeply [ bigrams('a')     ], [];
 is_deeply [ bigrams('ab')    ], [ qw(ab)          ];
 is_deeply [ bigrams('abc')   ], [ qw(ab bc)       ];
 is_deeply [ bigrams('abcd')  ], [ qw(ab bc cd)    ];
 is_deeply [ bigrams('abcde') ], [ qw(ab bc cd de) ];
"
          Rate  regex substr unpack
regex  11934/s     --   -34%   -66%
substr 18066/s    51%     --   -48%
unpack 34816/s   192%    93%     --
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
1..6
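
For anyone puzzling over the template, here is a minimal sketch of what the unpack call does, assuming the same `(a2 X)$n` template as bigrams() above. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = 'abcd';

# One bigram per adjacent pair of characters; an empty string gets 0.
my $n = length($str) ? length($str) - 1 : 0;

# For a 4-character string the template is "(a2 X)3".
my $template = "(a2 X)$n";

# a2 reads two characters, then X steps back one position, so the
# next a2 starts on the second character of the previous bigram.
my @bigrams = unpack $template, $str;

print "$template => @bigrams\n";   # (a2 X)3 => ab bc cd
```

The repeat count has to be computed because the group must stop after the last full pair, which is why $n is length minus one.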

Replies are listed 'Best First'.
Re^5: dice's coefficient
by Anonymous Monk on Jan 15, 2012 at 02:27 UTC

It works well in that benchmark, but falls down here, where the bigrams are counted into a hash:

use Benchmark qw(cmpthese);

my $str = "wwibblewibblewibblewibbleibblewibblewibblewibble";

cmpthese -1, {
    regex => sub {
        my %count;
        ++$count{$_} for $str =~ /(?=(..))/g;
    },
    substr => sub {
        my %count;
        ++$count{substr $str, $_, 2} for (0 .. length($str) - 2);
    },
    unpack => sub {
        my %count;
        my $n = length($str) - 1;
        ++$count{$_} for unpack qq{(a2 X)$n}, $str;
    },
};

          Rate  regex unpack substr
regex  15316/s     --   -43%   -74%
unpack 26935/s    76%     --   -54%
substr 58514/s   282%   117%     --

substr slows down by about 50% if the string contains UTF-8 characters, but it's still significantly faster than unpack in this benchmark:

my $str = "wwibblewibblewibblewibbleibblewibblewibblewibble\x{20ac}";

          Rate  regex unpack substr
regex  14222/s     --   -35%   -51%
unpack 21976/s    55%     --   -24%
substr 29020/s   104%    32%     --
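
To tie the counting benchmark back to the thread's topic, here is a rough sketch of how the per-string bigram counts might feed into Dice's coefficient. It assumes the multiset form, 2 * shared / (total bigrams in both strings), which may not be the exact definition used earlier in the thread. The names bigram_counts and dice are hypothetical.

```perl
use strict;
use warnings;

# Count overlapping bigrams using the substr approach from the benchmark.
sub bigram_counts {
    my ($str) = @_;
    my %count;
    ++$count{ substr $str, $_, 2 } for 0 .. length($str) - 2;
    return \%count;
}

# Dice's coefficient over bigram multisets (assumed definition):
# 2 * (bigrams shared by both) / (bigrams in first + bigrams in second).
sub dice {
    my ($s, $t) = @_;
    my ($x, $y) = (bigram_counts($s), bigram_counts($t));

    my ($total, $shared) = (0, 0);
    $total += $_ for values(%$x), values(%$y);

    for my $bg (keys %$x) {
        next unless exists $y->{$bg};
        # A bigram is shared as many times as it occurs in both strings.
        $shared += $x->{$bg} < $y->{$bg} ? $x->{$bg} : $y->{$bg};
    }

    # Strings shorter than two characters have no bigrams at all.
    return $total ? 2 * $shared / $total : 0;
}

printf "%.3f\n", dice('night', 'nacht');   # 0.250
```

Here 'night' and 'nacht' each have four bigrams and share only 'ht', giving 2 * 1 / 8.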
