in reply to Re: Masking part of a string
in thread Masking part of a string

I've had a go at running a benchmark. If I've not made a mess of it, the pure masking solution seems to be by far the quickest.

use strict;
use warnings;

use Benchmark q{cmpthese};

my $str  = q{AGACGAGTA} x 12000;
my $mask = qq{\xFF} x length $str;
my $x    = q{x} x length $str;

my @toMask = (
    2 .. 6, 78, 506 .. 1473, 4863,
    26290 .. 37907, 107889 .. 107996);

substr $mask, $_, 1, qq{\x00} for @toMask;

my $resAndOrAndNot = andOrAndNot();
my $resAndRegex    = andRegex();
my $resSubstrList  = substrList();

die qq{Inconsistent\n}
    unless $resAndOrAndNot eq $resAndRegex
       and $resAndOrAndNot eq $resSubstrList;

cmpthese (-3, {
    andOrAndNot => \&andOrAndNot,
    andRegex    => \&andRegex,
    substrList  => \&substrList,
});

sub andOrAndNot
{
    my $masked = ($str & $mask) | ($x & ~ $mask);
    return $masked;
}

sub andRegex
{
    my $masked = $str & $mask;
    $masked =~ s{\x00}{x}g;
    return $masked;
}

sub substrList
{
    my $masked = $str;
    substr $masked, $_, 1, q{x} for @toMask;
    return $masked;
}

produces

              Rate  substrList    andRegex andOrAndNot
substrList  26.0/s          --         -8%        -91%
andRegex    28.3/s          9%          --        -90%
andOrAndNot  278/s        969%        884%          --
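
For anyone wondering why the pure masking version works at all, here is a minimal sketch on a short string. It assumes plain single-byte strings of equal length, and the sub name mask_positions is made up for illustration; it is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace the characters at the
# given offsets with 'x' using only bitwise string operators.
sub mask_positions
{
    my ($str, @offsets) = @_;

    # Start with an all-ones mask, then zero the bytes to be hidden.
    my $mask = qq{\xFF} x length $str;
    substr $mask, $_, 1, qq{\x00} for @offsets;

    my $fill = q{x} x length $str;

    # ($str & $mask) keeps the unmasked bytes and zeroes the rest;
    # ($fill & ~ $mask) keeps an 'x' only where the mask byte is zero.
    return ($str & $mask) | ($fill & ~ $mask);
}

print mask_positions(q{AGACGAGTA}, 2 .. 4), qq{\n};    # AGxxxAGTA
```

Because both operands are strings, Perl applies & | and ~ byte by byte, so the whole string is handled in one pass with no per-character loop.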

Cheers,

JohnGG

Re^3: Masking part of a string
by ikegami (Patriarch) on Jun 27, 2007 at 15:02 UTC

    A relevant benchmark would factor in the time to build the mask, preserving the ratio of mask rebuilds to masking operations found in the real application. Right now, you ignore the time to build the mask in two tests, while including it in the third test.

    Depending on that ratio, substrList is faster than andOrAndNot.
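
    To make the point concrete, here is a rough sketch (not code posted in this thread) of a benchmark that charges the mask build to the masking sub. It assumes the worst case of one rebuild per operation; buildMask is a made-up helper name, and the offset list is shortened for illustration.

```perl
use strict;
use warnings;

use Benchmark q{cmpthese};

my $str    = q{AGACGAGTA} x 12000;
my $x      = q{x} x length $str;
my @toMask = (2 .. 6, 78, 506 .. 1473);    # shortened list, illustration only

# Hypothetical helper: build the mask from scratch on every call.
sub buildMask
{
    my $mask = qq{\xFF} x length $str;
    substr $mask, $_, 1, qq{\x00} for @toMask;
    return $mask;
}

# Assumption: one mask rebuild per masking operation (the worst case).
sub andOrAndNot
{
    my $mask = buildMask();
    return ($str & $mask) | ($x & ~ $mask);
}

sub substrList
{
    my $masked = $str;
    substr $masked, $_, 1, q{x} for @toMask;
    return $masked;
}

die qq{Inconsistent\n} unless andOrAndNot() eq substrList();

cmpthese (-1, {
    andOrAndNot => \&andOrAndNot,
    substrList  => \&substrList,
});
```

    If the real application reuses one mask for many strings, the build cost should be spread over that many operations instead, which shifts the result back towards the masking approach.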

      Yes, changing the benchmark to rebuild the mask for each operation does show substrList coming out on top. Here's the amended code (I hope I've got it right this time)

      and the output

                    Rate    andRegex andOrAndNot  substrList
      andRegex    12.8/s          --        -43%        -50%
      andOrAndNot 22.6/s         77%          --        -11%
      substrList  25.4/s         99%         12%          --

      Cheers,

      JohnGG