in reply to Masking part of a string

You're dealing with characters (or octets), not bits, so you need to set 8 bits in the mask instead of 1 for each letter. substr is a good alternative to vec for doing that. Perl even has Bitwise String Operators (see perlop) that allow you to do bit operations on entire strings "at once".

my $str  = 'AGACGAGTA';
my $mask = "\xFF" x length($str);
substr($mask, $_, 1, "\x00") for 2..6;
my $res = $str & $mask;
$res =~ s/\x00/x/g;
print("$res\n");

You can avoid the regexp call:

my $str  = 'AGACGAGTA';
my $mask = "\xFF" x length($str);
substr($mask, $_, 1, "\x00") for 2..6;
my $x   = 'x' x length($str);
my $res = ($str & $mask) | ($x & ~$mask);
print("$res\n");
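
To see why that expression works, it may help to look at a single character. This is only a sketch: blend is a hypothetical helper name, not something from the original snippets, and it assumes the operands are plain byte strings used without the 'bitwise' feature, so that & | ~ act on strings.

```perl
use strict;
use warnings;

# Hypothetical helper: take bytes from $keep where the mask byte is 0xFF,
# and bytes from $fill where the mask byte is 0x00.
sub blend {
    my ($keep, $fill, $mask) = @_;
    return ($keep & $mask) | ($fill & ~$mask);
}

print blend('A', 'x', "\xFF"), "\n";   # mask byte 0xFF keeps the original: A
print blend('A', 'x', "\x00"), "\n";   # mask byte 0x00 takes the filler:   x
```

Each half of the expression contributes either the real byte or a zero byte, so OR-ing the two halves never mixes bits from both sources.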

Of course, you really don't need the mask at all.

my $str = 'AGACGAGTA';
my $res = $str;
substr($res, $_, 1, "x") for 2..6;
print("$res\n");
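
If the positions form one contiguous run, as 2..6 does here, the loop can probably be collapsed into a single 4-argument substr call. A minimal sketch, where $from, $to and $len are names introduced only for this illustration:

```perl
use strict;
use warnings;

my $str = 'AGACGAGTA';
my ($from, $to) = (2, 6);            # inclusive range, same as 2..6
my $len = $to - $from + 1;           # number of characters to overwrite

my $res = $str;
substr($res, $from, $len, 'x' x $len);   # replace the whole run in one call
print("$res\n");                          # AGxxxxxTA
```

This only helps for contiguous ranges; a scattered list of positions still needs one call per run.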

Update: Added the last two snippets.

Re^2: Masking part of a string
by johngg (Canon) on Jun 27, 2007 at 14:36 UTC
    I've had a go at running a benchmark. If I've not made a mess of it, the pure masking solution seems to be by far the quickest.

    use strict;
    use warnings;

    use Benchmark q{cmpthese};

    my $str  = q{AGACGAGTA} x 12000;
    my $mask = qq{\xFF} x length $str;
    my $x    = q{x} x length $str;

    my @toMask = (
        2 .. 6, 78, 506 .. 1473, 4863,
        26290 .. 37907, 107889 .. 107996);
    substr $mask, $_, 1, qq{\x00} for @toMask;

    my $resAndOrAndNot = andOrAndNot();
    my $resAndRegex    = andRegex();
    my $resSubstrList  = substrList();

    die qq{Inconsistent\n}
        unless $resAndOrAndNot eq $resAndRegex
           and $resAndOrAndNot eq $resSubstrList;

    cmpthese (-3,
        {
            andOrAndNot => \&andOrAndNot,
            andRegex    => \&andRegex,
            substrList  => \&substrList,
        });

    sub andOrAndNot
    {
        my $masked = ($str & $mask) | ($x & ~ $mask);
        return $masked;
    }

    sub andRegex
    {
        my $masked = $str & $mask;
        $masked =~ s{\x00}{x}g;
        return $masked;
    }

    sub substrList
    {
        my $masked = $str;
        substr $masked, $_, 1, q{x} for @toMask;
        return $masked;
    }

    produces

                  Rate  substrList    andRegex andOrAndNot
    substrList  26.0/s          --         -8%        -91%
    andRegex    28.3/s          9%          --        -90%
    andOrAndNot  278/s        969%        884%          --

    Cheers,

    JohnGG

      A relevant benchmark would factor in the time to build the mask while preserving the real application's ratio of mask rebuilds to the number of operations found. Right now, you ignore the time to build the mask in two tests, while including it in the third test.

      Depending on that ratio, substrList is faster than andOrAndNot.
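
      To illustrate the point, here is a minimal sketch of a comparison that charges the mask build to every operation (a 1:1 ratio of rebuilds to operations). It is not anyone's actual benchmark from this thread: buildMask and andOrAndNotWithBuild are hypothetical names, and the string length and positions are made up and smaller than in the benchmark above.

```perl
use strict;
use warnings;

use Benchmark qw(cmpthese);

my $str    = 'AGACGAGTA' x 1200;                 # assumed test string
my @toMask = (2 .. 6, 78, 506 .. 1473, 4863);    # assumed positions
my $x      = 'x' x length $str;

# Build a fresh mask: 0xFF keeps a character, 0x00 blanks it.
sub buildMask {
    my $mask = "\xFF" x length $str;
    substr $mask, $_, 1, "\x00" for @toMask;
    return $mask;
}

# Masking approach, with the cost of building the mask included.
sub andOrAndNotWithBuild {
    my $mask = buildMask();
    return ($str & $mask) | ($x & ~$mask);
}

# Direct approach: no mask at all.
sub substrList {
    my $masked = $str;
    substr $masked, $_, 1, 'x' for @toMask;
    return $masked;
}

die "Inconsistent\n" unless andOrAndNotWithBuild() eq substrList();

cmpthese(-1, {
    andOrAndNotWithBuild => \&andOrAndNotWithBuild,
    substrList           => \&substrList,
});
```

      A real application that reuses one mask for many strings would sit somewhere between this and the original benchmark, so the ratio should be set to match the actual workload.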

        Yes, changing the benchmark to rebuild the mask for each operation does show substrList coming out on top. Here's the amended code (I hope I've got it right this time)

        and the output

                      Rate    andRegex andOrAndNot  substrList
        andRegex    12.8/s          --        -43%        -50%
        andOrAndNot 22.6/s         77%          --        -11%
        substrList  25.4/s         99%         12%          --

        Cheers,

        JohnGG