in reply to Recovering Substrings to String with Gap

Update: This solution, though correct (I think), is unnecessarily complicated and slow. The idea of using bit vectors, which is ideal for counting the size of embedded substrings, is not particularly well-suited for this new problem. See the one in my follow-up to this node instead.

This is a straightforward adaptation of the solution proposed by Roy Johnson (and minimally tweaked by YT) to your earlier problem:

sub append_n { my ($str, $array) = @_; my $vec = ''; for (@$array) { my $ofs = 0; while ( ( my $idx = index $str, $_, $ofs ) > -1 ) { # Set bits at each matched location vec( $vec, $_, 1 ) = 1 for $idx .. $idx + length() - 1; $ofs = $idx + 1; } } # Count set bits vec( $vec, $_, 1 ) or substr( $str, $_, 1 ) = 'N' for 0 .. -1 + length $str; return $str; } __END__ GATTACGNNNNGCGCTCGNNNAACGGCA GATTACGAGNNNNNNNCGTGTAANNNNN NNNTACGAGTGGCGCTCGTGNNNNNNNN NNNNNNNNNNGGNNNNNNNNNNNNGGNN

Update: Edited the name of the sub to match the one given in the head node.

the lowliest monk

Replies are listed 'Best First'.
Re^2: Recovering Substrings to String with Gap
by tlm (Prior) on Apr 16, 2005 at 13:59 UTC

    ...but come to think of it, it is simpler to just fix your original version:

    sub append_n { my ( $str, $array ) = @_; my $nstring = "N" x length($str); foreach my $sbstr ( @$array ) { my $ofs = 0; while ( ( my $pos = index $str, $sbstr, $ofs ) > -1 ) { substr ($nstring, $pos, length ($sbstr)) = $sbstr; $ofs = $pos + 1; } } return $nstring; } __END__ GATTACGNNNNGCGCTCGNNNAACGGCA GATTACGAGNNNNNNNCGTGTAANNNNN NNNTACGAGTGGCGCTCGTGNNNNNNNN NNNNNNNNNNGGNNNNNNNNNNNNGGNN
    ...and about 3X faster too:
    Rate bitvec substr bitvec 18618/s -- -74% substr 70275/s 277% --
    In hindsight this is not surprising, since my (vec-based) solution does everything that yours does plus the unnecessary bit-vector stuff. Doh!

    the lowliest monk