in reply to Lower-casing Substrings and Iterating Two Files together

My idea is based upon using index, rindex and substr.

#!/usr/bin/perl
use strict;
use warnings;

open SEQ,  '<', 'data1.dat' or die $!;
open MASK, '<', 'data2.dat' or die $!;

while ( my $seq = <SEQ> ) {
    my $mask   = <MASK>;
    my $start  = index( $mask, 'N' );
    my $length = rindex( $mask, 'N' ) - $start + 1;
    my $lc_seq = lc substr( $seq, $start, $length );
    substr( $seq, $start, $length, $lc_seq );
    print $seq;
}
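
To make the index/rindex arithmetic concrete, here is a minimal sketch that works on in-memory strings instead of the two files. The helper name lc_span and the sample sequence and mask are made up for illustration, and the early return for a mask without any N is an added guard that the posted code does not have. It assumes the mask holds exactly one contiguous run of N.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lower-case the part of $seq that lies between
# the first and the last 'N' in $mask (both inclusive).
sub lc_span {
    my ( $seq, $mask ) = @_;

    my $start = index( $mask, 'N' );
    return $seq if $start < 0;    # added guard: no 'N' in the mask

    my $length = rindex( $mask, 'N' ) - $start + 1;

    # The 4-argument form of substr replaces the span in place.
    substr( $seq, $start, $length, lc substr( $seq, $start, $length ) );
    return $seq;
}

# Made-up sample data: the N's cover offsets 3 to 6.
print lc_span( 'ACGTACGTAC', 'XXXNNNNXXX' ), "\n";    # prints ACGtacgTAC
```

Here index returns 3 and rindex returns 6, so the length is 6 - 3 + 1 = 4 and only the four characters under the N's are lower-cased.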

Updates:

  1. Important notice from kirillm: my solution doesn't cover all possible cases.
  2. Fixed a calculation error.
  3. See my other post for my updated code.

Re^2: Lower-casing Substrings and Iterating Two Files together
by kirillm (Friar) on Dec 27, 2008 at 16:16 UTC

    Your proposal seems to assume that the mask contains a single continuous run of N characters. What if there are non-N characters between $start and $start+$length?

      Very good question.

      I didn't know about those cases. I could only consider the cases that were given as examples.

      Update: Here's a revised version that should handle the special case you mentioned. I had to change the code and drop (r)index.

      #!/usr/bin/perl
      use strict;
      use warnings;

      open SEQ,  '<', 'data1.dat' or die $!;
      open MASK, '<', 'data2.dat' or die $!;

      while ( my $seq = <SEQ> ) {
          my $mask = <MASK>;
          while ( $mask =~ m/(N+)/g ) {
              my $length = length $1;
              my $start  = pos($mask) - $length;
              my $lc_seq = lc substr( $seq, $start, $length );
              substr( $seq, $start, $length, $lc_seq );
          }
          print $seq;
      }

      close MASK;
      close SEQ;
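
The same loop can be tried without the data files. The following is only a sketch: the helper name lc_masked and the sample strings are invented for illustration, and the mask deliberately contains two separate runs of N, which is the case kirillm asked about.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lower-case every position of $seq that lines up
# with a run of 'N' in $mask, one run at a time.
sub lc_masked {
    my ( $seq, $mask ) = @_;

    while ( $mask =~ m/(N+)/g ) {
        my $length = length $1;

        # pos() is the offset just past the current match,
        # so the run starts $length characters before it.
        my $start = pos($mask) - $length;

        substr( $seq, $start, $length, lc substr( $seq, $start, $length ) );
    }
    return $seq;
}

# Made-up sample data: runs of N at offsets 0-1 and 6-8.
print lc_masked( 'ACGTACGTAC', 'NNXXXXNNNX' ), "\n";    # prints acGTACgtaC
```

With this mask, an index/rindex span would run from offset 0 to offset 8 and lower-case the four characters under the X's as well, while the per-run loop leaves them in upper case.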