Re: Lower-casing Substrings and Iterating Two Files together

My idea is based upon using index, rindex and substr.

#!/usr/bin/perl
use strict;
use warnings;

open SEQ,  '<', 'data1.dat' or die $!;
open MASK, '<', 'data2.dat' or die $!;

while ( my $seq = <SEQ> ) {
  my $mask = <MASK>;
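  # The span to lower-case runs from the first 'N' to the last 'N' in the mask line.
  my $start  =  index( $mask, 'N' );
  my $length = rindex( $mask, 'N' ) - $start + 1;

  # Lower-case that span of the sequence ...
  my $lc_seq = lc substr( $seq, $start, $length );

  # ... and write it back in place with 4-argument substr.
  substr( $seq, $start, $length, $lc_seq );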

  print $seq;
}
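
To make the index/rindex/substr arithmetic concrete, here is a small in-memory sketch of the same idea. The two sample strings are made up for illustration and are not taken from the original data files. Note that index and rindex both return -1 when the mask contains no 'N', a case this sketch does not handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data (assumption, not from data1.dat / data2.dat).
my $seq  = "ACGTACGTAC";
my $mask = "ACGNNNNTAC";

# First and last 'N' delimit the span: positions 3 through 6.
my $start  =  index( $mask, 'N' );
my $length = rindex( $mask, 'N' ) - $start + 1;   # 6 - 3 + 1 = 4

# 4-argument substr replaces the span in place with its lower-cased copy.
substr( $seq, $start, $length, lc substr( $seq, $start, $length ) );

print "$seq\n";   # ACGtacgTAC
```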

Updates:

Important notice from kirillm: my solution doesn't cover all possible cases.
Fixed a calculation error.
See my other post for my updated code.


Re^2: Lower-casing Substrings and Iterating Two Files together by kirillm (Friar) on Dec 27, 2008 at 16:16 UTC
Your proposal seems to assume that there will be one continuous run of N characters in the mask. What if there are non-N characters between $start and $start+$length?
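
A minimal sketch of the situation described here, using a made-up mask with two separate runs of N (the strings are assumptions for illustration only). Because the span is taken from the first N to the last N, everything in between is lower-cased as well:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: the mask has N at positions 1-2 and 7-8 only.
my $seq  = "ACGTACGTAC";
my $mask = "ANNTACGNNC";

my $start  =  index( $mask, 'N' );                # 1
my $length = rindex( $mask, 'N' ) - $start + 1;   # 8 - 1 + 1 = 8

# Work on a copy so the original sequence stays available for comparison.
my $span = $seq;
substr( $span, $start, $length, lc substr( $span, $start, $length ) );

print "$span\n";   # AcgtacgtaC, although only AcgTACGtaC was wanted
```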
Re^3: Lower-casing Substrings and Iterating Two Files together by linuxer (Curate) on Dec 27, 2008 at 16:26 UTC
Very good question. I wasn't aware of such cases; I could only consider the cases that were given as examples.

Update: Here is a revised version that should handle the special case you mentioned. I had to change the code and drop (r)index.

#!/usr/bin/perl
use strict;
use warnings;

open SEQ,  '<', 'data1.dat' or die $!;
open MASK, '<', 'data2.dat' or die $!;

while ( my $seq = <SEQ> ) {
  my $mask = <MASK>;

  while ( $mask =~ m/(N+)/g ) {
    my $length = length $1;
    my $start  = pos($mask) - $length;

    my $lc_seq = lc substr( $seq, $start, $length );

    substr( $seq, $start, $length, $lc_seq );
  }

  print $seq;
}

close MASK;
close SEQ;
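
For trying the per-run approach without the data files, the loop can be wrapped in a small function that works on two strings. This is only a sketch: the name lc_masked and the sample strings are assumptions, and it takes the match offsets from @- and @+ rather than from pos() and length, which should be equivalent here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# lc_masked is a hypothetical helper name, not part of the original post.
# It lower-cases every stretch of $seq that lines up with a run of N in $mask.
sub lc_masked {
    my ( $seq, $mask ) = @_;

    while ( $mask =~ /N+/g ) {
        my $start  = $-[0];            # offset where this run of N begins
        my $length = $+[0] - $-[0];    # number of N in this run

        # Skip runs that start beyond the end of the sequence.
        next if $start >= length $seq;

        substr( $seq, $start, $length, lc substr( $seq, $start, $length ) );
    }

    return $seq;
}

print lc_masked( "ACGTACGTAC", "ANNTACGNNC" ), "\n";   # AcgTACGtaC
```

A mask line without any N leaves the sequence unchanged, since the inner loop never runs.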