in reply to substr help

my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $increment = 3;    # step between window starts
my @windows;

# Stop once fewer than 10 characters remain, so every window is full length.
for ( my $loc = 0; $loc <= length($dna) - 10; $loc += $increment ) {
    push @windows, substr($dna, $loc, 10);
}
print "$_\n" for @windows;

If you have multiple $dna sequences, you'll probably want an outer loop that iterates over an array holding them. Apart from that, this code should do what you're looking for.
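For the multiple-sequence case, a minimal sketch of that outer loop might look like the following. It assumes the sequences are already in an array; the second sequence is made up purely for illustration, and the window length and step are pulled into variables ($size is a name I've introduced).

```perl
use strict;
use warnings;

# Hypothetical input: only the first sequence comes from the post above;
# the second is made up for illustration.
my @sequences = (
    'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca',
    'ttgacgatcgatcgtagctagctagg',
);
my $size      = 10;    # window length
my $increment = 3;     # distance between window starts

my @all_windows;       # one array ref of windows per input sequence
for my $dna (@sequences) {    # outer loop: one sequence at a time
    my @windows;              # fresh array on every pass
    for ( my $loc = 0; $loc <= length($dna) - $size; $loc += $increment ) {
        push @windows, substr($dna, $loc, $size);
    }
    push @all_windows, \@windows;
}

for my $i (0 .. $#sequences) {
    print "sequence $i:\n";
    print "  $_\n" for @{ $all_windows[$i] };
}
```

Keeping the results as an array of array refs, parallel to @sequences, means you can still tell which windows came from which sequence.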

It's one of the few instances where I would actually use a C-style 'for' loop.

You could also do it with a regexp.

Update: Replaced (length($dna)-$increment) with (length($dna)-10) per duff's comment. Good catch!
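To see why the bound matters, here is a small sketch that runs the loop with both bounds on the same string and collects any windows shorter than 10 characters. The helper name windows_up_to is hypothetical, introduced only for this comparison.

```perl
use strict;
use warnings;

my $dna       = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $increment = 3;

# Hypothetical helper: collect 10-character windows whose start
# position is at most $last_start.
sub windows_up_to {
    my ($seq, $step, $last_start) = @_;
    my @w;
    for ( my $loc = 0; $loc <= $last_start; $loc += $step ) {
        push @w, substr($seq, $loc, 10);
    }
    return @w;
}

my @old = windows_up_to($dna, $increment, length($dna) - $increment);  # original bound
my @new = windows_up_to($dna, $increment, length($dna) - 10);          # corrected bound

# With the old bound, windows that start near the end run off the string,
# so substr returns fewer than 10 characters for them.
my @short = grep { length($_) < 10 } @old;
printf "old bound: %d windows, %d short (%s)\n", scalar(@old), scalar(@short), join(',', @short);
printf "new bound: %d windows\n", scalar(@new);
```

On this 56-character string the old bound also produces windows starting at offsets 48 and 51, which substr silently truncates to 8 and 5 characters.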


Dave

Re: Re: substr help
by duff (Parson) on May 12, 2004 at 16:11 UTC

    Surely that should be

    for ( my $loc = 0; $loc <= length($dna) - 10; $loc += $increment ) {

    Otherwise his last few strings won't be 10 characters long. Even though I have done this exact thing (sliding window with overlaps) in the past using a C-style for loop, I think I'd probably write it like this these days:

    my $end = int((length($dna) - 10) / 3);
    for my $i (0 .. $end) {
        push @windows, substr($dna, $i * 3, 10);
    }

    or, more likely:

    my @windows = map { substr($dna, $_ * 3, 10) } 0 .. int((length($dna) - 10) / 3);
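If you need other window sizes or steps, the map version generalizes naturally. The following is a sketch only; sliding_windows is a hypothetical name, and the window size and step are passed in instead of being hard-coded as 10 and 3.

```perl
use strict;
use warnings;

# Hypothetical helper: the map version above with window size and step
# passed in as arguments.
sub sliding_windows {
    my ($seq, $size, $step) = @_;
    return () if length($seq) < $size;                    # no complete window fits
    my $last = int( (length($seq) - $size) / $step );     # index of the last full window
    return map { substr($seq, $_ * $step, $size) } 0 .. $last;
}

my $dna     = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my @windows = sliding_windows($dna, 10, 3);
print "$_\n" for @windows;
```

The early return is there because int() truncates toward zero: for a 9-character sequence, (9 - 10) / 3 truncates to 0, and without the guard the range 0 .. 0 would yield one short window.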
Re: Re: substr help
by ysth (Canon) on May 12, 2004 at 18:05 UTC
    You could also do it with a regexp.
    Which would look something like:
    my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
    my $increment = 3;
    my $substr = 10;
    my @windows = $dna =~ /(?=(.{$substr})).{$increment}/gs;
    print "$_\n" for @windows;
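To see how the regexp steps through the string, you can run the same pattern in a while loop and print where each match starts, using @- (here $-[0]). This is just an illustrative variant of the one-liner above, not a different technique.

```perl
use strict;
use warnings;

my $dna       = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $increment = 3;
my $substr    = 10;

# The lookahead (?=(.{$substr})) captures a full window without consuming
# any characters; .{$increment} then consumes $increment characters, so the
# next /g iteration starts $increment positions further on. Once fewer than
# $substr characters remain, the lookahead fails at every remaining offset
# and the loop ends.
my @starts;
while ( $dna =~ /(?=(.{$substr})).{$increment}/gs ) {
    push @starts, $-[0];              # offset where this match began
    printf "%2d %s\n", $-[0], $1;     # start offset and captured window
}
```

Because the capture sits inside a zero-width lookahead, consecutive windows can overlap even though each match only consumes $increment characters.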