Re: substr help
by BrowserUk (Patriarch) on May 12, 2004 at 17:24 UTC
|
#! perl -slw
use strict;
my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $n = int( ( length( $dna ) - ( 10 - 3 ) ) / 3 );
print for unpack "(A10 X7)$n", $dna;
__END__
C:\Perl\test>test
accatgagct
atgagctgta
agctgtacgt
tgtacgtagc
acgtagcatc
tagcatctga
catctgagcg
ctgagcgcgc
agcgcgcatg
gcgcatgact
catgactgtg
gactgtgact
tgtgactgac
gactgacgta
tgacgtaggc
cgtaggcagc
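A note on why the template works: `A10` reads ten characters and `X7` then backs up seven bytes, so each repetition of the group advances three positions. Below is a minimal sketch that parameterises the same idea. The sub name `windows_unpack` and the `$window`/`$step` arguments are illustrative, not from the post, and it assumes a single-byte (ASCII) sequence with a step smaller than the window.

```perl
use strict;
use warnings;

# Hypothetical helper: sliding windows via unpack, generalising "(A10 X7)$n".
sub windows_unpack {
    my ( $seq, $window, $step ) = @_;
    die "step must be between 1 and window - 1\n"
        unless $step >= 1 && $step < $window;
    return () if length($seq) < $window;

    my $n    = int( ( length($seq) - $window ) / $step ) + 1;   # number of full windows
    my $back = $window - $step;                                  # bytes to rewind after each read
    return unpack "(A$window X$back)$n", $seq;
}

my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
print "$_\n" for windows_unpack( $dna, 10, 3 );
```

For the 56-base string above this gives the same 16 windows, since int((56 - 10) / 3) + 1 equals int((56 - 7) / 3).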
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] |
Re: substr help
by davido (Cardinal) on May 12, 2004 at 15:58 UTC
|
my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $increment = 3;
my @windows;
for ( my $loc=0; $loc <= (length($dna)-10); $loc+=$increment ){
push @windows, substr($dna, $loc, 10);
}
print "$_\n" for @windows;
If you have multiple $dna sequences you'll probably want an outer loop to iterate over an array holding them. Otherwise, this code ought to do what you're looking for.
It's one of the few instances where I would actually use a C-style 'for' loop.
You could also do it with a regexp.
Update: Replaced (length($dna)-$increment) with (length($dna)-10) per duff's comment. Good catch!
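The outer loop mentioned above might look roughly like this. It is only a sketch: the `@sequences` array, its contents, and the `%windows_for` hash are made-up names for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: several sequences instead of one.
my @sequences = (
    'accatgagctgtacgtagcatctgagcg',
    'cgcatgactgtgactgacgtaggcagca',
);
my ( $size, $increment ) = ( 10, 3 );

my %windows_for;    # sequence index => array ref of windows
for my $idx ( 0 .. $#sequences ) {
    my $dna = $sequences[$idx];
    for ( my $loc = 0; $loc <= length($dna) - $size; $loc += $increment ) {
        push @{ $windows_for{$idx} }, substr( $dna, $loc, $size );
    }
}

for my $idx ( sort { $a <=> $b } keys %windows_for ) {
    print "sequence $idx: @{ $windows_for{$idx} }\n";
}
```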
| [reply] [d/l] [select] |
|
|
for ( my $loc = 0; $loc <= length($dna) - 10; $loc += $increment ) {
Otherwise his last few strings won't be 10 characters long. Even though I have done this exact thing (sliding window with overlaps) in the past using a C-style for loop, I think I'd probably write it like this these days:
my $end = int((length($dna) - 10)/3);
for my $i (0..$end) {
push @windows, substr($dna,$i*3,10);
}
or more likely
my @windows = map { substr($dna,$_*3,10) } 0..int((length($dna)-10)/3);
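One edge case to watch with the range-based versions: if the sequence is shorter than one window, `length($dna) - 10` is negative, `int` truncates toward zero, and `0..0` still yields one (short) window. Here is a sketch with a guard for that case. The sub name `windows_map` and its parameters are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: map-based windows that returns nothing when the
# sequence is shorter than a single window.
sub windows_map {
    my ( $seq, $size, $step ) = @_;
    my $last_start = length($seq) - $size;    # offset of the last full window
    return () if $last_start < 0;
    return map { substr( $seq, $_ * $step, $size ) } 0 .. int( $last_start / $step );
}

my @none = windows_map( 'accatgag',      10, 3 );    # 8 bases: too short
my @two  = windows_map( 'accatgagctgta', 10, 3 );    # 13 bases: offsets 0 and 3
print scalar(@none), " windows\n";
print "@two\n";
```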
| [reply] [d/l] [select] |
|
|
You could also do it with a regexp.
Which would look something like:
my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $increment = 3;
my $substr = 10;
my @windows = $dna=~/(?=(.{$substr})).{$increment}/gs;
print "$_\n" for @windows;
| [reply] [d/l] |
Re: substr help
by blokhead (Monsignor) on May 12, 2004 at 16:10 UTC
|
As davido says, you can do this with a regex too. Either play with pos a little bit:
## capture 10 chars (with advancing), then move back 7:
while ( $dna =~ /(.{10})/g ) {
print "current window = $1\n";
pos $dna -= 7;
}
Or use a capture within a lookahead:
## capture next 10 chars without advancing, then advance by 3:
while ( $dna =~ /(?= (.{10}) ) .{3}/gx ) {
print "current window: $1\n";
}
For maintenance/readability reasons, you may be better off using a for loop and substr. I don't know, though; I kinda like the lookahead solution... it's cute.
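If you want to convince yourself that the two regex variants agree with the plain substr loop, a quick check along these lines should do. This is just a sketch, and the array names are made up.

```perl
use strict;
use warnings;

my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';

# Reference result: plain substr loop.
my @by_substr;
for ( my $loc = 0; $loc <= length($dna) - 10; $loc += 3 ) {
    push @by_substr, substr( $dna, $loc, 10 );
}

# Lookahead: capture 10 chars without consuming them, then consume 3.
my @by_lookahead;
while ( $dna =~ /(?= (.{10}) ) .{3}/gx ) {
    push @by_lookahead, $1;
}

# pos() adjustment: consume 10 chars, then step back 7.
my @by_pos;
while ( $dna =~ /(.{10})/g ) {
    push @by_pos, $1;
    pos($dna) -= 7;
}

print "lookahead agrees with substr\n" if "@by_lookahead" eq "@by_substr";
print "pos agrees with substr\n"       if "@by_pos" eq "@by_substr";
```

Each failed match resets pos on $dna, so the second while loop starts from the beginning of the string again.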
| [reply] [d/l] [select] |
Re: substr help
by geekgrrl (Pilgrim) on May 12, 2004 at 17:50 UTC
|
If you want to use BioPerl, here's an option. I couldn't find a module that does this automatically, but I bet there is one out there.
use strict;
use Bio::Seq;
my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my $seq = Bio::Seq->new( -seq => $dna);
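my $end = $seq->length - 9;   # last 1-based start position that still leaves a full 10-base window
my @windows;
for(my $i = 1; $i <= $end; $i += 3) # advance by one codon (3 bases) each time; subseq is 1-based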
{
push @windows, $seq->subseq($i, $i+9);
}
print join "\n", @windows;
| [reply] [d/l] |
Re: substr help
by sgifford (Prior) on May 12, 2004 at 16:02 UTC
|
It's not clear exactly where you expect your output to be. You're calling substr inside the loop, but throwing away the results, then putting the original string $dna into the @windows list. Also, the third argument to substr is the number of characters you want; using 0 will always return an empty string. And in your loop, you're starting at 10 and stopping when the position is greater than the number of elements in @dna, but there's only one element in that list, so the loop never executes.
I think something closer to what you mean is:
use constant MOVEMENT => 3;
use constant WINDOWSIZE => 10;
my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my @windows=();
for (my $pos = 0; $pos <= (length($dna) - WINDOWSIZE); $pos += MOVEMENT) {
push(@windows,substr($dna,$pos,WINDOWSIZE));
}
print "@windows\n";
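To illustrate the point about substr's third argument being a length, and about needing to keep the return value, here is a tiny sketch. The short sequence and variable names are made up for the example.

```perl
use strict;
use warnings;

my $dna = 'accatgagctgtacg';

my $empty  = substr( $dna, 3, 0 );     # length 0: always the empty string
my $window = substr( $dna, 3, 10 );    # ten characters starting at offset 3

print "empty:  '$empty'\n";
print "window: '$window'\n";
```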
| [reply] [d/l] [select] |
Re: substr help
by Zaxo (Archbishop) on May 12, 2004 at 21:24 UTC
|
my $dna = q(accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca);
my $skip = 10;
{
local ($/, $\) = (\10, "\n");
open my $chain, '<', \substr($dna, $skip) or die $!;
print while <$chain>;
close $chain or die $!;
}
That makes use of the PerlIO trick of opening a scalar as a file by putting a reference to it in open's filespec slot. Setting $/ to a reference to an integer (\10 in the code above) makes any filehandle be read that many characters at a time. Setting $\, the output record separator, to a newline appends one to every print statement executed.
Update: Oops, I also misread. Not a solution to what was asked.
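For reference, here is a stripped-down sketch of the fixed-size-record read on an in-memory filehandle. It reads the whole string rather than a substr of it, and the `@chunks` array is an illustrative name. Note that the records do not overlap, which is why this splits the sequence into blocks rather than sliding a window over it.

```perl
use strict;
use warnings;

my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';

my @chunks;
{
    local $/ = \10;    # read fixed-size records of 10 bytes
    open my $chain, '<', \$dna or die "Cannot open in-memory handle: $!";
    while ( my $record = <$chain> ) {
        push @chunks, $record;
    }
    close $chain or die $!;
}
print "$_\n" for @chunks;
```

With the 56-base string this yields five 10-base blocks followed by one 6-base remainder.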
| [reply] [d/l] |
Re: substr help
by Not_a_Number (Prior) on May 12, 2004 at 16:13 UTC
|
Just another way to do it:
my $dna = 'accatgagctgtacgtagcatctgagcgcgcatgactgtgactgacgtaggcagca';
my @windows = unpack 'A3' x length $dna, $dna;
print "@windows";
Update: Sorry, ignore the above; I read the question too quickly.
dave
| [reply] [d/l] |