If your DNA sequences don't contain any separators, the code looks okay to me.
I would optimize the processing in order to avoid wasting time and memory.
- You only need to hold two input lines in memory at a time, since each ID/sequence pair can be processed independently of the other lines.
- One random shuffle should be enough to make it random.
- I would check whether the input file contains an even number of lines; that is, when retrieving a pair of lines, first check whether the DNA sequence line is present.
- $final_seq should be initialized to the empty string to avoid a "Use of uninitialized value" warning, e.g. when printing it for an empty sequence line, where the loop body never runs.
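The pairwise reading and the even-line check from the points above can be tried on their own. This is a minimal sketch, assuming the input is strictly one ID line followed by one sequence line; read_pairs is a hypothetical helper name, and the in-memory filehandle only stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: reads ID/sequence pairs from $fh, holding only the
# current two lines in memory, and calls $callback->($id, $seq) for each pair.
# Returns 1 if the input had an even number of lines, 0 otherwise.
sub read_pairs {
    my ($fh, $callback) = @_;
    while (defined(my $id = <$fh>)) {
        my $seq = <$fh>;
        if (!defined $seq) {
            # odd number of lines: this ID has no sequence line after it
            chomp $id;
            warn "Odd number of lines: ID '$id' has no sequence line\n";
            return 0;
        }
        chomp($id, $seq);
        $callback->($id, $seq);
    }
    return 1;
}

# Usage on an in-memory "file" with three lines, so the last ID is unpaired
my $text = ">seq1\nACGT\n>seq2\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";
my @seen;
my $even = read_pairs($fh, sub { push @seen, "$_[0]:$_[1]" });
close $fh;
print "pairs: @seen; even line count: $even\n";
```

Only two lines are ever held at once, so memory use depends on the longest sequence line, not on the size of the file.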
So in summary I would write it like this (untested!):
use strict;
use warnings;
use List::Util 'shuffle'; # Idea from http://www.perlmonks.org/?node_id=199901

my $input = shift @ARGV
    or die "Usage: $0 input.fasta\n";
open(IN, '<', $input) or die "Can't read multifasta input genomic DNA file $input : $!\n";
my $destination = $input . "_1MBWindow_ListUtilshuffle.fasta";
open(OUT, '>', $destination) or die "Can't write to file $destination: $!\n";
my $window = 1000000; # hard-coded shuffle window of 1 MB, i.e. 10^6 bases

my ($seq_id, $seq);
# read the input two lines at a time: an ID line, then its sequence line
while (defined($seq_id = <IN>)) {
    if (!defined($seq = <IN>)) {
        # odd number of lines: the last ID has no sequence line
        warn "No sequence line found after ID: $seq_id";
        last;
    }
    chomp $seq_id;
    chomp $seq;
    my $final_seq = '';
    for (my $i = 1; $i <= length $seq; $i += $window) {
        my $s = substr($seq, $i - 1, $window);
        my @temp_seq_array = split //, $s;
        @temp_seq_array = shuffle @temp_seq_array; # List::Util shuffles EACH window separately
        my $rand_shuffled_seq = join('', @temp_seq_array);
        $final_seq .= $rand_shuffled_seq; # append this shuffled window to the 3' end of the previous fragments
    }
    print OUT $seq_id, "\n", $final_seq, "\n";
}
close IN;
close OUT;
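To check the per-window shuffle in isolation, the body of the for loop can be pulled out into a small sub and run on a short sequence. This is a sketch only: shuffle_windows and windows_are_permutations are hypothetical names, and the 0-based offset is just an equivalent way of writing the 1-based loop above.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Hypothetical helper mirroring the loop in the script: shuffles $seq in
# consecutive windows of $window characters, so bases never move across
# a window boundary. The last window may be shorter than $window.
sub shuffle_windows {
    my ($seq, $window) = @_;
    my $out = '';
    for (my $pos = 0; $pos < length $seq; $pos += $window) {
        my $chunk = substr($seq, $pos, $window);
        $out .= join '', shuffle(split //, $chunk);
    }
    return $out;
}

# Check: each window of the result contains the same letters as the
# corresponding window of the input, possibly in a different order.
sub windows_are_permutations {
    my ($orig, $shuffled, $window) = @_;
    return 0 unless length $orig == length $shuffled;
    for (my $pos = 0; $pos < length $orig; $pos += $window) {
        my @want = split //, substr($orig, $pos, $window);
        my @got  = split //, substr($shuffled, $pos, $window);
        return 0 unless join('', sort @want) eq join('', sort @got);
    }
    return 1;
}

my $seq = 'AAAACCCCGGGGTT';   # 14 bases, window 4 -> windows of 4, 4, 4 and 2
my $shuffled = shuffle_windows($seq, 4);
my $ok = windows_are_permutations($seq, $shuffled, 4);
print "shuffled: $shuffled\n";
print "per-window composition kept: ", ($ok ? "yes" : "no"), "\n";
```

Because each window is shuffled on its own, the base composition of every 1 MB block is preserved, which a single shuffle of the whole sequence would not guarantee.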