open IN2, $first_tmp;
while(<IN2>)
{
if($_=~/^(chr.*?)REC:(.*)/)
{
$respective_chrom=$1;
$all_entries=$2;
@split_entries=();
@split_entries = split(/\#/, $all_entries);
@split_sep_entries=();
%collapsed_loci_HoA=();
print ">".$respective_chrom."\n";
foreach $sep_entry(@split_entries)
{
@split_sep_entries = split(/\t/, $sep_entry);
$locus_to_use = $split_sep_entries[1];
$rest_entry=$split_sep_entries[0]."\t".$split_sep_entries[2]."\t".
$split_sep_entries[3]."\t".$split_sep_entries[4]."\t".
$split_sep_entries[5]."\t".$split_sep_entries[6]."\t".
$split_sep_entries[7];
push @{ $collapsed_loci_HoA{$locus_to_use} }, $rest_entry;
}
@array_of_loci = keys %collapsed_loci_HoA;
for $b(sort { $b <=> $a } @array_of_loci)
{
$count_arr++;
}
print "//\n";
}
}
close IN2;
Basically, I am now getting my numbers sorted, as I posted above.
What I cannot do is exactly this binning you propose. My idea is to take one element of the array at a time and, if it is within the range, push it onto the sub-array of the element that started the bin, but I really can't see how to do that.
I am new to Perl and I am literally stuck.
My approach to binning would be simple. You look at the first element of the (sorted) array @split_entries and at the index of the next potential candidate, and you increase that index until the candidate is at least your distance away from the first element. All elements from the first element up to, but not including, that candidate then belong in one bin.
An example, for a distance of 5:
11
12
16
17
22
30
First you look at the first position in your array (11). The next candidate is at the second position, and its value is 12. abs(12-11) < 5, so you increase the index of your candidate. The next candidate is at the third position, and its value is 16. abs(16-11) >= 5, so your first bin consists of the first and second entries in the array, 11 and 12.
Now, you start the same thing over, as there are still elements in your array after removing 11 and 12 from it.
You look at the first position in your array (16). The next candidate is at the second position, and its value is 17. abs(17-16) < 5, so you increase the index of your candidate. The next candidate is at the third position, and its value is 22. abs(22-16) >= 5, so your second bin consists of the first and second entries in the remaining array, 16 and 17.
... and so on.
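Here is a minimal sketch of the walk-through above, assuming the positions are plain numbers. The sub name bin_positions and the array-of-arrays layout (one array reference per bin) are my own choices for illustration, not anything taken from your script, so treat it as one possible way to do it rather than the way.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): group positions
# into bins. A new bin starts whenever a value is $distance or more
# away from the FIRST element of the current bin.
sub bin_positions {
    my ($distance, @positions) = @_;
    my @bins;    # array of arrays: each element is a reference to one bin

    for my $pos (sort { $a <=> $b } @positions) {
        if (@bins && abs($pos - $bins[-1][0]) < $distance) {
            # Still within range of the current bin's first element.
            push @{ $bins[-1] }, $pos;
        }
        else {
            # Out of range (or no bin yet): start a new bin.
            push @bins, [$pos];
        }
    }
    return @bins;
}

# The example from above, with a distance of 5.
my @bins = bin_positions(5, 11, 12, 16, 17, 22, 30);
print join(' ', @$_), "\n" for @bins;
# prints:
# 11 12
# 16 17
# 22
# 30
```

Instead of removing elements from the array, this version walks over the sorted values once and only compares each value with the first element of the most recent bin, which gives the same bins as the step-by-step description.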
Fair enough, but what kind of data structures will I need? This I cannot seem to figure out...
open IN2, $first_tmp;
while(<IN2>)
{
if($_=~/^(chr.*?)REC:(.*)/)
{
$respective_chrom=$1;
$all_entries=$2;
@split_entries=();
@split_entries = split(/\#/, $all_entries);
@split_sep_entries=();
%collapsed_loci_HoA=();
print ">".$respective_chrom."\n";
foreach $sep_entry(@split_entries)
{
@split_sep_entries = split(/\t/, $sep_entry);
$locus_to_use = $split_sep_entries[1];
$rest_entry=$split_sep_entries[0]."\t".$split_sep_entries[2]."\t".
$split_sep_entries[3]."\t".$split_sep_entries[4]."\t".
$split_sep_entries[5]."\t".$split_sep_entries[6]."\t".
$split_sep_entries[7];
#print $locus_to_use."##".$rest_entry;
push @{ $collapsed_loci_HoA{$locus_to_use} }, $rest_entry;
}
$count_arr=0;
@array_of_loci = keys %collapsed_loci_HoA;
for $b(sort { $b <=> $a } @array_of_loci)
{
print "$b"."\n";
}
print "//\n";
}
}
close IN2;
Now it is printing the numbers sorted.