Re: Stuck with manipulating an array
by Corion (Patriarch) on Aug 28, 2017 at 13:15 UTC
|
How about sorting the elements and binning them together until the distance between the first element in the bin and the next element is larger than 1000?
If you want to optimize for a minimal number of bins or something like that, you will need to put more thought into it. There is a lot of literature on "good" binning, but if you only care about the distance between elements, the naive approach should get you far.
If you show us the relevant code you already have, we can maybe give you more concrete advice.
| [reply] |
|
|
open IN2, $first_tmp;
while(<IN2>)
{
    if($_=~/^(chr.*?)REC:(.*)/)
    {
        $respective_chrom=$1;
        $all_entries=$2;
        @split_entries=();
        @split_entries = split(/\#/, $all_entries);
        @split_sep_entries=();
        %collapsed_loci_HoA=();
        print ">".$respective_chrom."\n";
        foreach $sep_entry(@split_entries)
        {
            @split_sep_entries = split(/\t/, $sep_entry);
            $locus_to_use = $split_sep_entries[1];
            $rest_entry=$split_sep_entries[0]."\t".$split_sep_entries[2]."\t".
                        $split_sep_entries[3]."\t".$split_sep_entries[4]."\t".
                        $split_sep_entries[5]."\t".$split_sep_entries[6]."\t".
                        $split_sep_entries[7];
            push @{ $collapsed_loci_HoA{$locus_to_use} }, $rest_entry;
        }
        @array_of_loci = keys %collapsed_loci_HoA;
        for $b(sort { $b <=> $a } @array_of_loci)
        {
            $count_arr++;
        }
        print "//\n";
    }
}
close IN2;
Basically, I am now getting my numbers sorted, as I posted above.
What I cannot do is exactly the binning you propose. My idea is to slice one element off the array each time and, if it is within range, push it onto the sub-array of the element that started the bin, but I really can't see how to do that.
I am new to Perl and I am literally stuck. | [reply] [d/l] |
|
|
My approach to binning would be simple. You look at the first element of the (sorted) array @split_entries and at the index of the next potential candidate, and you increase that index until the candidate is at least your distance away from the first element. All elements from the first element up to, but not including, that candidate then belong in one bin.
An example, for a distance of 5:
11
12
16
17
22
30
First you look at the first position in your array (11). The next candidate is at the second position, and its value is 12. abs(12-11) < 5, so you increase the index of your candidate. The next candidate is at the third position, and its value is 16. abs(16-11) >= 5, so your first bin consists of the first and second entries in the array, 11 and 12.
Now you start the same thing over, as there are still elements in your array after removing 11 and 12 from it.
You look at the first position in your array (16). The next candidate is at the second position, and its value is 17. abs(17-16) < 5, so you increase the index of your candidate. The next candidate is at the third position, and its value is 22. abs(22-16) >= 5, so your second bin consists of the first and second entries in the remaining array, 16 and 17.
... and so on. | [reply] [d/l] [select] |
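The walk-through above can be sketched in Perl roughly as follows. This is only a minimal illustration, not code from the thread: the helper name bin_by_distance and its parameters are made up for the example, and it assumes a new bin starts as soon as a candidate is at least the given distance away from the first element of the current bin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a list of numbers into
# bins. A bin is closed as soon as the next candidate lies $distance or
# more away from the first element of the current bin.
sub bin_by_distance {
    my ($distance, @values) = @_;
    my @sorted = sort { $a <=> $b } @values;
    my @bins;
    while (@sorted) {
        my $candidate = 1;    # index of the next potential bin member
        $candidate++
            while $candidate < @sorted
            and abs($sorted[$candidate] - $sorted[0]) < $distance;
        # Everything before the candidate forms one bin; remove it
        # from the array and start over with what is left.
        push @bins, [ splice @sorted, 0, $candidate ];
    }
    return \@bins;
}

# The example from the post, with a distance of 5.
my $bins = bin_by_distance(5, 11, 12, 16, 17, 22, 30);
print "[ @$_ ]\n" for @$bins;
```

For the six example values this should print four bins: 11 and 12, then 16 and 17, then 22 alone, then 30 alone. Using splice keeps the "remove the bin and start over" step explicit; a single pass that remembers the start of the current bin would do the same job without modifying the array.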
|
open IN2, $first_tmp;
while(<IN2>)
{
    if($_=~/^(chr.*?)REC:(.*)/)
    {
        $respective_chrom=$1;
        $all_entries=$2;
        @split_entries=();
        @split_entries = split(/\#/, $all_entries);
        @split_sep_entries=();
        %collapsed_loci_HoA=();
        print ">".$respective_chrom."\n";
        foreach $sep_entry(@split_entries)
        {
            @split_sep_entries = split(/\t/, $sep_entry);
            $locus_to_use = $split_sep_entries[1];
            $rest_entry=$split_sep_entries[0]."\t".$split_sep_entries[2]."\t".
                        $split_sep_entries[3]."\t".$split_sep_entries[4]."\t".
                        $split_sep_entries[5]."\t".$split_sep_entries[6]."\t".
                        $split_sep_entries[7];
            #print $locus_to_use."##".$rest_entry;
            push @{ $collapsed_loci_HoA{$locus_to_use} }, $rest_entry;
        }
        $count_arr=0;
        @array_of_loci = keys %collapsed_loci_HoA;
        for $b(sort { $b <=> $a } @array_of_loci)
        {
            print "$b"."\n";
        }
        print "//\n";
    }
}
close IN2;
Now it is printing the numbers sorted. | [reply] [d/l] |
Re: Stuck with manipulating an array
by tybalt89 (Monsignor) on Aug 28, 2017 at 13:36 UTC
|
#!/usr/bin/perl
# http://perlmonks.org/?node_id=1198153
use strict;
use warnings;
use Data::Dumper;
my $start;
my @answer;
for ( sort {$a <=> $b} map tr/\n//dr, <DATA> )
{
    if( not defined $start or $_ > $start + 1000 )
    {
        push @answer, [ $_ ];
        $start = $_;
    }
    else
    {
        push @{ $answer[-1] }, $_;
    }
}
print Dumper \@answer;
__DATA__
141326478
103194415
86004442
86004438
86004437
86004434
86004431
85280835
85280834
85280832
53250112
50137387
50137382
50137380
29223108
25694155
17916134
| [reply] [d/l] |
|
|
Seems to do the trick, thank you so much!
What you are creating is an array of arrays, correct?
| [reply] |
Re: Stuck with manipulating an array
by BillKSmith (Monsignor) on Aug 28, 2017 at 14:34 UTC
|
You have not responded to Corion's comment about "good" binning. Is any valid solution "good enough"? Do you have additional criteria, but do not know how to specify them? Consider how you would want to divide the list of integers (0..1001). (Note that there are over 1000 possible solutions using two bins. Far more if more bins are allowed.)
| [reply] |
|
|
I think the answer/snippet provided by tybalt89 was exactly what I was after...
| [reply] |
|
|
Ok, I am basically stuck here:
#!/usr/bin/perl
use Data::Dumper;
while(<DATA>)
{
    $all_numbers=$_;
    chomp $all_numbers;
    @vector=();
    @vector = split(/\@/, $all_numbers);
    $start;
    @answer;
    for ( sort {$a <=> $b} @vector)
    {
        if( not defined $start or $_ > $start + 1000 )
        {
            push @answer, [ $_ ];
            $start = $_;
        }
        else
        {
            push @{ $answer[-1] }, $_;
        }
    }
    for $i ( 0 .. $#answer )
    {
        print "$i\t [ @{$answer[$i]} ]\n";
    }
    print "//\n";
}
__DATA__
141326478@103194415@50137382@86004442@86004438@86004434@85280835@17916134@85280834@86004437@85280832@53250112@50137387@50137380@29223108@25694155@86004431
6901075@6901079@34073753@88911904@34073751@91346449@34073757
If I only have one line of data, it works perfectly, but if I have these two lines, it produces this:
0 [ 17916134 ]
1 [ 25694155 ]
2 [ 29223108 ]
3 [ 50137380 50137382 50137387 ]
4 [ 53250112 ]
5 [ 85280832 85280834 85280835 ]
6 [ 86004431 86004434 86004437 86004438 86004442 ]
7 [ 103194415 ]
8 [ 141326478 ]
//
0 [ 17916134 ]
1 [ 25694155 ]
2 [ 29223108 ]
3 [ 50137380 50137382 50137387 ]
4 [ 53250112 ]
5 [ 85280832 85280834 85280835 ]
6 [ 86004431 86004434 86004437 86004438 86004442 ]
7 [ 103194415 ]
8 [ 141326478 6901075 6901079 34073751 34073753 34073757 88911904 91346449 ]
//
What am I doing wrong? | [reply] [d/l] [select] |
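The thread as captured here contains no reply to this question, so what follows is a reading of the posted script and not an answer given by anyone in the thread. In the script above, the bare statements `$start;` and `@answer;` do not reset anything, so both variables keep their contents from the first line of data. When the second line is processed, $start is still 141326478, every new number is smaller than $start + 1000, and all of them are pushed onto the last bin of the old @answer. A minimal sketch of one way around this, assuming the per-line work is moved into a helper (bin_line and its parameters are hypothetical names) so that `my` creates fresh variables for every line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): bin the '@'-separated numbers
# of a single line. $start and @answer are declared with "my" inside the
# sub, so nothing is carried over from one line to the next.
sub bin_line {
    my ($line, $distance) = @_;
    chomp $line;
    my $start;
    my @answer;
    for my $n ( sort { $a <=> $b } split /\@/, $line ) {
        if ( not defined $start or $n > $start + $distance ) {
            push @answer, [ $n ];      # start a new bin
            $start = $n;
        }
        else {
            push @{ $answer[-1] }, $n; # extend the current bin
        }
    }
    return \@answer;
}

# The two lines of data from the post.
my @lines = (
    '141326478@103194415@50137382@86004442@86004438@86004434@85280835@17916134@85280834@86004437@85280832@53250112@50137387@50137380@29223108@25694155@86004431',
    '6901075@6901079@34073753@88911904@34073751@91346449@34073757',
);

for my $line (@lines) {
    my $answer = bin_line($line, 1000);
    for my $i ( 0 .. $#$answer ) {
        print "$i\t [ @{ $answer->[$i] } ]\n";
    }
    print "//\n";
}
```

In the original loop, the equivalent change would be to write `my $start;` and `my @answer;` (or `undef $start; @answer = ();`) inside the while block. Adding `use strict; use warnings;` would also make Perl complain about the undeclared variables and the no-op statements.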
|
|
Re: Stuck with manipulating an array
by salva (Canon) on Aug 28, 2017 at 13:16 UTC
|
Well, forget about the computer: how would you solve the problem using your brain alone?
| [reply] |