Re^2: Faulty Control Structures?

Thanks for your input. Unfortunately, this won't work, as there isn't a good selection of unique identifiers to use as keys for the hash. So, using the code you provided, I'd end up with 23 key-value pairs when I need 330k :). A hash of arrays would work better, in that I could have the values appended to the array for each chromosome, but then getting the data out would be a bit of a nightmare. I will look into cleaning up the globals though, as I was being a bit lazy there :).
EDIT: Actually, 24 pairs, as there are both X and Y to consider :).

Bioinformatics


Replies are listed 'Best First'.

Re^3: Faulty Control Structures?
by Narveson (Chaplain) on Jan 29, 2008 at 04:55 UTC

You're right. I overlooked the statement label in one of your next statements. I could not have arrived at my misreading if I had been as aware as you are that there are only two dozen chromosomes.

But what about your hash of arrays? Why would getting the data out be such a nightmare?

Populating the hash of arrays:

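open my $annotation_read_handle, '<', $annotation_file
    or die "Cannot open $annotation_file: $!";
my %annotations_for;
while (my $ad = <$annotation_read_handle>) {
    # read $an_chrom (the first tab-separated field) out of $ad
    my ($an_chrom, undef) = split(/\t/, $ad);

    # store for future lookups: one array of lines per chromosome
    # (the braces make Perl dereference the hash element, not a slice)
    push @{ $annotations_for{$an_chrom} }, $ad;
}
close $annotation_read_handle;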

Now read through the main data file and assign each chromosome number to my $main_chrom.

    # look up the list of annotations relevant to the current chromosome
    my $annotations_ref = $annotations_for{$main_chrom};
    # loop through just these annotations
    ILC: foreach my $ad (@$annotations_ref) {
        # ...
    }
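
Putting the two fragments together, here is a minimal self-contained sketch of building the hash of arrays and getting the data back out. The sample lines, their four-column layout, and the helper name annotations_on are assumptions made up for illustration; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical annotation lines (assumed layout: chromosome, start, end, label).
my @annotation_lines = (
    "1\t100\t200\tgeneA\n",
    "1\t300\t400\tgeneB\n",
    "X\t50\t150\tgeneC\n",
);

# Build the hash of arrays: one array of annotation lines per chromosome.
my %annotations_for;
for my $ad (@annotation_lines) {
    my ($an_chrom) = split /\t/, $ad;
    push @{ $annotations_for{$an_chrom} }, $ad;
}

# Getting the data out: return the list of lines stored for one chromosome.
# The "|| []" fallback avoids dereferencing undef under strict when a
# chromosome has no annotations at all.
sub annotations_on {
    my ($chrom) = @_;
    return @{ $annotations_for{$chrom} || [] };
}

for my $main_chrom (qw(1 X Y)) {
    my @hits = annotations_on($main_chrom);
    printf "%s: %d annotation(s)\n", $main_chrom, scalar @hits;
}
```

The hash still has only a couple of dozen keys, but each key holds every annotation line for its chromosome, so nothing is overwritten and each lookup hands back only the lines worth scanning.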

Of course, as other more enlightened commentators have already pointed out, the most important thing to optimize is the range_find subroutine.