in reply to Reducing memory footprint when doing a lookup of millions of coordinates
First, a question. Is it possible to have two (or more) features that start at the same position in the same chromosome, but end at different positions?
If so, your current data structure will only record the last one read from the file.
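To make the overwrite concrete, here is a minimal sketch using two made-up records that share a chromosome and start position. The `%keep_all` variant is only a hypothetical alternative, not something from the original script; it shows one way both records could be retained if that turns out to matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two made-up features: same chromosome and start, different end.
my @lines = ( "chr2 100 150 repA", "chr2 100 300 repB" );

my ( %last_wins, %keep_all );
for my $line ( @lines ) {
    my ( $chr, $start, $end, $rep ) = split ' ', $line;

    # Plain assignment: the second record replaces the first.
    $last_wins{ $chr }{ $start } = [ $end, $rep ];

    # Hypothetical alternative: keep a list of [ end, rep ] pairs per start.
    push @{ $keep_all{ $chr }{ $start } }, [ $end, $rep ];
}

print "last wins: $last_wins{chr2}{100}[1]\n";                      # last wins: repB
print "kept: ", scalar @{ $keep_all{chr2}{100} }, " features\n";    # kept: 2 features
```

The code further down sticks with one record per start position.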
Assuming that's okay, moving from a HoHoH (hash of hashes of hashes) to a HoHoA (hash of hashes of arrays):
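    #!/usr/bin/perl -slw
    use strict;

    # Array slots for each record: [ end position, rep field ]
    use constant { END => 0, REP => 1 };

    my %reps;
    while( <> ) {
        chomp;
        my @array = split;
        # $reps{ chromosome }{ start } = [ end, rep ]
        $reps{ $array[0] }{ $array[1] } = [ $array[2], $array[3] ];
    }

    my $start = 160;
    my $end   = 210;
    my $chr   = "chr2";

    # Walk the starts in ascending order; stop at the first feature
    # that starts at or beyond $end.
    for my $s ( sort { $a <=> $b } keys %{ $reps{ $chr } } ) {
        if( $start <= $reps{ $chr }{ $s }[ END ] ) {
            last if $s >= $end;
            # -l already appends a newline to every print
            print "$chr $s $reps{ $chr }{ $s }[ END ] $reps{ $chr }{ $s }[ REP ]";
        }
    }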
Will likely save you ~25% of your memory usage and run a little faster.
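If you want to check that figure against your own data, one option is to build both layouts from the same records and compare their sizes. The sketch below is only an illustration: the records are made up, the `end` and `rep` hash keys are a guess at your original HoHoH layout, and the measurement relies on the CPAN module Devel::Size, which is not part of core Perl and is skipped if it is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up records: [ chromosome, start, end, rep ].
my @records = map { [ 'chr' . ( 1 + $_ % 3 ), $_ * 10, $_ * 10 + 7, "rep$_" ] } 1 .. 1000;

my ( %hohoh, %hohoa );
for my $r ( @records ) {
    my ( $chr, $start, $end, $rep ) = @$r;
    $hohoh{ $chr }{ $start } = { end => $end, rep => $rep };    # assumed original layout
    $hohoa{ $chr }{ $start } = [ $end, $rep ];                  # array-based layout
}

# Devel::Size is optional here; skip the measurement if it is missing.
if ( eval { require Devel::Size; 1 } ) {
    my $size_h = Devel::Size::total_size( \%hohoh );
    my $size_a = Devel::Size::total_size( \%hohoa );
    printf "HoHoH: %d bytes\nHoHoA: %d bytes (%.0f%% of HoHoH)\n",
        $size_h, $size_a, 100 * $size_a / $size_h;
}
else {
    print "Devel::Size not installed; skipping measurement\n";
}
```

The ratio you see will depend on your Perl build and on how long the stored strings are, so treat the ~25% as a rough expectation rather than a guarantee.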
Replies are listed 'Best First'.
Re^2: Reducing memory footprint when doing a lookup of millions of coordinates
by richardwfrancis (Beadle) on Feb 27, 2011 at 12:23 UTC
by BrowserUk (Patriarch) on Feb 27, 2011 at 12:40 UTC
by richardwfrancis (Beadle) on Feb 27, 2011 at 13:16 UTC
by BrowserUk (Patriarch) on Feb 27, 2011 at 13:40 UTC
Re^2: Reducing memory footprint when doing a lookup of millions of coordinates
by richardwfrancis (Beadle) on Feb 27, 2011 at 12:26 UTC