First, a question. Is it possible to have two (or more) features that start at the same position in the same chromosome, but end at different positions?
If so, your current data structure will only record the last one read from the file.
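To make the overwrite concrete, here is a minimal sketch using made-up data (the coordinates and the names repA/repB are hypothetical, not taken from your file). It also shows one possible workaround, assuming you do need to keep every feature: push each one onto a list held under its start position.

```perl
use strict;
use warnings;

# Hypothetical input: two features on chr2 that share start position 100.
my @lines = (
    "chr2 100 150 repA",
    "chr2 100 180 repB",
);

my %last_wins;   # one [ end, name ] pair per start: later lines replace earlier ones
my %keep_all;    # a list of [ end, name ] pairs per start: nothing is lost

for my $line ( @lines ) {
    my( $chr, $start, $end, $name ) = split ' ', $line;
    $last_wins{ $chr }{ $start } = [ $end, $name ];
    push @{ $keep_all{ $chr }{ $start } }, [ $end, $name ];
}

print "last wins: $last_wins{chr2}{100}[1]\n";
print "kept: ", scalar @{ $keep_all{chr2}{100} }, " features\n";
```

The extra array layer costs some memory per start position, so it is only worth it if shared starts really do occur in your data.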
Assuming that's okay, then moving from a HoHoH (hash of hashes of hashes) to a HoHoA (hash of hashes of arrays):
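    #!/usr/bin/perl -slw
    use strict;

    # Indices into each per-feature array: [ end, repeat name ]
    use constant { END => 0, REP => 1 };

    my %reps;
    while( <> ) {
        chomp;
        my @array = split;
        # $reps{ chromosome }{ start } = [ end, repeat ]
        $reps{ $array[0] }{ $array[1] } = [ $array[2], $array[3] ];
    }

    my $start = 160;
    my $end   = 210;
    my $chr   = "chr2";

    # Visit the start positions on this chromosome in ascending order
    for my $s ( sort { $a <=> $b } keys %{ $reps{ $chr } } ) {
        # Only consider features that end at or after the query start
        if( $start <= $reps{ $chr }{ $s }[ END ] ) {
            # Past the query range: no later start can overlap
            last if $s >= $end;
            # -l already appends a newline to each print
            print "$chr $s $reps{ $chr }{ $s }[ END ] $reps{ $chr }{ $s }[ REP ]";
        }
    }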
Will likely save you ~25% of your memory usage and run a little faster.
In reply to Re: Reducing memory footprint when doing a lookup of millions of coordinates
by BrowserUk
in thread Reducing memory footprint when doing a lookup of millions of coordinates
by richardwfrancis