You could turn the problem inside out: load the test values into memory, then scan the large reference file one line at a time to perform the matching:
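#!/usr/bin/perl
use strict;
use warnings;

# Stands in for the large reference file: chromosome, start, end, feature
my $reps = <<REPS;
chr1 100 120 feature1
chr1 200 250 feature2
chr2 150 200 feature1
chr2 280 350 feature1
chr3 100 150 feature2
chr3 300 450 feature2
REPS

# Load the (small) set of test ranges into memory: chr => start => {end, rep}
my %tests;
while (my $line = <DATA>) {
    $line =~ s/[\n\r]//g;
    my @array = split /\s+/, $line;
    $tests{$array[0]}{$array[1]}{'end'} = $array[2];
    $tests{$array[0]}{$array[1]}{'rep'} = $array[3];
}

# Scan the reference data one line at a time and print each line that
# overlaps a test range
open my $repIn, '<', \$reps or die "Can't open reference data: $!";
while (<$repIn>) {
    my ($chr, $start, $end, $rep) = split ' ';
    next if !exists $tests{$chr};

    # Sort the test starts numerically so that 'last' is safe: once a test
    # starts at or after this feature's end, no later test can overlap it
    for my $s (sort { $a <=> $b } keys %{$tests{$chr}}) {
        last if $s >= $end;
        if ($start <= $tests{$chr}{$s}{'end'}) {
            print "$chr $start $end $rep\n";
        }
    }
}

__DATA__
chr2 160 210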
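If the reference data lives in a real file, the same inside-out idea can be wrapped in a couple of subs that take any open filehandle. This is only a sketch: `load_tests` and `find_overlaps` are hypothetical names, and it assumes whitespace-separated columns (chr, start, end, feature) and the same overlap test as above. It keeps each chromosome's tests as a start-sorted array of `[start, end]` pairs rather than a hash keyed by start, so two tests that share a start on the same chromosome don't overwrite each other, and it reports each reference line at most once.

```perl
use strict;
use warnings;

# Hypothetical helper: read test ranges (chr start end ...) from a filehandle
# into chr => arrayref of [start, end] pairs, sorted by start.
sub load_tests {
    my ($fh) = @_;
    my %tests;
    while (my $line = <$fh>) {
        my ($chr, $start, $end) = split ' ', $line;
        next unless defined $end;    # skip blank or short lines
        push @{ $tests{$chr} }, [ $start, $end ];
    }
    # values() aliases the stored arrayrefs, so this sorts each list in place
    @$_ = sort { $a->[0] <=> $b->[0] } @$_ for values %tests;
    return \%tests;
}

# Hypothetical helper: stream the reference data one line at a time and
# return each reference line that overlaps at least one test range.
sub find_overlaps {
    my ($tests, $refFh) = @_;
    my @hits;
    while (my $line = <$refFh>) {
        my ($chr, $start, $end, $rep) = split ' ', $line;
        next unless defined $rep && exists $tests->{$chr};
        for my $t (@{ $tests->{$chr} }) {
            last if $t->[0] >= $end;    # sorted: no later test can overlap
            if ($start <= $t->[1]) {
                push @hits, "$chr $start $end $rep";
                last;                   # report each reference line once
            }
        }
    }
    return @hits;
}

# Demo with in-memory data; prints "chr2 150 200 feature1"
open my $testFh, '<', \"chr2 160 210\n" or die $!;
my $tests = load_tests($testFh);
open my $refFh, '<', \"chr2 150 200 feature1\nchr2 280 350 feature1\n" or die $!;
print "$_\n" for find_overlaps($tests, $refFh);
```

For real data you'd open the two files by path instead of the in-memory strings; only the small test set is held in memory while the big file is streamed.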
In reply to Re: Reducing memory footprint when doing a lookup of millions of coordinates
by GrandFather
in thread Reducing memory footprint when doing a lookup of millions of coordinates
by richardwfrancis