in reply to Reducing memory footprint when doing a lookup of millions of coordinates
You could turn the problem inside out: load the test values into memory, then scan the large reference file one line at a time to perform the matching:
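```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the large reference file: chr, start, end, feature.
my $reps = <<REPS;
chr1 100 120 feature1
chr1 200 250 feature2
chr2 150 200 feature1
chr2 280 350 feature1
chr3 100 150 feature2
chr3 300 450 feature2
REPS

# Load the test ranges into memory, keyed by chromosome and start position.
my %tests;
while (my $line = <DATA>) {
    $line =~ s/[\n\r]//g;
    my @array = split /\s+/, $line;
    $tests{$array[0]}{$array[1]}{'end'} = $array[2];
    $tests{$array[0]}{$array[1]}{'rep'} = $array[3];    # undef when the test line has no 4th column
}

# Scan the reference data one line at a time and report overlapping entries.
open my $repIn, '<', \$reps or die "Can't open in-memory file: $!";
while (<$repIn>) {
    my ($chr, $start, $end, $rep) = split ' ';
    next if !exists $tests{$chr};

    for my $s (keys %{$tests{$chr}}) {
        if ($start <= $tests{$chr}{$s}{'end'}) {
            # keys() returns starts in no particular order, so skip this
            # test range rather than leaving the loop early.
            next if $s >= $end;
            print "$chr $start $end $rep\n";
        }
    }
}

__DATA__
chr2 160 210
```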
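The overlap test in the inner loop can also be pulled out into a small helper, which makes the condition easier to check on its own. This is only a sketch: `overlaps` is a hypothetical name, and it assumes the same comparisons as the script (reference start `<=` test end, and test start `<` reference end).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns 1 when a
# reference range and a test range overlap, using the same comparisons as
# the loop above, and 0 otherwise.
sub overlaps {
    my ($ref_start, $ref_end, $test_start, $test_end) = @_;
    return ($ref_start <= $test_end && $test_start < $ref_end) ? 1 : 0;
}

# Check the test range 160-210 against the two chr2 reference ranges.
print overlaps(150, 200, 160, 210) ? "overlap\n" : "no overlap\n";
print overlaps(280, 350, 160, 210) ? "overlap\n" : "no overlap\n";
```

With the sample data, only the first chr2 reference range (150 to 200) should be reported for the test range 160 to 210.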
Re^2: Reducing memory footprint when doing a lookup of millions of coordinates
by richardwfrancis (Beadle) on Feb 27, 2011 at 12:18 UTC