in reply to Reducing memory footprint when doing a lookup of millions of coordinates

On my measly VM (256MB memory) running PostgreSQL 8.2 on FreeBSD 6.4, it took less than 8 minutes to load the entire file into a table. Then, given:
chromosome = chr2 starting = 21000 ending = 23200
returns the overlaps in less than half a second.

$ time perl -alne 'print join q{,} => @F[4,5,6,9] if $. > 3' hg19.fa.o +ut | psql -qc " CREATE TABLE tt (chromosome text, starting integer, ending integer, +feature text); COPY tt (chromosome, starting, ending, feature) FROM STDIN WITH CSV; CREATE INDEX idx_tt_chromosome ON tt (chromosome); " perl -alne 'print join q{,} => @F[4,5,6,9] if $. > 3' hg19.fa.out 67. +31s user 82.26s system 71% cpu 3:29.33 total psql -qc 0.27s user 5.68s system 2% cpu 4:15.92 total pgsql=> EXPLAIN ANALYZE pgsql-> SELECT * pgsql-> FROM pgsql-> tt pgsql-> WHERE pgsql-> chromosome = 'chr2' pgsql-> AND 21000 <= ending pgsql-> AND starting <= 23200; QUERY P +LAN ---------------------------------------------------------------------- +------------------------------------------------------------------- Bitmap Heap Scan on tt (cost=516.43..38178.20 rows=2945 width=72) (a +ctual time=129.490..341.722 rows=2 loops=1) Recheck Cond: (chromosome = 'chr2'::text) Filter: ((21000 <= ending) AND (starting <= 23200)) -> Bitmap Index Scan on idx_tt_chromosome (cost=0.00..515.69 rows +=26508 width=0) (actual time=128.165..128.165 rows=414622 loops=1) Index Cond: (chromosome = 'chr2'::text) Total runtime: 341.955 ms pgsql=> SELECT * pgsql-> FROM pgsql-> tt pgsql-> WHERE pgsql-> chromosome = 'chr2' pgsql-> AND 21000 <= ending pgsql-> AND starting <= 23200; chromosome | starting | ending | feature ------------+----------+--------+--------- chr2 | 21258 | 21370 | MIRb chr2 | 22095 | 23394 | L1PA14 (2 rows)