Re: Reducing memory footprint when doing a lookup of millions of coordinates

On my measly VM (256MB memory) running PostgreSQL 8.2 on FreeBSD 6.4, it took less than 8 minutes to load the entire file into a table. Then, given:

  chromosome = chr2
  starting   = 21000
  ending     = 23200
[download]

returns the overlaps in less than half a second.

$ time perl -alne 'print join q{,} => @F[4,5,6,9] if $. > 3' hg19.fa.o
+ut | psql -qc "
  CREATE TABLE tt (chromosome text, starting integer, ending integer, 
+feature text);
  COPY tt (chromosome, starting, ending, feature) FROM STDIN WITH CSV;
  CREATE INDEX idx_tt_chromosome ON tt (chromosome);
"
perl -alne 'print join q{,} => @F[4,5,6,9] if $. > 3' hg19.fa.out  67.
+31s user 82.26s system 71% cpu 3:29.33 total
psql -qc   0.27s user 5.68s system 2% cpu 4:15.92 total


pgsql=> EXPLAIN ANALYZE
pgsql-> SELECT *
pgsql-> FROM
pgsql->   tt                                            
pgsql-> WHERE
pgsql->   chromosome = 'chr2'
pgsql->   AND 21000 <= ending
pgsql->   AND starting <= 23200;
                                                               QUERY P
+LAN                                                                
----------------------------------------------------------------------
+-------------------------------------------------------------------
 Bitmap Heap Scan on tt  (cost=516.43..38178.20 rows=2945 width=72) (a
+ctual time=129.490..341.722 rows=2 loops=1)
   Recheck Cond: (chromosome = 'chr2'::text)
   Filter: ((21000 <= ending) AND (starting <= 23200))
   ->  Bitmap Index Scan on idx_tt_chromosome  (cost=0.00..515.69 rows
+=26508 width=0) (actual time=128.165..128.165 rows=414622 loops=1)
         Index Cond: (chromosome = 'chr2'::text)
 Total runtime: 341.955 ms


pgsql=> SELECT *
pgsql-> FROM
pgsql->   tt                                            
pgsql-> WHERE
pgsql->   chromosome = 'chr2'
pgsql->   AND 21000 <= ending
pgsql->   AND starting <= 23200;

 chromosome | starting | ending | feature 
------------+----------+--------+---------
 chr2       |    21258 |  21370 | MIRb
 chr2       |    22095 |  23394 | L1PA14
(2 rows)
[download]

Comment on Re: Reducing memory footprint when doing a lookup of millions of coordinates Select or Download Code