On my measly VM (256MB memory) running PostgreSQL 8.2 on FreeBSD 6.4, it took less than 8 minutes to load the entire file into a table. Then, given:
chromosome = chr2 starting = 21000 ending = 23200
returns the overlaps in less than half a second.

$ time perl -alne 'print join q{,} => @F[4,5,6,9] if $. > 3' hg19.fa.o +ut | psql -qc " CREATE TABLE tt (chromosome text, starting integer, ending integer, +feature text); COPY tt (chromosome, starting, ending, feature) FROM STDIN WITH CSV; CREATE INDEX idx_tt_chromosome ON tt (chromosome); " perl -alne 'print join q{,} => @F[4,5,6,9] if $. > 3' hg19.fa.out 67. +31s user 82.26s system 71% cpu 3:29.33 total psql -qc 0.27s user 5.68s system 2% cpu 4:15.92 total pgsql=> EXPLAIN ANALYZE pgsql-> SELECT * pgsql-> FROM pgsql-> tt pgsql-> WHERE pgsql-> chromosome = 'chr2' pgsql-> AND 21000 <= ending pgsql-> AND starting <= 23200; QUERY P +LAN ---------------------------------------------------------------------- +------------------------------------------------------------------- Bitmap Heap Scan on tt (cost=516.43..38178.20 rows=2945 width=72) (a +ctual time=129.490..341.722 rows=2 loops=1) Recheck Cond: (chromosome = 'chr2'::text) Filter: ((21000 <= ending) AND (starting <= 23200)) -> Bitmap Index Scan on idx_tt_chromosome (cost=0.00..515.69 rows +=26508 width=0) (actual time=128.165..128.165 rows=414622 loops=1) Index Cond: (chromosome = 'chr2'::text) Total runtime: 341.955 ms pgsql=> SELECT * pgsql-> FROM pgsql-> tt pgsql-> WHERE pgsql-> chromosome = 'chr2' pgsql-> AND 21000 <= ending pgsql-> AND starting <= 23200; chromosome | starting | ending | feature ------------+----------+--------+--------- chr2 | 21258 | 21370 | MIRb chr2 | 22095 | 23394 | L1PA14 (2 rows)

In reply to Re: Reducing memory footprint when doing a lookup of millions of coordinates by repellent
in thread Reducing memory footprint when doing a lookup of millions of coordinates by richardwfrancis

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.