I put together an example in case you want to use PostgreSQL:
The file I used is available here:
ftp://ftp.ncbi.nih.gov/genbank/livelists
It's similar to yours, but it has three columns.
I unzipped it and loaded it into PostgreSQL, into a table t; it has more than 223 million rows.
$ ls -lh GbAccList.0206.2011
-rw-rw-r-- 1 aardvark aardvark 4.6G Feb  8 17:21 GbAccList.0206.2011

$ head -n 3 GbAccList.0206.2011
AACY024124353,1,129566152
AACY024124495,1,129566175
AACY024124494,1,129566176

$ time < GbAccList.0206.2011 psql -qc "
    create table t (c text, i1 integer, i2 integer);
    copy t from stdin csv delimiter E',';"
real    3m47.448s

$ time echo "
    create index t_i2_idx on t (i2);
    analyze t;" | psql -q
real    5m50.291s
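If you'd rather drive the load from Perl than from the shell, the same COPY path is reachable through DBD::Pg. A minimal sketch, assuming a local database named mydb with trust/peer authentication (the database name and credentials are placeholders, not part of the setup above; the timings shown were produced with psql, not with this script):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Connect to a local database; 'mydb' and the empty credentials are
# placeholders, adjust them for your own setup.
my $dbh = DBI->connect("dbi:Pg:dbname=mydb", '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do("create table t (c text, i1 integer, i2 integer)");

# Stream the file through COPY, the same bulk path psql uses above.
$dbh->do("copy t from stdin csv delimiter ','");
open my $fh, '<', 'GbAccList.0206.2011' or die "open: $!";
while (my $line = <$fh>) {
    $dbh->pg_putcopydata($line);
}
close $fh;
$dbh->pg_putcopyend();

# Index the gi column and refresh planner statistics.
$dbh->do("create index t_i2_idx on t (i2)");
$dbh->do("analyze t");

$dbh->disconnect;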
Searches are now around a tenth of a millisecond:
# 5 'random' searches like:
echo "explain analyze select * from t where i2 = $gi;" | psql
Just showing the timings of five searches:
Index Cond: (i2 = 2017697)   Total runtime: 0.157 ms
Index Cond: (i2 = 6895719)   Total runtime: 0.109 ms
Index Cond: (i2 = 3193323)   Total runtime: 0.119 ms
Index Cond: (i2 = 8319666)   Total runtime: 0.091 ms
Index Cond: (i2 = 1573171)   Total runtime: 0.119 ms
Of course, performance depends on the hardware used.
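The lookups can of course also be done from Perl instead of shelling out to psql. A rough sketch with DBI (database name and credentials are again placeholders, gi numbers come from the command line, and the Time::HiRes timing is my own addition):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Time::HiRes qw(gettimeofday tv_interval);

my $dbh = DBI->connect("dbi:Pg:dbname=mydb", '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Prepare once, execute per gi; the equality predicate on i2 should
# let the planner use the t_i2_idx index created above.
my $sth = $dbh->prepare("select c, i1, i2 from t where i2 = ?");

for my $gi (@ARGV) {
    my $t0 = [gettimeofday];
    $sth->execute($gi);
    while (my $row = $sth->fetchrow_arrayref) {
        print join(",", @$row), "\n";
    }
    printf "gi %s: %.3f ms\n", $gi, 1000 * tv_interval($t0);
}

$dbh->disconnect;

Run it with a few gi numbers as arguments (script name is whatever you save it as), and check with explain analyze, as above, that the index is really being used.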
(A similar problem/solution is described here: Re^3: sorting very large text files (slander).)
In reply to Re: search a large text file by erix, in thread search a large text file by perl_lover_always