Result:
System sort: 59 minutes.
Postgres load (6m) + select (59m): 66 minutes
####
$ ls -lh dna2.txt
-rw-rw-r-- 1 xxxxx xxxxx 12G Dec 21 16:38 dna2.txt
1588 mm_ref_chrY.fa 2902494 R CTTGACTTTTATGTGTCACTGTGTTCAGTTCCTGGT
939 mm_ref_chrY.fa 2902495 F CCAGGAACTGACCACAGTGACACATAAGAGTCAAAT
742 mm_ref_chrY.fa 2902497 F AGGAACTGACCACAGTGACACATAAGAGTCAAATAG
1097 mm_ref_chrY.fa 2902497 F AGGAACTGACCACAGTGACACATAAGAGTCAAATAG
100 mm_ref_chrY.fa 2902499 F GAACTGACCACAGTGACACATAAAAGTCAAGTAGTG
1184 mm_ref_chrY.fa 2902499 F GAACTGACCACAGTGACACATAAAAGTCAAGTAGTG
286 mm_ref_chrY.fa 2902505 F ACCACAGTGCCACATAAAAGTCAAGTAGGGAATCCT
235 mm_ref_chrY.fa 2902513 R ACCAGGACAGGATCCCCTACTTGACTTTTATGTGGC
1744 mm_ref_chrY.fa 2902516 F ACATAAAAGTCAAGTAGTGGACCCTGTCCTGGTCTG
1029 mm_ref_chrY.fa 2902519 F TAAAAGTCAAGTAGGGGATCCTGTCCTGGTCTGGCA
#
# bash + sort version:
#
$ time sort -k5 dna2.txt > dna2.sortk5.out
real 59m48.641s
#
# Postgres version:
#
#
# loading the data:
#
$ time < dna2.txt \
psql -d test -c "
drop table if exists dna_test;
create table dna_test(
y integer,
chromosome text,
genomic_location integer,
direction text,
seq text);
copy dna_test from stdin csv delimiter E'\t';
" ;
real 6m20.430s
echo "
select to_char(count(*), '999G999G999') as rowcount
from dna_test" | psql -d test
rowcount
--------------
181,261,572
(1 row)
#
# querying a resultset into a sorted file:
#
$ time echo " copy (select * from dna_test order by seq) to stdout" \
| psql -1qtAd test > dna_test.order_by_seq.out
real 59m12.569s
##
##
$ time echo "
create index dna_test_seq_idx on dna_test (seq)
" | psql -d test
CREATE INDEX
real 63m20.451s
##
##
$ time echo "
explain analyze select * from dna_test order by seq
" | psql -1qtAd test > dna_test.order_by_seq.explain_analyze.txt
real 57m44.054s
$ cat dna_test.order_by_seq.explain_analyze.txt
Sort (cost=35550611.84..36003765.76 rows=181261568 width=62) (actual time=1834228.291..3428773.879 rows=181261572 loops=1)
Sort Key: seq
Sort Method: external merge Disk: 12933032kB
-> Seq Scan on dna_test (cost=0.00..3872406.68 rows=181261568 width=62) (actual time=17.966..65802.621 rows=181261572 loops=1)
Total runtime: 3463580.243 ms