I'm finding that most databases are not happy with these sizes. I use java on the server so I don't know the perl equivalant, but what works for fast access (web application) is to leave the tsv alone, then build binary index files for the queried fields. Each column gets a subdirectory, and each value gets a file which is a list of 64 bit numbers into the original tsv for the corresponding record for that value. If your filesystem can handle it (ext3?) then this works for lots of unique values, and even range searches, and of course you can sort the filenames to get the results back in a certain order. Multi column queries are handled by using the intersection(AND) or union(OR) of a list of pointers. Putting a little effort into figuring out which column/value is the smallest for a starting point helps with AND. Once you have your final list of pointers, you can use randomaccessfile and fetch the corresponding records quickly and add them to the response. I know this sounds like building from scratch, but search engines use a similiar technique. And I have spent far less time doing it the right way than softening my head on various dbms's and related nuances. And it is very memory friendly and fast. This works well for query applications, and I rebuild 7 column indexes on a 20gig tsv file and it is good to go, so putting up an updated tsv is fairly trivial too.

In reply to Re: Better way to work with large TSV files? by Anonymous Monk
in thread Better way to work with large TSV files? by radiantmatrix

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.