Thanks, this is starting to look a lot simpler than I imagined.

So I could just do
my ($key) = $in =~ /^([^\t]*\t[^\t\n]*)/;
and that would take care of filtering on the first two columns instead of the whole line, right?
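Here is a minimal sketch of that approach. It reads the file line by line, so only %seen grows with the input, and it assumes tab-separated lines. Capturing in list context leaves $key undef when the match fails, instead of reusing a stale $1 from an earlier line. `\n` is excluded from the second column so that an unchomped line with exactly two columns gets the same key as a longer line that starts with the same two columns. Falling back to the whole line as the key when there is no tab is a hypothetical policy, not something from the thread.

```perl
use strict;
use warnings;

# Copy lines from $in_fh to $out_fh, keeping only the first line for each
# distinct "first two tab-separated columns" key. Only %seen grows with
# the input; the lines themselves are streamed.
sub filter_fh {
    my ($in_fh, $out_fh) = @_;
    my %seen;
    while (my $line = <$in_fh>) {
        # List-context capture: $key is undef on no match instead of a
        # stale $1. \n is excluded so unchomped lines compare correctly.
        my ($key) = $line =~ /^([^\t]*\t[^\t\n]*)/;
        # Assumed policy: a line with no tab is keyed on the whole line.
        $key = $line unless defined $key;
        print {$out_fh} $line unless $seen{$key}++;
    }
}

# Usage (e.g. perl filter.pl < big.txt > filtered.txt):
#   filter_fh(\*STDIN, \*STDOUT);
```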
But the %seen hash would then hold the first two columns of every unique line in the file, which could add up to about 1GB. I'm not sure what you mean by tying the hash to disk; could you elaborate? I guess I'll have to test how much memory this actually takes and whether it causes a problem.
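"Tying the hash to disk" most likely refers to Perl's tie mechanism combined with a DBM module. With that setup, %seen is stored in a database file rather than in RAM, and the rest of the code uses it like a normal hash. A Perl hash usually takes noticeably more memory than the raw bytes of its keys, so this can matter at 1GB. The sketch below assumes the core SDBM_File module. SDBM limits each key/value pair to roughly 1 KB, so DB_File (if your Perl was built with Berkeley DB) is the more usual choice for data this large. $db_path is a hypothetical scratch location, and a disk-backed hash will be much slower than an in-memory one.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR, O_CREAT
use SDBM_File;    # DBM module that ships with core Perl

# Same filtering loop as the in-memory version, but %seen lives in an
# on-disk SDBM database ($db_path.pag / $db_path.dir).
# Assumption: $db_path is a fresh scratch path; keys left over from an
# earlier run would otherwise count as "already seen".
sub filter_fh_on_disk {
    my ($in_fh, $out_fh, $db_path) = @_;
    tie my %seen, 'SDBM_File', $db_path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $db_path: $!";
    while (my $line = <$in_fh>) {
        my ($key) = $line =~ /^([^\t]*\t[^\t\n]*)/;
        $key = $line unless defined $key;    # assumed policy for tab-less lines
        print {$out_fh} $line unless $seen{$key}++;
    }
    untie %seen;    # flush and close the database files
}
```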

On a different note, I'm having a bit of trouble understanding the code. First of all, I don't get why you need a hash rather than just an array. What are the key-value pairs used for here? Or is it just that checking whether a certain element is present is easier with a hash than with an array?
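The following sketch illustrates why a hash fits better here. With an array, checking whether a key has already been seen means scanning the whole list each time, so filtering N lines costs on the order of N² comparisons. A hash lookup takes roughly constant time on average. The values barely matter, because the hash is used as a set. The function names are illustrative only.

```perl
use strict;
use warnings;

# Array as the "seen" store: every membership check is a linear scan
# (grep walks the whole list), so total work grows roughly quadratically.
sub dedup_with_array {
    my (@seen, @out);
    for my $key (@_) {
        next if grep { $_ eq $key } @seen;
        push @seen, $key;
        push @out,  $key;
    }
    return @out;
}

# Hash as the "seen" store: only the keys matter, the value is just a
# placeholder (1). exists() is an average constant-time lookup.
sub dedup_with_hash {
    my (%seen, @out);
    for my $key (@_) {
        next if exists $seen{$key};
        $seen{$key} = 1;
        push @out, $key;
    }
    return @out;
}
```

Both return the same result. Only the cost of each lookup differs.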
And what exactly does the ++ in if (! $seen{ $key }++) do? Does it add a new record to the %seen hash?
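Postfix `++` returns the entry's old value and then increments it. The first time a key appears, the old value is undef, which is false, so `!` makes the test true and the entry is created with value 1. On later occurrences the old value is 1, 2, and so on, all true, so the test is false. The values therefore end up counting occurrences. The sketch below spells out that equivalence; the function names are illustrative.

```perl
use strict;
use warnings;

# Compact form, as used in the thread.
sub is_new_compact {
    my ($seen, $key) = @_;
    return !$seen->{$key}++;
}

# Equivalent long form of the same test.
sub is_new_verbose {
    my ($seen, $key) = @_;
    my $old = $seen->{$key};          # undef if this key was never seen
    $seen->{$key} = ($old || 0) + 1;  # create the entry / bump the count
    return !$old;                     # true only on the first occurrence
}
```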

In reply to Re^2: Filtering very large files using Tie::File by elef
in thread Filtering very large files using Tie::File by elef
