in reply to Re^2: Filtering very large files using Tie::File
in thread Filtering very large files using Tie::File

See tie and DB_File. A tie'd hash simply moves the storage of the hash onto disk, at the (rather huge) cost of access speed.
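
A minimal sketch of what such a tied hash could look like, assuming DB_File (and the Berkeley DB library it needs) is installed. The file name seen.db and the sample keys are made up for illustration; they are not from the original code.

```perl
use strict;
use warnings;
use Fcntl;                       # supplies O_RDWR and O_CREAT
use DB_File;                     # supplies $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical location for the on-disk hash; removed when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/seen.db";

# After the tie, every read and write of %seen goes to the file on disk.
tie my %seen, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $file: $!";

my @unique;
for my $key (qw(apple pear apple fig pear)) {
    # Same idiom as with an in-memory hash, only slower per access.
    push @unique, $key unless $seen{$key}++;
}

my $apple_count = $seen{apple};  # read the count back before releasing the file
untie %seen;
```

The loop body is identical to the in-memory version; only the tie line changes where the data lives.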

A hash is a data structure optimized for fast lookup by a key value. An array can only look up data fast by its index, and the array assumes that all index values are sequential. You haven't told us whether that's the case, so I'm using a hash.
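
To illustrate the difference, here is a small sketch with made-up variable names. It stores a single value under a large numeric key, once in an array and once in a hash.

```perl
use strict;
use warnings;

# Array: assigning to one high index extends the array up to that index.
my @by_index;
$by_index[1_000_000] = 'x';
my $array_length = scalar @by_index;      # counts every slot up to the top index

# Hash: only the key that was actually stored exists.
my %by_key;
$by_key{1_000_000} = 'x';
my $hash_entries = scalar keys %by_key;   # counts stored keys only
```

With sparse or non-numeric keys the hash stays small, which is why it is the safer default when nothing is known about the keys.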

For the "postfix increment" operator ("++"), see perlop. It is basically $seen{ $key } = $seen{ $key } + 1, except shorter, and the expression itself evaluates to the value from before the increment.
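
A short sketch of how this plays out in a filter loop. The sample keys and the names @kept and $before are only illustrative.

```perl
use strict;
use warnings;

my %seen;
my @kept;

for my $key (qw(a b a c b a)) {
    # Postfix ++ hands back the old value, then stores old value + 1.
    # For a key that is not in the hash yet, the old value counts as 0.
    my $before = $seen{$key}++;

    # Keep the key only the first time it shows up.
    push @kept, $key unless $before;
}
```

Afterwards %seen holds how often each key occurred, and @kept holds each key once, in order of first appearance.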

Re^4: Filtering very large files using Tie::File
by elef (Friar) on Nov 26, 2010 at 18:59 UTC
    Thanks, that clears up most things.
    I did know how post-increment works on numerical scalars, but this sort of use is new to me, and the perlop page says nothing about adding new records to a hash with ++... But this is what it seems to do.

    Your code works a treat, and the memory use doesn't seem to be too bad. Thanks.

      the perlop page says nothing about adding new records to a hash with ++
      This commonly seen Perl idiom works due to Autovivification (the automatic creation of a variable reference when an undefined value is dereferenced). Autovivification is unique to Perl; in other languages you'd need to create the item as a separate operation before incrementing it.

        Actually, no references come into play here. I'm simply incrementing the undefined value of $seen{ $key } by (and to) 1.
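
The distinction can be checked with a small sketch (the hash names and keys here are invented for the example). The first half increments a plain hash value; the second half shows a case where a reference really is created on the fly.

```perl
use strict;
use warnings;

# Plain increment: the hash value goes from undefined to the number 1.
my %seen;
$seen{apple}++;
my $seen_ref_type = ref $seen{apple};   # empty string: not a reference
my $seen_count    = $seen{apple};

# Autovivification proper: using $tree{fruit} as a hash reference
# creates that inner hash automatically.
my %tree;
$tree{fruit}{apple}++;
my $tree_ref_type = ref $tree{fruit};   # 'HASH': a reference was created
```

Only the nested case involves a reference; the one-level $seen{ $key }++ just stores a number.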