Re^3: Filtering very large files using Tie::File

See tie and DB_File. A tie'd hash simply moves the storage of the hash onto disk, at the (rather huge) cost of access speed.
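A minimal sketch of what that looks like, assuming DB_File (and the Berkeley DB library it wraps) is installed; the temporary file name and the sample keys are made up for illustration:

```perl
use strict;
use warnings;
use Fcntl;                   # O_RDWR, O_CREAT
use DB_File;                 # exports $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical scratch location for the on-disk hash.
my $dir    = tempdir( CLEANUP => 1 );
my $dbfile = "$dir/seen.db";

tie my %seen, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $dbfile: $!";

# From here on %seen is used like any other hash, but every
# read and write goes through the database file.
$seen{ $_ }++ for qw( a b a );

my $count_a = $seen{a};
my $count_b = $seen{b};
print "a: $count_a, b: $count_b\n";

untie %seen;
```

The rest of the program does not change; only the tie line decides whether the hash lives in memory or on disk.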

A hash is a data structure optimized for fast lookup by a key value. An array can only look up data fast by its index, and the array assumes that all index values are sequential. You haven't told us whether that's the case, so I'm using a hash.

For the "postfix increment" operator ("++"), see perlop. It is basically $seen{ $key } = $seen{ $key } + 1, except shorter.
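As a small sketch of how that idiom filters duplicates (the sample lines and the choice of the whole line as the key are assumptions for illustration, not your actual data):

```perl
use strict;
use warnings;

# Hypothetical input; in practice these lines would come from a filehandle.
my @lines = ( "apple\n", "pear\n", "apple\n", "fig\n", "pear\n" );

my %seen;
my @unique;
for my $line (@lines) {
    my $key = $line;    # assumption: the whole line is the key
    # Postfix ++ returns the old value: undef (false) the first time,
    # 1 or more on every later occurrence.
    push @unique, $line unless $seen{ $key }++;
}

print @unique;
```

Because the test uses the value from before the increment, only the first occurrence of each key is kept.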

Re^4: Filtering very large files using Tie::File by elef (Friar) on Nov 26, 2010 at 18:59 UTC
Thanks, that clears up most things. I did know how post-increment works on numerical scalars, but this sort of use is new to me, and the perlop page says nothing about adding new records to a hash with ++... But this is what it seems to do. Your code works a treat, and the memory use doesn't seem to be too bad. Thanks.
Re^5: Filtering very large files using Tie::File by eyepopslikeamosquito (Archbishop) on Nov 26, 2010 at 20:41 UTC
"the perlop page says nothing about adding new records to a hash with ++"
This commonly seen Perl idiom works due to autovivification (the automatic creation of a variable reference when an undefined value is dereferenced). Autovivification is unique to Perl; in other languages you'd need to create the item as a separate operation before incrementing it.
Re^6: Filtering very large files using Tie::File by Corion (Patriarch) on Nov 26, 2010 at 20:56 UTC
Actually, no references come into play here. I'm simply incrementing the undefined value of `$seen{ $key }` by (and to) 1.
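To illustrate the distinction, here is a small sketch (the hash names and keys are made up): incrementing a missing hash element creates a plain number, whereas a reference only appears once a missing element is dereferenced.

```perl
use strict;
use warnings;

my %seen;
$seen{apple}++;                    # undef counts as 0, so the element becomes 1
my $is_ref = ref $seen{apple};     # empty string: no reference was created

my %nested;
$nested{fruit}{apple}++;           # $nested{fruit} is autovivified as a hash ref
my $kind = ref $nested{fruit};     # 'HASH'

print "seen: $seen{apple}, nested holds a $kind reference\n";
```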
Re^7: Filtering very large files using Tie::File by elef (Friar) on Nov 26, 2010 at 21:38 UTC
Re^8: Filtering very large files using Tie::File by ikegami (Patriarch) on Nov 27, 2010 at 21:28 UTC