Re^4: Filtering very large files using Tie::File

Thanks, that clears up most things.
I did know how post-increment works on numeric scalars, but this sort of use is new to me: the perlop page says nothing about adding new records to a hash with `++`, yet that is what it seems to do.

Your code works a treat, and the memory use doesn't seem to be too bad. Thanks.


Re^5: Filtering very large files using Tie::File by eyepopslikeamosquito (Archbishop) on Nov 26, 2010 at 20:41 UTC
"the perlop page says nothing about adding new records to a hash with ++"

This commonly seen Perl idiom works due to autovivification (the automatic creation of a variable reference when an undefined value is dereferenced). Autovivification is unique to Perl; in other languages you'd need to create the item as a separate operation before incrementing it.
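Strictly speaking, autovivification in the reference sense only happens when an undefined value is dereferenced, for example in a nested `$h{a}{b}++`, where Perl silently stores a new anonymous hash reference in `$h{a}`. A plain top-level `$h{k}++` just creates the missing element and increments its value; no reference is involved. A minimal sketch contrasting the two (the hash names `%plain` and `%nested` are illustrative only):

```perl
use strict;
use warnings;

my (%plain, %nested);

# Top-level element: the missing key 'apple' is created and its undef
# value is incremented to 1. The stored value is a plain number.
$plain{apple}++;

# Nested access: $nested{fruit} is undef, so dereferencing it as a hash
# autovivifies an anonymous hash reference there, then {apple} is incremented.
$nested{fruit}{apple}++;

print "plain apple = $plain{apple}, ref = '", ref($plain{apple}), "'\n";
print "nested fruit ref = '", ref($nested{fruit}), "'\n";
print "nested fruit apple = $nested{fruit}{apple}\n";
```

`ref` returns an empty string for the plain element and `HASH` for the autovivified one.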
Re^6: Filtering very large files using Tie::File by Corion (Patriarch) on Nov 26, 2010 at 20:56 UTC
Actually, no references come into play here. I'm simply incrementing the undefined value of `$seen{ $key }` by (and to) 1.
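The filtering code itself is not quoted in this part of the thread, so the following is only a minimal sketch of the usual shape of the idiom, assuming the goal is to drop duplicate lines while keeping the first occurrence; `unique_lines` is a hypothetical helper name. It relies on the behaviour described above: post-increment returns the value *before* incrementing, and perlop documents that an undef value is treated as 0, so the first sighting of a line yields a false value and every later sighting yields a true count.

```perl
use strict;
use warnings;

# Hypothetical helper: return the input lines with duplicates removed,
# keeping the first occurrence of each line in its original order.
sub unique_lines {
    my %seen;
    # First sighting: $seen{$_}++ returns 0 (undef treated as 0), so
    # !0 is true and the line is kept. Later sightings return 1, 2, ...
    # so the line is dropped.
    return grep { !$seen{$_}++ } @_;
}

my @lines = ("alpha\n", "beta\n", "alpha\n", "gamma\n", "beta\n");
print unique_lines(@lines);   # prints alpha, beta, gamma, one per line
```

In a real filter over a very large file the lines would be read one at a time from a file handle rather than held in an array; memory then grows with the number of *distinct* lines, since each one becomes a hash key.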
Re^7: Filtering very large files using Tie::File by elef (Friar) on Nov 26, 2010 at 21:38 UTC
Ahh, so every key/value pair in the hash is (text from the file)/1? Or, I guess, (text)/(number of occurrences), because if the record is a duplicate, you'll be incrementing a pre-existing value. I think I get it now.
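That reading is right: after the loop, each distinct record is a key and its value is the number of times it was seen, so any record with a count above 1 was duplicated. A minimal sketch with made-up sample records (`@records` and `%count` are illustrative names):

```perl
use strict;
use warnings;

my @records = ("alpha", "beta", "alpha", "gamma", "alpha", "beta");

# Each record becomes a key; its value counts how many times it appeared.
my %count;
$count{$_}++ for @records;

# Sort the keys so the output order is deterministic
# (Perl's hash iteration order is not).
for my $rec (sort keys %count) {
    print "$rec\t$count{$rec}\n";
}
# prints alpha 3, beta 2, gamma 1 (tab-separated), one per line
```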
Re^8: Filtering very large files using Tie::File by ikegami (Patriarch) on Nov 27, 2010 at 21:28 UTC