Re^7: Filtering very large files using Tie::File

Ahh, so every key/value pair in the hash is (text from the file)/1? Or, I guess, (text)/(number of occurrences) because if the record is a duplicate, you'll be incrementing a preexisting value. I think I get it now.

Replies are listed 'Best First'.

Re^8: Filtering very large files using Tie::File
by ikegami (Patriarch) on Nov 27, 2010 at 21:28 UTC

The first time you see the line:

$seen{"abc\n"} doesn't exist, so it's effectively undef.
$seen{"abc\n"}++ increments $seen{"abc\n"} to 1 and returns the original value (postincrement treats undef as 0, so it returns 0 rather than undef).
"!" negates the value returned by the postincrement (0, which is false), returning true.
The "if" body is entered.

The second (or third, or fourth) time you see the line:

$seen{"abc\n"} was previously set to 1 (or 2, or 3).
$seen{"abc\n"}++ increments $seen{"abc\n"} to 2 (or 3, or 4) and returns the original value (1, or 2, or 3).
"!" negates the value returned by the postincrement (1, or 2, or 3), returning false.
The "if" body is not entered.

It's just one of those useful patterns one memorises.
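
For reference, here is a minimal sketch of the pattern in isolation. The helper name unique_lines and the sample data are made up for illustration and are not taken from the code discussed earlier in the thread; the only part that matters is the !$seen{$_}++ test.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this example): returns its arguments
# with later duplicates dropped, keeping first occurrences in their
# original order.
sub unique_lines {
    my %seen;
    # $seen{$_}++ returns the old count: false the first time a line is
    # seen, true on every repeat. "!" flips that, so grep keeps only the
    # first occurrence of each line.
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample input standing in for lines read from a file.
my @lines = ("abc\n", "def\n", "abc\n", "abc\n", "ghi\n");
my @kept  = unique_lines(@lines);

print @kept;    # prints "abc", "def", "ghi", one per line
```

The same test works in a plain read loop (print if !$seen{$_}++;). Either way the hash holds one key per distinct line, so memory use grows with the number of distinct lines rather than with the size of the file.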
