
Ahh, so every key/value pair in the hash is (text from the file)/1? Or, I guess, (text)/(number of occurrences) because if the record is a duplicate, you'll be incrementing a preexisting value. I think I get it now.

Re^8: Filtering very large files using Tie::File
by ikegami (Patriarch) on Nov 27, 2010 at 21:28 UTC

    The first time you see the line:

    • $seen{"abc\n"} doesn't exist, so it's effectively undef.
    • $seen{"abc\n"}++ increments $seen{"abc\n"} to 1 and returns the original value (undef, which postincrement treats as 0).
    • "!" negates the value returned by the postincrement (0), returning true.
    • The "if" body is entered.

    The second (or third, or fourth) time you see the line:

    • $seen{"abc\n"} was previously set to 1 (or 2, or 3).
    • $seen{"abc\n"}++ increments $seen{"abc\n"} to 2 (or 3, or 4) and returns the original value (1, or 2, or 3).
    • "!" negates the value returned by the postincrement (1, or 2, or 3), returning false.
    • The "if" body is not entered.
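The two cases above can be traced with a small sketch. The sample lines and the names @trace and @kept are made up for illustration; only %seen and the postincrement test come from the discussion.

```perl
use strict;
use warnings;

my %seen;    # line text => number of times seen so far
my @trace;   # value returned by each postincrement (illustrative only)
my @kept;    # lines that passed the "first time seen" test

for my $line ("abc\n", "def\n", "abc\n", "abc\n") {
    # Postincrement bumps the stored count and yields the old one,
    # which is false the first time and true afterwards.
    my $before = $seen{$line}++;
    push @trace, $before;
    push @kept, $line if !$before;
}

print "returned: @trace\n";
print "kept: ", scalar(@kept), " of 4 lines\n";
```

After the loop, each value in %seen is the number of times that line occurred, which matches the (text)/(number of occurrences) reading in the question.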

    It's just one of those useful patterns one memorises.
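As a rough sketch of the pattern used as a line filter, the following reads from an in-memory filehandle so it runs on its own. The sample data and the names $input, $in and $output are assumptions for illustration, not code from the thread; a real filter would open the actual file instead.

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for a real file.
my $input = "abc\ndef\nabc\nghi\ndef\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

my %seen;
my $output = '';
while (my $line = <$in>) {
    # Keep the line only the first time it is seen.
    $output .= $line if !$seen{$line}++;
}
close $in;

print $output;
```

Note that %seen holds one key per distinct line, so memory use grows with the number of unique lines in the input.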