in reply to Re: how to find differences between two huge files
in thread how to find differences between two huge files

m[^ ( [^|]+ ) \| ]x

Can you explain this one? Thanks.

Re^3: how to find differences between two huge files
by BrowserUk (Patriarch) on Jan 24, 2008 at 22:26 UTC

    Sure.

    m[
        ^           # From the start of the record
        (           # capture
            [^|]+   # everything that is not a pipe char
        )
        \|          # up to but excluding the first pipe char
    ]x

    In essence, grab the first field of a pipe delimited record into $1.

    This assumes that the field doesn't contain an escaped or quoted pipe character, which is the normal case.
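
As a minimal sketch of that regex in use: the record below is made up for illustration, and first_field is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample record; the field values are invented for illustration.
my $rec = 'K1001|alpha|beta|gamma';

# Return the first pipe-delimited field, or undef when the record has no
# pipe or starts with one (the + needs at least one non-pipe character).
sub first_field {
    my ($line) = @_;
    return $line =~ m[^ ( [^|]+ ) \| ]x ? $1 : undef;
}

my $key = first_field($rec);
print "key: $key\n";    # prints "key: K1001"
```

Checking the result of the match before reading $1 matters here, because $1 keeps its value from an earlier successful match if this one fails.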


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Alternatively:
      ($key, $rest_of_rec)=split(/\|/, $file_rec, 2);
      Replace $rest_of_rec with undef if you don't need that value.
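
A short sketch of both forms, assuming a made-up record in $file_rec; $key_only is a name introduced here only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample record.
my $file_rec = 'K1001|alpha|beta|gamma';

# A limit of 2 splits at the first pipe only: key, then everything else.
my ($key, $rest_of_rec) = split /\|/, $file_rec, 2;
print "$key / $rest_of_rec\n";    # prints "K1001 / alpha|beta|gamma"

# Same call, with undef as a placeholder to discard the remainder.
my ($key_only, undef) = split /\|/, $file_rec, 2;
print "$key_only\n";              # prints "K1001"
```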

        Yes. But in some tests I did a while ago (so it may no longer be true), it appeared to me that the second part is produced even if you request it to be discarded. Normally insignificant, but if the records are very large (the OP mentioned 150+ fields) then there seemed no reason to produce what you do not need.
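
One way to check this on a current perl is the core Benchmark module. The following is only a sketch: the 150-field record is synthetic, the sub names are hypothetical, and the results will depend on perl version and record size, so no outcome is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic record with 150 short fields, loosely mirroring the OP's data.
my $rec = join '|', map { "field$_" } 1 .. 150;

# First field via the anchored regex.
sub key_by_regex {
    my ($key) = $_[0] =~ m[^ ( [^|]+ ) \| ]x;
    return $key;
}

# First field via split with a limit of 2, discarding the remainder.
sub key_by_split {
    my ($key, undef) = split /\|/, $_[0], 2;
    return $key;
}

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { key_by_regex($rec) },
    split => sub { key_by_split($rec) },
});
```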

