in reply to Re^2: how to find differences between two huge files
in thread how to find differences between two huge files

Sure.

m[
    ^           # From the start of the record
    (           # capture
        [^|]+   # everything that is not a pipe char
    )
    \|          # up to but excluding the first pipe char
]x

In essence, grab the first field of a pipe-delimited record into $1.

This assumes that the field doesn't contain an escaped or quoted pipe character, which is the normal case.
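
As a rough illustration of how that pattern might be used, here is a minimal sketch. The sample record and the helper name first_field are made up for the example; they are not from the original thread.

```perl
use strict;
use warnings;

# Hypothetical sample record; the real data reportedly has 150+ fields.
my $record = 'K1001|alpha|beta|gamma';

# Hypothetical helper: return the first pipe-delimited field, or undef
# when the record does not start with a non-empty field followed by a pipe.
sub first_field {
    my ($rec) = @_;
    return $rec =~ m[ ^ ( [^|]+ ) \| ]x ? $1 : undef;
}

my $key = first_field($record);
print defined $key ? "key: $key\n" : "no key found\n";   # prints "key: K1001"
```

Note that the pattern requires at least one character before the first pipe, so a record with an empty first field would not match.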


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^4: how to find differences between two huge files
by chrism01 (Friar) on Jan 25, 2008 at 01:37 UTC
    Alternatively:
    ($key, $rest_of_rec)=split(/\|/, $file_rec, 2);
    Replace $rest_of_rec with undef if you don't need that value.
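
A small sketch of the split approach, assuming a made-up sample record. The limit of 2 makes split stop after the first pipe, so the remainder comes back as a single string.

```perl
use strict;
use warnings;

# Hypothetical sample record, for illustration only.
my $file_rec = 'K1001|alpha|beta|gamma';

# Limit of 2: at most two pieces are returned, the key and everything else.
my ($key, $rest_of_rec) = split /\|/, $file_rec, 2;
print "$key\n";          # K1001
print "$rest_of_rec\n";  # alpha|beta|gamma

# Same call, with undef as a placeholder for the part we do not want.
my ($key_only, undef) = split /\|/, $file_rec, 2;
print "$key_only\n";     # K1001
```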

      Yes. But in some tests I did a while ago (so it may no longer be true), it appeared that the second part is still produced even if you ask for it to be discarded. That is normally insignificant, but when the records are very large (the OP mentioned 150+ fields) there seemed no reason to produce what you do not need.
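
One way to check this on a current perl is a quick comparison with the core Benchmark module. This is only a sketch: the 150-field record and the sub names key_by_regex and key_by_split are invented for the test, and the results will vary with perl version and record size.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical wide record: a key followed by 150 fields.
my $rec = join '|', 'K1001', map { "field$_" } 1 .. 150;

# Extract the key with the anchored regex.
sub key_by_regex { my ($key) = $_[0] =~ m[^([^|]+)\|]; return $key }

# Extract the key with split, discarding the remainder.
sub key_by_split { my ($key, undef) = split /\|/, $_[0], 2; return $key }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { key_by_regex($rec) },
    split => sub { key_by_split($rec) },
});
```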

