There are various things you could do to reduce the memory requirements. For example, you could build up a string of account numbers rather than an array. This would save quite a lot of space:
C:\test>p1
@a = map int rand( 1e16 ), 1 .. 10;;
print total_size \@a;;
496
$s = join ' ', @a;;
print total_size $s;;
216
Multiply that saving by 37 million and you might avoid the problem. Take it a step further and pack the account numbers and you can save even more:

$a = pack 'Q*', @a;;
print total_size $a;;
136
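If you still need to get at individual numbers after packing, a fixed-width packed string allows random access by byte offset. A minimal sketch, assuming 64-bit 'Q' values as in the example above (the index and the 8-byte stride are purely illustrative):

use strict;
use warnings;

# Pack the account numbers into a single string of 8-byte unsigned quads
my @accts  = map int( rand 1e16 ), 1 .. 10;
my $packed = pack 'Q*', @accts;

# Random access: the Nth number starts at byte offset N * 8
my $i   = 3;
my $nth = unpack 'Q', substr $packed, $i * 8, 8;
print "$nth == $accts[$i]\n";

# Or recover the whole list in one go
my @back = unpack 'Q*', $packed;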
But having looked back at your OP, what you are doing makes no sense at all.

It makes no sense to even read the second file, because you only output records if they are already in the hash built from the first file. In other words, having built the hash from the first file, all you need to do is dump its contents and ignore the second file completely.
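For illustration only, a minimal sketch of that idea, assuming the first file is called DUMP_ID and the unique id is the first whitespace-separated field of each record (adjust the name and the split to your real layout):

use strict;
use warnings;

my %by_id;

# Build the hash from the first file only, grouping records by id
open my $in, '<', 'DUMP_ID' or die "DUMP_ID: $!";
while ( my $line = <$in> ) {
    my ($id) = split ' ', $line;
    push @{ $by_id{$id} }, $line;
}
close $in;

# Dump the contents: records sharing an id come out together
print @{ $by_id{$_} } for keys %by_id;

Note that this still holds every record in memory, which is exactly the problem you started with; hence the next suggestion.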
But as your final output file is identical to the first of your input files, except that all the records with the same unique id are grouped together, the simplest, fastest way to achieve that is to just sort that file.
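For example, assuming the file is DUMP_ID and the unique id sits at the start of each record (so a plain lexical sort puts equal ids next to each other), the external sort utility will do it without holding the whole file in memory:

C:\test>sort DUMP_ID > DUMP_ID.sorted

The filenames are placeholders; the same command works with GNU sort on *nix.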
Originally you called your files "DUMP_ID" and "DUMP_CARD_NO", and later you talk about "DUMP_ACCT_NO". That, combined with the inconsistencies in your posted code, makes me think that this question is a plant.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.