This sounds very much like the thread "how to find differences between two huge files". Maybe you could read that thread and find what you need, or perhaps you should talk to your workmate/classmate and see how he solved it?

Update: hmm, on second thoughts it's not the same - it was hard to tell because of your rubbish formatting, sorry.

Build a hash from the first (smaller) file, keyed on the first column, then use it in a single pass through the second (larger) file to decide which output file each record goes to. Consider:

use strict;
use warnings;

#Hello,
#
#Currently I'm facing problem with comparing two huges files on a particular key
#column. One file consists of 10k records and other one 18million records. Both
#files are | (pipe) delimited. I am comparing based on the first column in the
#two files and redirecting them to two separate files.
#If Key columns are same it has to pick the record from 10k records file and send
#it to one file.
#If the Key columns are not matching ie., the key column is present in 18 million
#records file but not in 10k records file, it has to go into another file.
#
#Here I'm pasting the query what I have written, taking more time.

my $oldFile1 = <<DAT;
1|oldFile|another field
5|oldFile
DAT
my $newFile1 = <<DAT;
1|newFile1|z
2|newFile1|x
3|newFile1|y
4|newFile1|p
DAT
my $oldFile2;
my $changes1;


open OLDFILE1, '<', \$oldFile1 or die "Can't open old file: $!";

# Build the reference hash from the 'small' file
my %oldKeys;

while (<OLDFILE1>) {
    chomp;
    my ($key, $tail) = split /\|/, $_, 2;

    if (exists $oldKeys{$key}) {
        warn "Key $key duplicated. Duplicate ignored!\n";
        next;
    }

    $oldKeys{$key} = $tail;
}

close OLDFILE1;

# Process the new file
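open NEWFILE1, '<', \$newFile1 or die "Can't open new file: $!";
open OLDFILE2, '>', \$oldFile2 or die "Can't open matches output: $!";
open CHANGES1, '>', \$changes1 or die "Can't open changes output: $!";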
while (<NEWFILE1>) {
    chomp;
    my ($key, $tail) = split /\|/, $_, 2;

    if (exists $oldKeys{$key}) {
        print OLDFILE2 "$key|$oldKeys{$key}\n";
    } else {
        print CHANGES1 "$key|$tail\n";
    }
}

close (NEWFILE1);
close (OLDFILE2);
close (CHANGES1);

print "OLDFILE2:\n$oldFile2\n\n";
print "CHANGES1:\n$changes1\n\n";

prints:

OLDFILE2:
1|oldFile|another field


CHANGES1:
2|newFile1|x
3|newFile1|y
4|newFile1|p
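
The sample above reads from and writes to in-memory strings so that it is self-contained. For real files the same approach might look roughly like the sketch below. The sub name split_by_key and its four path arguments are made up for illustration, and it assumes the small file fits comfortably in memory (10k records should). Unlike the sample it stores the whole record rather than just the tail, and it silently keeps the first record for a duplicated key.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): all four arguments
# are file paths supplied by the caller.
sub split_by_key {
    my ($smallPath, $largePath, $matchedPath, $unmatchedPath) = @_;

    # Load the small file into a hash keyed on the first pipe-delimited field.
    my %smallRecords;
    open my $smallIn, '<', $smallPath or die "Can't read $smallPath: $!";
    while (my $line = <$smallIn>) {
        chomp $line;
        my ($key) = split /\|/, $line, 2;
        next if !defined $key;    # skip empty lines
        # Keep the first record seen for a duplicated key.
        $smallRecords{$key} = $line unless exists $smallRecords{$key};
    }
    close $smallIn;

    # Single pass through the large file, one line at a time.
    open my $largeIn, '<', $largePath or die "Can't read $largePath: $!";
    open my $matchedOut, '>', $matchedPath
        or die "Can't write $matchedPath: $!";
    open my $unmatchedOut, '>', $unmatchedPath
        or die "Can't write $unmatchedPath: $!";

    while (my $line = <$largeIn>) {
        chomp $line;
        my ($key) = split /\|/, $line, 2;
        next if !defined $key;

        if (exists $smallRecords{$key}) {
            # Key is in both files: emit the record from the small file.
            print {$matchedOut} "$smallRecords{$key}\n";
        } else {
            # Key is only in the large file: emit the large file's record.
            print {$unmatchedOut} "$line\n";
        }
    }

    close $largeIn;
    close $matchedOut   or die "Can't close $matchedPath: $!";
    close $unmatchedOut or die "Can't close $unmatchedPath: $!";
    return;
}
```

You would call it as something like split_by_key('old.txt', 'new.txt', 'matched.txt', 'changes.txt'), where the file names are placeholders. Only the small file is held in memory; the 18 million record file is streamed, so memory use should not grow with its size.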

Perl is environmentally friendly - it saves trees

In reply to Re: comparing two huges files by GrandFather
in thread comparing two huges files by vamsikrishna