PerlMonks  

compare two files and update

by mmittiga17 (Scribe)
on Nov 05, 2010 at 19:17 UTC ( [id://869732]=perlquestion )

mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I have two files: Master.txt and update.txt. I want to look at both files and find matching keys, e.g. $key = substr($line,3,12); If a key from update.txt is found in the master, replace the entire line in the master. If the key from update.txt is not found in Master.txt, add the line from update.txt to Master.txt. It sounds really simple but I am struggling with it. Any help or thoughts would be greatly appreciated.

sub updateMSTR {
    open(SEC,"pw".$prev_date.".sec")||warn("sec file not found")&& exit;
    open (MSTR,"c:\\lab\\PWB\\Master\\Security_Master.sec");
    open( NewMSTR, ">MASTER.TMP");
    while(defined($line1 = (<SEC>))){
        chomp($line1);
        $key1 = substr($line1,3,12);
        $key1 =~s/\s+//g;
        while(defined($line2 = (<MSTR>))){
            chomp($line2);
            $key2 = substr($line2,3,12);
            $key2 =~s/\s+//g;
            if ("$key1" eq "$key2"){
                print"$key $key2\n";
                print NewMSTR "1 $line1\n";
            }else{
                print NewMSTR "2 $line2\n";
            }
        }
    }
    close(MSTR);
    close(NewMSTR);
}

Replies are listed 'Best First'.
Re: compare two files and update
by kcott (Archbishop) on Nov 05, 2010 at 22:53 UTC

    You have a logic problem and a design problem.

    Logic problem: read 1st line of update file; read first to last lines of master file; read 2nd line of update file; oops - nothing left to read from master file.
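
    The logic problem can be reproduced in isolation. The following is only a minimal sketch: it uses in-memory strings as stand-ins for the two files, and the sub name inner_reads_per_update is made up for illustration. It counts how many master lines the inner loop actually sees for each update line.

```perl
use strict;
use warnings;

# Count how many master lines the inner loop reads for each update line.
# Both arguments are plain strings standing in for file contents (assumption).
sub inner_reads_per_update {
    my ($update_text, $master_text) = @_;
    open my $upd, '<', \$update_text or die "update: $!";
    open my $mst, '<', \$master_text or die "master: $!";
    my @counts;
    while (defined(my $u = <$upd>)) {
        my $n = 0;
        # After the first full pass $mst is at end-of-file and is never
        # rewound, so this loop body does not run for later update lines.
        while (defined(my $m = <$mst>)) {
            $n++;
        }
        push @counts, $n;
    }
    close $mst;
    close $upd;
    return @counts;
}

my @counts = inner_reads_per_update("u1\nu2\n", "m1\nm2\nm3\n");
print "master lines read per update line: @counts\n";
# prints "master lines read per update line: 3 0"
```

    Only the first update line is ever compared against the master; every later update line finds the master handle already exhausted.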

    Design problem: processing the entire master file for every line of the input file - MishaMoose alluded to this (above).

    Try creating a hash (say %index) then read through the update file only and for each line store the data in the hash ($index{$key} = $line;). Next, read through the master file and for each line process its key against $index{$key}. That way you only read the files once.

    Be aware, this assumes that the keys are unique!
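
    A rough sketch of that idea follows. It is not a drop-in replacement for the original sub: the names key_of and apply_updates are invented here, the records are passed in as arrays rather than read from files, and appending the unmatched update lines in sorted key order is an assumption (the original question only says they should be added).

```perl
use strict;
use warnings;

# Key extraction as in the question: 12 characters from offset 3,
# with whitespace removed. Short lines yield an empty key.
sub key_of {
    my ($line) = @_;
    my $key = length($line) > 3 ? substr($line, 3, 12) : '';
    $key =~ s/\s+//g;
    return $key;
}

# Takes array refs of update lines and master lines, returns the merged lines.
sub apply_updates {
    my ($update_lines, $master_lines) = @_;

    # One pass over the update data: index each line by its key.
    # If a key repeats, the last update line for that key wins.
    my %index;
    $index{ key_of($_) } = $_ for @$update_lines;

    # One pass over the master data: replace a line when its key is indexed.
    my @out;
    for my $line (@$master_lines) {
        my $replacement = delete $index{ key_of($line) };
        push @out, defined $replacement ? $replacement : $line;
    }

    # Keys still in %index were not found in the master, so add those lines.
    push @out, map { $index{$_} } sort keys %index;
    return @out;
}

# Hypothetical fixed-width records: 3-char prefix, 12-char key, then data.
my @master = map { sprintf("%-3s%-12s%s", 'M', @$_) } ['AAA', 'old a'], ['BBB', 'old b'];
my @update = map { sprintf("%-3s%-12s%s", 'U', @$_) } ['BBB', 'new b'], ['CCC', 'new c'];
print "$_\n" for apply_updates(\@update, \@master);
```

    Because delete removes a key the first time it matches, a second master line with the same key would be left unchanged, which is where the uniqueness assumption matters.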

    -- Ken

Re: compare two files and update
by MishaMoose (Scribe) on Nov 05, 2010 at 19:48 UTC

    Greetings!

    How large are your files? Unless they are both very large, it might pay to read one (the update file) or both into hashes and/or arrays, then walk through the update file and match the keys to the master file. As it is, it appears you will only read the first line in the update file, then read the whole master file, and then read the rest of the keys. With the nested loops you will have to read the inner file many times (on average, a number of complete passes through the inner file equal to about 25% of the number of lines in the outer).

    I am about to leave work but will send you a little code when I get home.

    I hope this will give you some ideas until I can send some code.

    As always, I am certain that if I misspoke a wandering monk will correct my error. 8^)

    Misha/Michael - Russian student, grognard, bemused observer of humanity and self professed programmer with delusions of relevance
Re: compare two files and update
by umasuresh (Hermit) on Nov 05, 2010 at 19:58 UTC
    Try this:
    use strict;
    use warnings;

    # Usage: script.pl master_file update_file master_key_col update_key_col
    # Both files are tab-separated; the key columns are zero-based indexes.
    my ($file1, $file2, $col1, $col2) = @ARGV;
    my %hash1;

    # Master: index every line by its key column
    open(my $in1, '<', $file1) or die "***can't open the file $file1: $!\n";
    while (my $line = <$in1>) {
        chomp $line;
        my @a1 = split(/\t/, $line);
        $hash1{ $a1[$col1] } = $line;
    }
    close $in1;

    # Update: a line whose key exists in the master replaces the master line,
    # a line with a new key is added
    open(my $in2, '<', $file2) or die "***can't open the file $file2: $!\n";
    while (my $line = <$in2>) {
        chomp $line;
        my @a2 = split(/\t/, $line);
        $hash1{ $a2[$col2] } = $line;
    }
    close $in2;

    # Print the merged result, sorted by key
    for my $key (sort keys %hash1) {
        print "$hash1{$key}\n";
    }
    exit;
Re: compare two files and update
by ig (Vicar) on Nov 06, 2010 at 18:08 UTC

    You might try something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $master    = 'master.txt';
    my $newmaster = 'master.tmp';
    my $update    = 'update.txt';

    # Load the updates: %updates maps key to line, @updates keeps file order
    open my $updatefh, '<', $update or die "$update: $!";
    my %updates;
    my @updates;
    while (my $line = <$updatefh>) {
        my $key = substr($line,3,12);
        $updates{$key} = $line;
        push(@updates, $key);
    }
    close($updatefh);

    open my $masterfh, '<', $master or die "$master: $!";
    open my $newmasterfh, '>', $newmaster or die "$newmaster: $!";

    # Read the master one line at a time (while, not foreach, so the
    # whole file is never held in memory)
    while (my $line = <$masterfh>) {
        my $key = substr($line,3,12);
        if(exists($updates{$key})) {
            print $newmasterfh $updates{$key};
            delete($updates{$key});
        } else {
            print $newmasterfh $line;
        }
    }
    close($masterfh);

    # Append the updates that did not match any master line
    foreach my $key (@updates) {
        if(exists($updates{$key})) {
            print $newmasterfh $updates{$key};
        }
    }
    close($newmasterfh);

    This assumes that each key appears only once in master.txt. Some change would be required if you want a single line in the update file to update multiple lines in the master file.
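
    One possible shape of that change, as a sketch only: instead of deleting a key once it has matched, keep it in the hash and record in a separate hash that it was seen. The names record_key and update_all_matches are hypothetical, and the lines are passed as arrays to keep the example self-contained.

```perl
use strict;
use warnings;

# Same key as in the script above: 12 characters from offset 3, unmodified.
sub record_key {
    my ($line) = @_;
    return length($line) > 3 ? substr($line, 3, 12) : '';
}

# Every master line whose key has an update is replaced, not just the first.
sub update_all_matches {
    my ($update_lines, $master_lines) = @_;
    my (%updates, @order, %seen);
    for my $line (@$update_lines) {
        my $key = record_key($line);
        push @order, $key unless exists $updates{$key};   # remember first-seen order
        $updates{$key} = $line;
    }

    my @out;
    for my $line (@$master_lines) {
        my $key = record_key($line);
        if (exists $updates{$key}) {
            push @out, $updates{$key};
            $seen{$key} = 1;          # mark as matched, but keep it for later lines
        } else {
            push @out, $line;
        }
    }

    # Append only the updates that never matched a master line.
    push @out, map { $updates{$_} } grep { !$seen{$_} } @order;
    return @out;
}

# Hypothetical records: key K1 appears twice in the master.
my @master = map { sprintf("%-3s%-12s%s", 'M', @$_) } ['K1', 'a'], ['K1', 'b'], ['K2', 'c'];
my @update = map { sprintf("%-3s%-12s%s", 'U', @$_) } ['K1', 'new'], ['K3', 'add'];
print "$_\n" for update_all_matches(\@update, \@master);
```

    With this variant both K1 lines in the master become the K1 update line, and K3 is appended once at the end.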

    This also assumes the update file is small enough that the hash of updates fits in memory but will work even if the master file is larger than available memory.

    The order of lines in the master file is maintained. Any new lines added from the updates file are appended in the same order as they appeared in the updates file.

      Awesome! Thanks, that worked and helped me complete the full script. Thank you for your help!
Re: compare two files and update
by MishaMoose (Scribe) on Nov 06, 2010 at 15:31 UTC

    My apologies for not getting back last night; RealLife unfortunately intervened. It would help to have more information on the file you are updating: things like whether the file is in order, whether it is necessary to maintain the original order, etc. The requirements determine what approaches are appropriate. It has been my experience that time spent up front considering the requirements and the data will save a lot of grief down the road.
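
    To illustrate how much the answer depends on those requirements: if it turned out that both files are already sorted by key and each key is unique within its file (neither of which is stated in the question), a plain two-way merge would do the job without any hash. The sketch below is only that, a sketch under those assumptions; sort_key and merge_sorted are made-up names and the records are passed as arrays.

```perl
use strict;
use warnings;

# Key as in the question: 12 characters from offset 3.
sub sort_key {
    my ($line) = @_;
    return length($line) > 3 ? substr($line, 3, 12) : '';
}

# Merge two lists that are each sorted ascending by key (string order).
# On equal keys the update line replaces the master line.
sub merge_sorted {
    my ($master, $update) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$master && $j < @$update) {
        my $cmp = sort_key($master->[$i]) cmp sort_key($update->[$j]);
        if    ($cmp < 0) { push @out, $master->[$i++]; }        # master-only key
        elsif ($cmp > 0) { push @out, $update->[$j++]; }        # new key from update
        else             { push @out, $update->[$j++]; $i++; }  # same key: update wins
    }
    # One of the lists is exhausted; copy whatever remains of the other.
    push @out, @{$master}[ $i .. $#{$master} ], @{$update}[ $j .. $#{$update} ];
    return @out;
}

# Hypothetical sorted records.
my @master = map { sprintf("%-3s%-12s%s", 'M', @$_) } ['A', 'old'], ['C', 'old'];
my @update = map { sprintf("%-3s%-12s%s", 'U', @$_) } ['B', 'new'], ['C', 'new'], ['D', 'new'];
print "$_\n" for merge_sorted(\@master, \@update);
```

    The output stays sorted by key, whereas the hash-based approaches either keep the master's original order or impose a sort of their own, so the choice depends on which ordering actually matters.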

    Without more information, the code from umasuresh should work and give you a good starting point.

    Good Luck!

    Misha/Michael - Russian student, grognard, bemused observer of humanity and self professed programmer with delusions of relevance
