Re: compare data between two files using Perl

You should only have to read each file once. From the way you describe your algorithm, you want to find things that are in one file but not in the other. The basic idea is this:

my %seen;
open(B, "brd_sym_pn.txt") or die "...";
while (<B>) {
  my ($RefDes, $Pnumm, $Pkg_Type) = ...parse these from the line...
  $seen{$RefDes, $Pnumm, $Pkg_Type} = 1;
}
close(B);

open(S, "sym_text_latest.txt") or die "...";
while (<S>) {
  my ($RefDes, $Pnumm, $Pkg_Type) = ...parse these from the line...
  $seen{$RefDes, $Pnumm, $Pkg_Type} += 2;
}
close(S);

while (my ($key, $val) = each %seen) {
  if ($val == 1) {
    # $key is in first file but not second
  } elsif ($val == 2) {
    # $key is in second file but not first
  } else {
    # key is in both files
  }
}

For info on how to interpret $key, have a look at the documentation for the $; variable in perldoc perlvar. For special cases you can optimize this code.
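As a minimal sketch of what the perlvar entry describes: a subscript list such as $seen{$a, $b, $c} is stored under a single string key made by joining the values with $; (by default the character "\034"), so splitting on that separator recovers the fields. The sample values and the split_key helper below are hypothetical, used only for illustration, and the approach assumes none of the fields contains the separator character itself.

```perl
use strict;
use warnings;

# Hypothetical sample entries standing in for data parsed from the files.
my %seen;
$seen{'J1', '12-0259-01', 'HDR-1X2-100-FLK-VT'}  = 3;   # in both files
$seen{'J2', '12-0259-01', 'HDR-1X28-100-FLK-VT'} = 1;   # first file only

# Hypothetical helper: split a composite key back into its fields.
sub split_key {
    my ($key) = @_;
    my $sep = $;;                         # $; is the subscript separator
    return split /\Q$sep\E/, $key, -1;    # -1 keeps trailing empty fields
}

while (my ($key, $val) = each %seen) {
    my ($RefDes, $Pnum, $Pkg_Type) = split_key($key);
    print join("\t", $RefDes, $Pnum, $Pkg_Type, $val), "\n";
}
```

If the fields could ever contain the separator, a nested hash such as $seen{$RefDes}{$Pnum}{$Pkg_Type} would avoid the ambiguity, at the cost of a nested loop when walking the data.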


Re^2: compare data between two files using Perl by steveb94553 (Initiate) on Jun 16, 2008 at 21:35 UTC
Hi pc88mxer,

Actually, my goal is to take some data from the brd_sym_pn.txt file, specifically the $RefDes, $Pnum and $Pkg_Type, and then check that the $Pnum and $Pkg_Type for each $RefDes match the assigned $LogPnum and $LogPkg_Type from sym_text_latest.txt. The script then reports whether each reference designator is using the correct $Pnum and $Pkg_Type. I have purposely made some $Pkg_Type values incorrect to check my script.

#This program will return the refdes, part number and package type
#tab delimited for each instance from brd_sym_pn.txt (extracted from
#a layout database) and part number and package type from sym_text_latest.txt
#generated from sym_text_mmddyy.xls when part number is entered at <STDIN> prompt
#Compares the pkg_type used on board with sym_text log files and reports
#whether the pkg_type is correct or reports what the correct pkg_type should be.

$Pnum = "12-0259-01";
print $Pnum;

open(partlog, "sym_text_latest.txt") || die("failed to open sym_text_latest.txt");
while($line = <partlog>) {
    @fields = split(/\t/,$line);
    if($fields[0] eq $Pnum) {
        our $LogPnPkg = "$fields[0]\t$fields[4]";
        our $LogPnum = "$fields[0]";
        our $LogPkg_Type = "$fields[4]";
    }
}

open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
print "\n";
open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
print "\n";
while($line = <brdpartlog>) {
    @fields = split(/\t/,$line);
    if($fields[1] eq $Pnum) {
        our $RdesPnPkg = "$fields[0]\t$fields[1]\t$fields[2]";
        our $BrdPnPkg = "$fields[1]\t$fields[2]";
        our $RefDes = "$fields[0]";
        if($BrdPnPkg eq $LogPnPkg) {
            print("$RefDes\t$BrdPnPkg is the correct Allegro footprint.\n");
        } else {
            print("$RefDes\t$BrdPnPkg should be using $LogPkg_Type\n");
        }
    }
}

prints results:

J2 12-0259-01 HDR-1X28-100-FLK-VT should be using HDR-1X2-100-FLK-VT
J1 12-0259-01 HDR-1X2-100-FLK-VT is the correct Allegro footprint.
Re^3: compare data between two files using Perl by pc88mxer (Vicar) on Jun 18, 2008 at 21:19 UTC
I think I understand what you are trying to do, and the approach I gave is a good start for conducting the analysis you want to perform. Suppose that the first file contains the following triples:

RefDes Package PType
R1 P1 T1
R2 P2 T2
...

and the second file contains

RefDes Package PType
R1 P1 NOT-T1
R2 P2 T2
R2 P2 ANOTHER-T2
...

The above algorithm will report that `R1,P1,T1` appears in the first file but not the second, and that the triple `R1,P1,NOT-T1` appears in the second file but not the first. The interpretation of this is that `NOT-T1` in the second file is a mistake and should be `T1`. We can modify the code to actually produce this message, but I just wanted to demonstrate how this situation is picked up by the algorithm.

To take another example, consider the triples in each file that begin with `R2,P2`. The above algorithm will report that `R2,P2,T2` appears in both files and that `R2,P2,ANOTHER-T2` is in the second file but not the first. You have to decide how to interpret this situation. Perhaps it means that the second file is malformed because it contains two triples that begin with `R2,P2`.

Again, you should only need to read your files once.
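One possible way to turn the two cases above into messages, sketched under assumptions: the triples are held in hypothetical in-memory arrays rather than parsed from the real files, the compare_triples name and the message wording are invented for illustration, and the flags are set with |= instead of = 1 and += 2 so that a triple repeated within one file does not inflate its value.

```perl
use strict;
use warnings;

# Hypothetical triples standing in for the two parsed files.
my @first  = (['R1', 'P1', 'T1'], ['R2', 'P2', 'T2']);
my @second = (['R1', 'P1', 'NOT-T1'], ['R2', 'P2', 'T2'],
              ['R2', 'P2', 'ANOTHER-T2']);

sub compare_triples {
    my ($first, $second) = @_;
    my $sep = $;;                              # subscript separator
    my %seen;
    $seen{ join($sep, @$_) } |= 1 for @$first;    # bit 1: in first file
    $seen{ join($sep, @$_) } |= 2 for @$second;   # bit 2: in second file

    # Group the one-sided triples by their "RefDes,Package" prefix.
    my (%only_first, %only_second);
    for my $key (keys %seen) {
        my ($ref, $pkg, $type) = split /\Q$sep\E/, $key, -1;
        push @{ $only_first{"$ref,$pkg"} },  $type if $seen{$key} == 1;
        push @{ $only_second{"$ref,$pkg"} }, $type if $seen{$key} == 2;
    }

    my @messages;
    for my $pair (sort keys %only_second) {
        for my $type (sort @{ $only_second{$pair} }) {
            if (my $expected = $only_first{$pair}) {
                # Same RefDes and Package, different type in each file.
                push @messages,
                    "$pair: second file has $type, first file has @$expected";
            } else {
                push @messages, "$pair,$type: in second file only";
            }
        }
    }
    return @messages;
}

print "$_\n" for compare_triples(\@first, \@second);
```

With the sample data this reports the `R1,P1` mismatch as a type conflict and flags `R2,P2,ANOTHER-T2` as an extra entry, leaving the interpretation of the second case to you.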
Re^2: compare data between two files using Perl by Anonymous Monk on Dec 16, 2008 at 12:14 UTC
Hey, I wanted to know more about how to interpret this line: while (my ($key, $val) = each %seen) { I looked at perlvar but did not get much of an idea. Thanks, Hashmat
Re^3: compare data between two files using Perl by svenXY (Deacon) on Dec 16, 2008 at 13:04 UTC
The documentation for each() will help you with your question. In short: each() returns the next key/value pair of the hash as a two-element list on every call, so it is memory-efficient because it never builds the full list of keys. Regards, svenXY
Re^4: compare data between two files using Perl by Anonymous Monk on Dec 16, 2008 at 14:18 UTC
Right. But how do I split $key to get my original keys?
Re^4: compare data between two files using Perl by Anonymous Monk on Dec 16, 2008 at 14:20 UTC
Right. But I wanted to know from $value how do I get the original tuples?
Re^4: compare data between two files using Perl by Anonymous Monk on Dec 16, 2008 at 15:08 UTC
Sorry, what I meant is: how do I get the original tuples from $key?