in reply to compare data between two files using Perl

You should only have to read each file once. From the way you describe your algorithm, you want to find things that are in one file but not in the other. The basic idea is this:
    my %seen;

    open(B, "brd_sym_pn.txt") or die "...";
    while (<B>) {
        my ($RefDes, $Pnumm, $Pkg_Type) = ...parse these from the line...
        $seen{$RefDes, $Pnumm, $Pkg_Type} = 1;
    }
    close(B);

    open(S, "sym_text_latest.txt") or die "...";
    while (<S>) {
        my ($RefDes, $Pnumm, $Pkg_Type) = ...parse these from the line...
        $seen{$RefDes, $Pnumm, $Pkg_Type} += 2;
    }
    close(S);

    while (my ($key, $val) = each %seen) {
        if ($val == 1) {
            # $key is in first file but not second
        } elsif ($val == 2) {
            # $key is in second file but not first
        } else {
            # key is in both files
        }
    }
For info on how to interpret $key, have a look at the documentation for the $; variable in perldoc perlvar. For special cases you can optimize this code.
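To make the $; remark concrete, here is a minimal sketch of how a key built from a comma-separated subscript can be taken apart again. The sample triples are hypothetical stand-ins for lines parsed from the files, and the sketch assumes that none of the fields contains the separator character itself.

```perl
use strict;
use warnings;

my %seen;

# Hypothetical sample triples standing in for lines parsed from a file.
my @triples = (
    [ 'J1', '12-0259-01', 'HDR-1X2-100-FLK-VT'  ],
    [ 'J2', '12-0259-01', 'HDR-1X28-100-FLK-VT' ],
);

# A comma-separated subscript is joined with $; (by default "\034")
# to form one ordinary string key.
for my $t (@triples) {
    my ($RefDes, $Pnum, $Pkg_Type) = @$t;
    $seen{$RefDes, $Pnum, $Pkg_Type} = 1;
}

# Copy the separator into a lexical so it can be quoted safely in a regex.
my $sep = $;;

# Split each key on the separator to recover the original fields.
# The limit of -1 keeps trailing empty fields.
my @recovered;
for my $key (sort keys %seen) {
    my ($RefDes, $Pnum, $Pkg_Type) = split /\Q$sep\E/, $key, -1;
    push @recovered, [ $RefDes, $Pnum, $Pkg_Type ];
    print join("\t", $RefDes, $Pnum, $Pkg_Type), "\n";
}
```

If the fields could ever contain the separator, a nested hash such as $seen{$RefDes}{$Pnum}{$Pkg_Type} would be a safer choice.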

Re^2: compare data between two files using Perl
by steveb94553 (Initiate) on Jun 16, 2008 at 21:35 UTC
    Hi pc88mxer,
    Actually my goal is to take some data from brd_sym_pn.txt file. Specifically the $RefDes, $Pnum and $Pkg_Type.
    Then check that the "$Pnum and $Pkg_Type" for each "$RefDes" matches the assigned "$LogPnum" and "$LogPkg_Type" from sym_text_latest.txt.
    If a match is found it reports back whether the reference designator is using the correct $Pnum and $Pkg_Type.
    I have purposely made some $Pkg_Type values incorrect to check my script.
        # This program will return the refdes, part number and package type
        # tab delimited for each instance from brd_sym_pn.txt (extracted from
        # a layout database) and part number and package type from sym_text_latest.txt
        # generated from sym_text_mmddyy.xls when part number is entered at <STDIN> prompt
        # Compares the pkg_type used on board with sym_text log files and reports
        # whether the pkg_type is correct or reports what the correct pkg_type should be.

        $Pnum = "12-0259-01";
        print $Pnum;

        open(partlog, "sym_text_latest.txt") || die("failed to open sym_text_latest.txt");
        while($line = <partlog>) {
            @fields = split(/\t/,$line);
            if($fields[0] eq $Pnum) {
                our $LogPnPkg = "$fields[0]\t$fields[4]";
                our $LogPnum = "$fields[0]";
                our $LogPkg_Type = "$fields[4]";
            }
        }

        open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
        print "\n";
        open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
        print "\n";
        while($line = <brdpartlog>) {
            @fields = split(/\t/,$line);
            if($fields[1] eq $Pnum) {
                our $RdesPnPkg = "$fields[0]\t$fields[1]\t$fields[2]";
                our $BrdPnPkg = "$fields[1]\t$fields[2]";
                our $RefDes = "$fields[0]";
                if($BrdPnPkg eq $LogPnPkg) {
                    print("$RefDes\t$BrdPnPkg is the correct Allegro footprint.\n");
                }
                else {
                    print("$RefDes\t$BrdPnPkg should be using $LogPkg_Type\n");
                }
            }
        }
    prints results:
    J2 12-0259-01 HDR-1X28-100-FLK-VT should be using HDR-1X2-100-FLK-VT
    J1 12-0259-01 HDR-1X2-100-FLK-VT is the correct Allegro footprint.
      I think I understand what you are trying to do, and the approach I gave is a good start for conducting the analysis you want to perform.

      Suppose that the first file contains the following triples:

      RefDes  Package  PType
      R1      P1       T1
      R2      P2       T2
      ...

      and the second file contains

      RefDes  Package  PType
      R1      P1       NOT-T1
      R2      P2       T2
      R2      P2       ANOTHER-T2
      ...

      The above algorithm will report that R1,P1,T1 appears in the first file but not the second, and that the triple R1,P1,NOT-T1 appears in the second file but not the first. The interpretation of this is that NOT-T1 in the second file is a mistake and should be T1. We can modify the code to actually produce this message, but I just wanted to demonstrate how this situation is picked up by the algorithm.
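
      One way the code might be modified to produce that message is sketched here. The in-memory arrays are hypothetical stand-ins for the two parsed files, and the sketch assumes that each RefDes,Package pair has at most one unmatched PType per file.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two parsed files.
my @first  = ( [qw(R1 P1 T1)],     [qw(R2 P2 T2)] );
my @second = ( [qw(R1 P1 NOT-T1)], [qw(R2 P2 T2)] );

# Same marking scheme as above: 1 = first only, 2 = second only, 3 = both.
my %seen;
$seen{ $_->[0], $_->[1], $_->[2] }  = 1 for @first;
$seen{ $_->[0], $_->[1], $_->[2] } += 2 for @second;

my $sep = $;;    # subscript separator used to build the keys

# Collect the unmatched PType for each RefDes,Package pair, per file.
my (%only_first, %only_second);
while ( my ($key, $val) = each %seen ) {
    my ($RefDes, $Package, $PType) = split /\Q$sep\E/, $key, -1;
    if    ($val == 1) { $only_first{$RefDes, $Package}  = $PType }
    elsif ($val == 2) { $only_second{$RefDes, $Package} = $PType }
}

# A pair that is unmatched on both sides is reported as a wrong PType.
my @messages;
for my $pair (sort keys %only_second) {
    next unless exists $only_first{$pair};
    my ($RefDes, $Package) = split /\Q$sep\E/, $pair, -1;
    push @messages,
        "$RefDes\t$Package\t$only_second{$pair} should be $only_first{$pair}";
}
print "$_\n" for @messages;
```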

      To take another example, consider the triples in each file that begin with R2,P2. The above algorithm will report that R2,P2,T2 appears in both files and that R2,P2,ANOTHER-T2 is in the second file but not the first. You have to decide how to interpret this situation. Perhaps it means that the second file is malformed because it contains two triples that begin with R2,P2.
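
      If you decide to treat that situation as a malformed file, a check along these lines could flag it. This is only a sketch: the array is a hypothetical stand-in for the parsed second file, and the wording of the report is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the triples parsed from the second file.
my @second = (
    [qw(R1 P1 NOT-T1)],
    [qw(R2 P2 T2)],
    [qw(R2 P2 ANOTHER-T2)],
);

# Record every distinct PType seen for each RefDes,Package pair.
my %types_for;
for my $t (@second) {
    my ($RefDes, $Package, $PType) = @$t;
    $types_for{"$RefDes\t$Package"}{$PType} = 1;
}

# Any pair with more than one PType is a conflict.
my @conflicts;
for my $pair (sort keys %types_for) {
    my @types = sort keys %{ $types_for{$pair} };
    push @conflicts, "$pair has " . scalar(@types) . " types: @types"
        if @types > 1;
}
print "$_\n" for @conflicts;
```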

      Again, you should only need to read your files once.

Re^2: compare data between two files using Perl
by Anonymous Monk on Dec 16, 2008 at 12:14 UTC
    Hey, I wanted to know more about: while (my ($key, $val) = each %seen) { how to interpret this. I looked at perlvar, not much idea. thanks, Hashmat
      each() will help you with your question. To put it short: each() returns a list of the next key/value pair and uses memory very efficiently.
      Regards,
      svenXY
        Right. But how do I split $key to get my original keys ?
        Right. But I wanted to know from $value how do I get the original tuples ?
        Sorry, what I meant is: how do I get the original tuples from $key?