compare records in two csv files

boddeti has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am a beginner to perl, trying to do automate a test case which is comparison of all the records in one csv file to the recors in the second csv file. wht exactly i am trying to do is : take one record from first csv file and compare with all records in the other csv file if found a match, say its matched other mismatch.

I have written with my initial knowledge, couldnt find where iam going wrong.

for my $arg (@ARGV) {
    #$script=$ARGV[0];
    #$stanza=$ARGV[1];
    $metricsfile=$ARGV[0];
    $resultfile=$ARGV[1];
    }
    
 print "MetricsFilename:$metricsfile\n";
    
### Assigning the argument file to in_file ####
$in_file = $metricsfile;

### Opening the input file to read ####
open (IN, "<$in_file") or die "Can't open $in_file: $!";

### A while condition to read all the lines in a file and place in an 
+array####
#my $linenum=1;

while(@linesFromMetrics=<IN>){ # opening while loop
 #print "Reading metrics file \n";
 #print $linenum++;

  # closing while loop
 
 foreach $Metricsline(@linesFromMetrics){
 
 
 
 
 ####------------------------------------------#####
  ### READING THE RESULTS CSV FILE            #####
  ####----------------------------------------####
        
 
  print "ResultFilename:$resultfile\n";
    
 ### Assigning the argument file to in_file ####
 $input_file = $resultfile;
 
 ### Opening the input file to read ####
 open (INPUT, "<$input_file") or die "Can't open $input_file: $!";
 
 ### A while condition to read all the lines in a file and place in an
+ array####

 while(@linesFromresultfile=<INPUT>){ # opening while loop
 my $var=false;
  
  foreach $Resultline(@linesFromresultfile){
     
      if($Metricsline eq $Resultline){   
  
       print "The lines from metrics file :$Metricsline \n";
       print "The lines from results file :$Resultline \n";
  
       print "The line matches \n";
       
       $var=true;
       
   
      }
      
  
  
      } # result file for each loop
      
      if($var=false)
      {
       print "mismatch";
      }
 
   } # while results file
   
   close INPUT;
   
  } # metrics file for each loop 
 
 } # while metrics file

 close IN;
[download]

can any guide me, where i am going wrong or is there any other way of doing

Comment on compare records in two csv files Download Code

Replies are listed 'Best First'.
Re: compare records in two csv files by blazar (Canon) on Jan 19, 2006 at 13:37 UTC
Long list of remarks follows... `for my $arg (@ARGV) { #$script=$ARGV[0]; #$stanza=$ARGV[1]; $metricsfile=$ARGV[0]; $resultfile=$ARGV[1]; }` [download] The `for` loop does nothing, remove it! `use strict; # always! use warnings; # always!` [download] `### Assigning the argument file to in_file #### $in_file = $metricsfile;` [download] No need for cmts that do not improve readability, and even degrade it. Well written code should be self explanatory. Why assigning to yet another scalar and not using `$metricsfile` directly? `open (IN, "<$in_file") or die "Can't open $in_file: $!";` [download] For the n-th time, n being a huge integer: you'd better use the 3-args form of open and lexical filehandles. `### A while condition to read all the lines in a file and place in an +array#### #my $linenum=1; while(@linesFromMetrics=<IN>){ # opening while loop` [download] NO! you either want `my @lines=<IN>; # or while (my $line=<IN>) { # ...` [download] `#print "Reading metrics file \n"; #print $linenum++;` [download] You may be interested in `$.` - read more about in perldoc perlvar. `### Assigning the argument file to in_file #### $input_file = $resultfile;` [download] Your comment is not consistent with your code in two ways: missing sigil and wrong variable name. One more reason not to have useless comments: if you change, say, a variable name, then you'll have to change the comment as well. Well, that's kinda too much. I'll suggest you another way round - this assumes your files are reasonably sized: `#!/usr/bin/perl use strict; use warnings; die "Usage: $0 <file1> <file2>\n" unless @ARGV==2; my %orig=map {$_ => 1} do { open my $fh, '<', $ARGV[0] or die $!; <$fh>; }; open my $fh, '<', $ARGV[1] or die $!; $orig{$_} and print while <$fh>; __END__` [download] This is intendedly minimal: expand and adapt to your needs!	[reply] [d/l] [select]
Re: compare records in two csv files by svenXY (Deacon) on Jan 19, 2006 at 13:44 UTC
Hi, make the resultfile inot a lookup hash, then go through the metrics file and check if the key exists: #!/usr/bin/perl use strict; use warnings; my $metricsfile = $ARGV[0]; my $resultfile = $ARGV[1]; print "MetricsFilename:$metricsfile\n"; print "ResultsFilename:$resultfile\n\n"; my %results; ### Opening the result file to read #### open( IN, "<$resultfile" ) or die "Can't open $resultfile: $!"; while ( <IN> ) { chomp; # remove newline $result{$_}++; # put line into hash as key } close IN; ### Opening the metrics file to read #### open( IN, "<$metricsfile" ) or die "Can't open $metricsfile: $!"; while ( <IN> ) { chomp; # remove newline if ($result{$_}) { print "$_ is in $resultfile\n"; } else { print "$_ is NOT in $resultfile\n"; } } close IN; [download] Regards, svenXY	[reply] [d/l]
Re: compare records in two csv files by Limbic~Region (Chancellor) on Jan 19, 2006 at 14:03 UTC
boddeti, With having only read the title and the first line of your question I would suggest taking a look at CSV table diff utility for ideas. If it is not applicable to your problem I apologize. Cheers - L~R	[reply]
Re: compare records in two csv files by radiantmatrix (Parson) on Jan 19, 2006 at 16:04 UTC
First off, your "records" can match without the physical lines matching. For example, the following lines are a record match: `field 1,field 2 "field 1","field 2"` [download] Given that, what you have to do is adequately parse the CSV files into some kind of records format that's easy to compare. You could do this in memory with data structures, but that will cause problems if your CSV files grow to sufficient size. Better to use a lightwieght database. I recommend Text::CSV_XS for parsing the CSV, and DBI with the DBD::SQLite2 driver for a lightweight database system. Some concept (e.g. untested, but conceptually sound) code: use strict; use warnings; use DBI; use Text::CSV_XS; use IO::File; my ($file1,$file2) = @ARGV[0..1]; # two first args are files to compar +e # create a session filename my $dbfile = \"Session"; { my $sid = "$dbfile"; #stringify ref $sid = $1 if m/^SCALAR$(.?)$/; #remove all but address $dbfile = "$sid.db"; } #prepare to read files my $csv = Text::CSV_XS->new(); my $io1 = IO::File->new($file1,'<') or die ("Can't open $file1 for rea +d: $!"); my $io2 = IO::File->new($file2,'<') or die ("Can't open $file2 for rea +d: $!"); my $out = IO::File->new("$dbfile.csv",'>') or die ("Can't write outfil +e: $!"); #init DB connection my $dbh = DBI->connect("dbi:SQLite2:dbname=$dbfile",'','',{RaiseError +=>1}); # Now, read file1 into the DB. print STDERR "loading $file1 into database...\n"; #-> first line of CSV is header row my $head1 = $csv->getline($io1); # create table $dbh->do( 'CREATE TABLE file ('.join(' TEXT,',@$head1).' TEXT)' ); # load db { my $sth = $dbh->prepare( 'INSERT INTO file ('.join(',',@$head1). ') VALUES ('.join(',',map {'?'} @$head1).')' ); until ($io1->eof) { my $row = $csv->getline($io1); $sth->execute(@$row); } } # ok, now compare file2 with what's in the DB. print STDERR "Beginning comparison.\n"; # assuming same header row $csv->getline($io2); { my $sth = $dbh->prepare( 'SELECT '.join(',',@$head1).' FROM file WHERE '. join(' AND ',map { "$_ = ?" } @$head1). ' LIMIT 1' ); until ($io2->eof) { my $row = $csv->getline($io2); $sth->execute(@$row); my $result = $sth->fetchrow_arrayref; if (defined $result && @$result) { print STDERR '+'; $csv->print($out,$row); #save matches to CSV } else { print STDERR '-'; } } print STDERR "\ndone.\n"; } print STDERR "Results saved to $dbfile.csv"; # clean up. $dbh->disconnect(); unlink($dbfile); [download] <-radiant.matrix-> A collection of thoughts and links from the minds of geeks The Code that can be seen is not the true Code* I haven't found a problem yet that can't be solved by a well-placed trebuchet	[reply] [d/l] [select]