First off, your "records" can match without the physical lines matching. For example, the following lines are a record match:

field 1,field 2 "field 1","field 2"

Given that, what you have to do is adequately parse the CSV files into some kind of records format that's easy to compare. You could do this in memory with data structures, but that will cause problems if your CSV files grow to sufficient size. Better to use a lightwieght database.

I recommend Text::CSV_XS for parsing the CSV, and DBI with the DBD::SQLite2 driver for a lightweight database system.

Some concept (e.g. untested, but conceptually sound) code:

use strict; use warnings; use DBI; use Text::CSV_XS; use IO::File; my ($file1,$file2) = @ARGV[0..1]; # two first args are files to compar +e # create a session filename my $dbfile = \"Session"; { my $sid = "$dbfile"; #stringify ref $sid = $1 if m/^SCALAR\((.*?)\)/; #remove all but address $dbfile = "$sid.db"; } #prepare to read files my $csv = Text::CSV_XS->new(); my $io1 = IO::File->new($file1,'<') or die ("Can't open $file1 for rea +d: $!"); my $io2 = IO::File->new($file2,'<') or die ("Can't open $file2 for rea +d: $!"); my $out = IO::File->new("$dbfile.csv",'>') or die ("Can't write outfil +e: $!"); #init DB connection my $dbh = DBI->connect("dbi:SQLite2:dbname=$dbfile",'','',{RaiseError +=>1}); # Now, read file1 into the DB. print STDERR "loading $file1 into database...\n"; #-> first line of CSV is header row my $head1 = $csv->getline($io1); # create table $dbh->do( 'CREATE TABLE file ('.join(' TEXT,',@$head1).' TEXT)' ); # load db { my $sth = $dbh->prepare( 'INSERT INTO file ('.join(',',@$head1). ') VALUES ('.join(',',map {'?'} @$head1).')' ); until ($io1->eof) { my $row = $csv->getline($io1); $sth->execute(@$row); } } # ok, now compare file2 with what's in the DB. print STDERR "Beginning comparison.\n"; # assuming same header row $csv->getline($io2); { my $sth = $dbh->prepare( 'SELECT '.join(',',@$head1).' FROM file WHERE '. join(' AND ',map { "$_ = ?" } @$head1). ' LIMIT 1' ); until ($io2->eof) { my $row = $csv->getline($io2); $sth->execute(@$row); my $result = $sth->fetchrow_arrayref; if (defined $result && @$result) { print STDERR '+'; $csv->print($out,$row); #save matches to CSV } else { print STDERR '-'; } } print STDERR "\ndone.\n"; } print STDERR "Results saved to $dbfile.csv"; # clean up. $dbh->disconnect(); unlink($dbfile);
<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

In reply to Re: compare records in two csv files by radiantmatrix
in thread compare records in two csv files by boddeti

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.