in reply to Compare 2 csv files using a key set of colums

use Text::xSV;
use Set::Object;

my $parser = Text::xSV->new;
my @keys   = split ',', shift(@ARGV);
my @sets;

foreach my $filename (@ARGV) {
    $parser->open_file( $filename );
    push @sets, Set::Object->new;
    $parser->read_header;
    while ($parser->get_row) {
        my $key = join ',', $parser->extract( @keys );
        $sets[-1]->insert( $key );
    }
}

# At this point, you have all the keys in @sets. You can:
#   $union                = $sets[0] + $sets[1];
#   $intersection         = $sets[0] * $sets[1];
#   $difference           = $sets[0] - $sets[1];
#   $symmetric_difference = $sets[0] % $sets[1];
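
For example, a minimal sketch of reporting the keys that appear in exactly one of two input files, using the overloaded symmetric-difference operator (the @mismatched name is mine):

# keys present in one file but not the other
my @mismatched = ($sets[0] % $sets[1])->members;
print "$_\n" for sort @mismatched;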

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re^2: Compare 2 csv files using a key set of colums
by eric256 (Parson) on Dec 13, 2005 at 22:02 UTC

    From the looks of it, that only inserts the key; mine stores the values so that I can generate a new CSV with the difference. This lets me compare files with different numbers of columns and still get the difference. For instance, I have two reports, one with patient name and ID, the other with just the name. This way I don't lose that extra data after the comparison. It's a minor point, but in the cases where I use it, it helps a lot.


    ___________
    Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;
      Yes, that's true. However, adding a per-file hash to store the key-line mapping is so trivial that I shouldn't have to mention it. Or, if memory might become an issue, you can reparse each file to find the necessary lines, which is what I would do.
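
      A minimal sketch of that per-file hash, folded into the code from the top of the thread (the @row_for name is mine, and it assumes Text::xSV's get_row returns the current row as an array reference in scalar context):

      use Text::xSV;
      use Set::Object;

      my $parser = Text::xSV->new;
      my @keys   = split ',', shift(@ARGV);
      my (@sets, @row_for);

      foreach my $filename (@ARGV) {
          $parser->open_file( $filename );
          push @sets,    Set::Object->new;
          push @row_for, {};
          $parser->read_header;
          while (my $row = $parser->get_row) {
              my $key = join ',', $parser->extract( @keys );
              $sets[-1]->insert( $key );
              $row_for[-1]{$key} = $row;    # keep the whole row for later output
          }
      }

      # Rows (every column intact) whose key appears in the first file only:
      my @only_in_first = ($sets[0] - $sets[1])->members;
      for my $key (sort @only_in_first) {
          # a plain join doesn't re-quote fields that contain commas
          print join(',', @{ $row_for[0]{$key} }), "\n";
      }

      If the files are too big for that, you'd store nothing extra here and instead make a second pass over each file to pull out the needed lines, as described above.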

      I tend to write memory-efficient solutions when dealing with datafiles because I have dealt with 1G+ xSV datafiles. Just because this file is 20K doesn't mean the next file will be.


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?