in reply to Compare 2 csv files using a key set of colums

use Text::xSV;
use Set::Object;

my $parser = Text::xSV->new;
my @keys   = split ',', shift(@ARGV);
my @sets;

foreach my $filename (@ARGV) {
    $parser->open_file( $filename );
    push @sets, Set::Object->new;
    $parser->read_header;
    while ($parser->get_row) {
        my $key = join ',', $parser->extract( @keys );
        $sets[-1]->insert( $key );
    }
}

# At this point, you have all the keys in @sets. You can:
#   $union                = $sets[0] + $sets[1];
#   $intersection         = $sets[0] * $sets[1];
#   $difference           = $sets[0] - $sets[1];
#   $symmetric_difference = $sets[0] % $sets[1];
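
For example, a minimal sketch of reporting the keys that appear in exactly one of two input files, using the overloaded symmetric-difference operator (the @mismatched name is mine):

# keys present in one file but not the other
my @mismatched = ($sets[0] % $sets[1])->members;
print "$_\n" for sort @mismatched;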

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re^2: Compare 2 csv files using a key set of colums
by eric256 (Parson) on Dec 13, 2005 at 22:02 UTC

    From the looks of it, that only inserts the key; mine stores the values so that I can generate a new CSV with the difference. This lets me compare files with different numbers of columns and still get the difference. For instance, I have two reports, one with patient name and ID, the other with just the name. This way I don't lose that extra data after the comparison. It's a minor point, but in the cases where I use it, it helps a lot.


    ___________
    Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;
      Yes, that's true. However, adding a per-file hash to store the key-line mapping is so trivial that I shouldn't have to mention it. Or, if memory might become an issue, you can reparse each file to find the necessary lines, which is what I would do.
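
      A minimal sketch of that per-file hash, folded into the code from the top of the thread (the @row_for name is mine, and it assumes Text::xSV's get_row returns the current row as an array reference in scalar context):

      use Text::xSV;
      use Set::Object;

      my $parser = Text::xSV->new;
      my @keys   = split ',', shift(@ARGV);
      my (@sets, @row_for);

      foreach my $filename (@ARGV) {
          $parser->open_file( $filename );
          push @sets,    Set::Object->new;
          push @row_for, {};
          $parser->read_header;
          while (my $row = $parser->get_row) {
              my $key = join ',', $parser->extract( @keys );
              $sets[-1]->insert( $key );
              $row_for[-1]{$key} = $row;    # keep the whole row for later output
          }
      }

      # Rows (every column intact) whose key appears in the first file only:
      my @only_in_first = ($sets[0] - $sets[1])->members;
      for my $key (sort @only_in_first) {
          # a plain join doesn't re-quote fields that contain commas
          print join(',', @{ $row_for[0]{$key} }), "\n";
      }

      If the files are too big for that, you'd store nothing extra here and instead make a second pass over each file to pull out the needed lines, as described above.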

      I tend to write memory-efficient solutions when dealing with datafiles because I have dealt with 1G+ xSV datafiles. Just because this file is 20K doesn't mean the next file will be.


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?