Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks, we have two CSV files that share a common column, column 3. We would like to know the best way, without modules if possible (installing them here is a hassle), to find out which entries from the first file are not in the second file. The match is on column 3, which is a numeric value. Thanks.

Replies are listed 'Best First'.
Re: Compare CSV files
by Moron (Curate) on Jul 19, 2006 at 15:02 UTC
    If you are a) on *nix and b) just want the values in column 3 that are unique to file 1, it might be easier from the shell:
    cut -f3 -d, <file1.csv | sort -u >file1.cut.sort
    cut -f3 -d, <file2.csv | sort -u >file2.cut.sort
    comm -23 file1.cut.sort file2.cut.sort
    Otherwise, if your requirements are more difficult, here is an example in Perl for the case of needing whole lines of file 1 in original order:
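    use strict;
    use warnings;

    # Remember every column-3 value that occurs in file 2.
    my $file2 = {};
    open my $fh2, '<', 'file2.csv' or die "Cannot open file2.csv: $!";
    while (<$fh2>) {
        chomp;
        my @fld = split /,/;
        $file2->{ $fld[2] } = 1;
    }
    close $fh2;

    # Print the lines of file 1 whose column 3 is not in file 2.
    open my $fh1, '<', 'file1.csv' or die "Cannot open file1.csv: $!";
    while (<$fh1>) {
        chomp;
        my @fld = split /,/;
        defined( $file2->{ $fld[2] } ) and next;
        print "$_\n";
    }
    close $fh1;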

    -M

    Free your mind

      Handy code, but it does suffer the problem that the commas might be quoted or that there are newlines in quotes (a common problem with hand-rolled CSV parsers). If the OP's CSV files are simple enough, then your code will be useful. Otherwise, he or she may be forced to find a way to sneak in modules (which is often much easier than it sounds).
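
      To make the quoted-comma caveat concrete, here is a small sketch of what a plain split on commas does to a row with a quoted field. The sample row and the helper name naive_fields are made up for illustration; they are not from the OP's data.

```perl
use strict;
use warnings;

# Split a line on every comma, ignoring quoting entirely
# (the same approach as the split in the reply above).
sub naive_fields {
    my ($line) = @_;
    return split /,/, $line;
}

# Hypothetical sample row: the second field contains a quoted comma.
my $line = '1,"Smith, John",42';
my @fld  = naive_fields($line);

# The quoted field is cut in two, so "column 3" is no longer the number.
printf "%d fields; column 3 is [%s]\n", scalar @fld, $fld[2];
# prints: 4 fields; column 3 is [ John"]
```

      With a row like this the lookup key becomes a fragment of the name rather than 42, so the comparison silently gives wrong answers instead of failing loudly.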

      In any event, take my comments as caveats, not criticisms :) Your code has a good chance of working for simple CSV files.

      Cheers,
      Ovid

        he or she may be forced to find a way to sneak in modules

        Or, alternatively, try the standard Text::ParseWords module. I've never understood why that module isn't used more widely.
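
        A minimal sketch of how that might look for the OP's problem, assuming fields are quoted with double quotes and no field spans more than one line. The sub names third_column and only_in_first and the sample rows are invented for this example; parse_line is the real function exported by Text::ParseWords.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Return the column-3 value of one CSV line, honouring quoted fields.
# The second argument (0) asks parse_line to strip the quotes.
sub third_column {
    my ($line) = @_;
    chomp $line;
    my @fld = parse_line(',', 0, $line);
    return $fld[2];
}

# Return the lines of the first list whose column 3 never appears
# in the second list, keeping the original order.
sub only_in_first {
    my ($lines1, $lines2) = @_;
    my %seen;
    for my $l (@$lines2) {
        my $key = third_column($l);
        $seen{$key} = 1 if defined $key;
    }
    return grep {
        my $key = third_column($_);
        defined $key && !$seen{$key};
    } @$lines1;
}

# Hypothetical in-memory rows standing in for file1.csv and file2.csv.
my @file1 = ('1,"Smith, John",42', '2,"Doe, Jane",17');
my @file2 = ('9,"Roe, Richard",42');
print "$_\n" for only_in_first(\@file1, \@file2);
# prints: 2,"Doe, Jane",17
```

        One caveat, as far as I know: Text::ParseWords follows shell-like quoting rules rather than CSV rules, so single quotes also count as quote characters and a doubled "" inside a field is not treated as an escaped quote. For data with stray apostrophes or embedded quotes it may still fall short.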

        --
        <http://dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

        Very true. I think I once wrote a simple Split routine that walks through strings willy-nilly where I was in a "no module" situation like the OP combined with applicability of the caveat you describe.
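
        For the record, a routine of that kind could look roughly like the sketch below. This is a reconstruction under my own assumptions, not the original code: fields are quoted with double quotes, a doubled quote inside a quoted field stands for a literal quote, and no field contains a newline. The name split_csv_line is made up.

```perl
use strict;
use warnings;

# Split one CSV line by walking it character by character,
# tracking whether we are currently inside a quoted field.
sub split_csv_line {
    my ($line) = @_;
    my @fields   = ('');
    my $in_quote = 0;
    my @ch       = split //, $line;

    for (my $i = 0; $i < @ch; $i++) {
        my $c = $ch[$i];
        if ($in_quote) {
            if ($c eq '"') {
                if ($i + 1 < @ch && $ch[$i + 1] eq '"') {
                    $fields[-1] .= '"';    # doubled quote: literal quote
                    $i++;
                }
                else {
                    $in_quote = 0;         # closing quote
                }
            }
            else {
                $fields[-1] .= $c;         # commas are kept inside quotes
            }
        }
        elsif ($c eq '"') { $in_quote = 1 }          # opening quote
        elsif ($c eq ',') { push @fields, '' }       # field separator
        else              { $fields[-1] .= $c }
    }
    return @fields;
}

my @fld = split_csv_line('1,"Smith, John",42');
print join('|', @fld), "\n";
# prints: 1|Smith, John|42
```

        It still does nothing about newlines inside quotes, since that needs the reading loop to join physical lines until the quotes balance.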

        -M

        Free your mind