xspikx has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I need your help, because I'm totally lost and have no clue where to start. I have two CSV (comma-separated values) files. I have to compare the two files, first by the first column. Anything that exists in the first file's first column but not in the second file's first column has to be noted. Anything that exists in the second file's first column but not in the first file's also has to be noted (a command will be executed). Any idea how to do this quickly and easily? Any help is much appreciated.

Re: File comparison
by jZed (Prior) on Nov 01, 2005 at 15:40 UTC
Re: File comparison
by marto (Cardinal) on Nov 01, 2005 at 15:58 UTC
    Hi xspikx,

    You may want to look at the List::Compare module.
    Have a read of the documentation and let us know how you get on.
    Hope this helps.

    Martin
      Hi Martin, thanks for your suggestion. It looks like this will work for most of my script. Now that I can get all the first fields that differ, I can move on to the first fields that match, and compare the entire line in both files.
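The follow-up step described here (for first fields that occur in both files, compare the whole line) could be sketched in core Perl roughly as follows. This is only an illustration, not anything posted in the thread: the sub names `index_by_first_field` and `changed_lines` are made up, and it assumes plain comma-separated lines with no quoted fields and at most one line per first field.

```perl
use strict;
use warnings;

# Map first field => whole line. Assumes plain comma-separated lines with
# no quoting; if a key occurs twice, the later line overwrites the earlier.
sub index_by_first_field {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %line_for;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip Unix or Windows line ending
        next unless length $line;      # skip blank lines
        my ($key) = split /,/, $line, 2;
        $line_for{$key} = $line;
    }
    close $fh;
    return \%line_for;
}

# For keys present in both indexes, return [key, line1, line2] for every
# pair of lines that differ, sorted by key.
sub changed_lines {
    my ($first, $second) = @_;
    my @changed;
    for my $key (sort keys %$first) {
        next unless exists $second->{$key};
        next if $first->{$key} eq $second->{$key};
        push @changed, [ $key, $first->{$key}, $second->{$key} ];
    }
    return @changed;
}

# Runs only when two file names are given on the command line.
if (@ARGV == 2) {
    my ($first, $second) = map { index_by_first_field($_) } @ARGV;
    for my $diff (changed_lines($first, $second)) {
        my ($key, $old, $new) = @$diff;
        print "$key differs:\n  < $old\n  > $new\n";
    }
}
```

Keeping the whole line as the hash value means the same index can serve both the first-column check and the full-line comparison, so each file is read only once.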
Re: File comparison
by holli (Abbot) on Nov 01, 2005 at 16:03 UTC
    C:\>perl -anF/,/ -e "print qq($F[0]\n)" file1.txt>c:\file1a.txt

    C:\>type 1a.txt
    a
    b
    d
    c
    e

    C:\>perl -anF/,/ -e "print qq($F[0]\n)" file2.txt>file2a.txt

    C:\>type 2a.txt
    a
    b
    c
    d
    c
    e

    C:\>diff file1a.txt file2a.txt
    4d3
    < c


    holli, /regexed monk/
      What if one field somewhere has an internal line break?
        Then the data is corrupt ;)


        holli, /regexed monk/
        It won't have one. Before these two files are created, all data is verified, and any line breaks and spaces are removed.
Re: File comparison
by ambrus (Abbot) on Nov 01, 2005 at 17:58 UTC

    join(1) is your friend.

    Let's take for example

    [am]king ~/a/tm$ cat first.csv
    apple,5
    pear,4
    ananas,6
    watermelon,10
    salad,5
    carrot,6
    peach,5
    apricot,7
    [am]king ~/a/tm$ cat second.csv
    peach,orange
    watermelon,green
    ananas,yellow,expensive
    apple,red
    banana,yellow
    apple,red
    pea,brown
    apricot,orange
    pear,yellow
    spinach,green
    salad,green

    Then we have to sort them and use join to find the lines found only in the first or only in the second file:
    [am]king ~/a/tm$ sort first.csv > first.s
    [am]king ~/a/tm$ sort second.csv > second.s
    [am]king ~/a/tm$ join -v1 -t, first.s second.s
    carrot,6
    [am]king ~/a/tm$ join -v2 -t, first.s second.s
    banana,yellow
    pea,brown
    spinach,green

    Update 2009 sep 2.

    See Re^2: Joining two files on common field for a list of other nodes where unix textutils is suggested to merge files.

      The only issue is that all the data manipulation, comparison and so on has to happen within the script (for automation purposes).
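If everything has to stay inside the script, the effect of `join -v1` and `join -v2` can be approximated with two hashes keyed on the first field. The following is a minimal sketch in core Perl under some assumptions: the sub names `first_column_index` and `only_in` are invented for illustration, the files are plain comma-separated text without quoted fields, and what should happen for each unmatched key (the "command will be executed" part of the question) is left as a plain print.

```perl
use strict;
use warnings;

# Return a hash ref mapping the first comma-separated field of each line
# to the full line. Assumes no quoted fields and no embedded line breaks.
sub first_column_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %index;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip Unix or Windows line ending
        next unless length $line;      # skip blank lines
        my ($key) = split /,/, $line, 2;
        $index{$key} = $line;
    }
    close $fh;
    return \%index;
}

# Keys present in the first index but missing from the second, sorted.
sub only_in {
    my ($have, $lack) = @_;
    my @keys = sort grep { !exists $lack->{$_} } keys %$have;
    return @keys;
}

# Runs only when two file names are given on the command line.
if (@ARGV == 2) {
    my ($first, $second) = map { first_column_index($_) } @ARGV;
    print "only in first:  $first->{$_}\n"  for only_in($first,  $second);
    # Placeholder: the per-key command from the question would go here.
    print "only in second: $second->{$_}\n" for only_in($second, $first);
}
```

Unlike join(1), this needs no sorted input, because hash lookups do not depend on line order. With the `first.csv` and `second.csv` shown above, it should report `carrot,6` as only in the first file and the `banana`, `pea` and `spinach` lines as only in the second, matching the join output.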