File comparison

xspikx has asked for the wisdom of the Perl Monks concerning the following question:

Re: File comparison by jZed (Prior) on Nov 01, 2005 at 15:40 UTC
If you know SQL, just use DBD::CSV and use LEFT and RIGHT joins. You could modify the example in CSV table diff utility.
Re: File comparison by marto (Cardinal) on Nov 01, 2005 at 15:58 UTC
Hi xspikx, you may want to look at the List::Compare module. Have a read of the documentation and let us know how you get on. Hope this helps. Martin
Re^2: File comparison by xspikx (Acolyte) on Nov 01, 2005 at 16:43 UTC
Hi Martin, thanks for your suggestion. It looks like this will work for most of my script. Now that I can get all the first fields that differ, I can move on to finding the first fields the two files share and comparing the entire line in both files.
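A minimal core-Perl sketch of that second step, assuming each line is comma-separated, the first field is the key, and keys are unique within each file. The helper names `index_by_first_field` and `changed_lines` and the sample data are illustrative only; they do not come from the thread or from List::Compare.

```perl
use strict;
use warnings;

# Map each line's first comma-separated field to the whole line.
# Assumption: keys are unique per file; a later duplicate overwrites an earlier one.
sub index_by_first_field {
    my @lines = @_;
    my %by_key;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($key) = split /,/, $copy, 2;
        next unless defined $key && length $key;
        $by_key{$key} = $copy;
    }
    return \%by_key;
}

# Return, sorted, the keys present in both indexes whose full lines differ.
sub changed_lines {
    my ($left, $right) = @_;
    my @changed = grep { exists $right->{$_} && $left->{$_} ne $right->{$_} }
                  keys %$left;
    return sort @changed;
}

# Hypothetical sample data standing in for the contents of the two files.
my $old = index_by_first_field("a,1,x\n", "b,2,y\n", "c,3,z\n");
my $new = index_by_first_field("a,1,x\n", "b,2,CHANGED\n", "d,4,w\n");

for my $key (changed_lines($old, $new)) {
    print "$key differs:\n  < $old->{$key}\n  > $new->{$key}\n";
}
```

In a real script the arguments to `index_by_first_field` would be the lines read from each file handle rather than literal strings.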
Re: File comparison by holli (Abbot) on Nov 01, 2005 at 16:03 UTC
`C:\>perl -anF/,/ -e "print qq($F[0]\n)" file1.txt>c:\file1a.txt
C:\>type file1a.txt
a
b
d
c
e
C:\>perl -anF/,/ -e "print qq($F[0]\n)" file2.txt>file2a.txt
C:\>type file2a.txt
a
b
c
d
c
e
C:\>diff file1a.txt file2a.txt
4d3
<c`
holli, /regexed monk/
Re^2: File comparison by tilly (Archbishop) on Nov 01, 2005 at 16:32 UTC
What if one field somewhere has an internal line break?
Re^3: File comparison by holli (Abbot) on Nov 01, 2005 at 16:40 UTC
Then the data is corrupt ;) holli, /regexed monk/
Re^4: File comparison by tilly (Archbishop) on Nov 01, 2005 at 16:44 UTC
Re^3: File comparison by xspikx (Acolyte) on Nov 01, 2005 at 16:41 UTC
It won't have one. Before these two files are created, all data is verified and any line breaks and spaces are removed.
Re: File comparison by ambrus (Abbot) on Nov 01, 2005 at 17:58 UTC
join(1) is your friend. Let's take for example
`[am]king ~/a/tm$ cat first.csv
apple,5
pear,4
ananas,6
watermelon,10
salad,5
carrot,6
peach,5
apricot,7
[am]king ~/a/tm$ cat second.csv
peach,orange
watermelon,green
ananas,yellow,expensive
apple,red
banana,yellow
apple,red
pea,brown
apricot,orange
pear,yellow
spinach,green
salad,green`
Then we have to sort them and use join to find the lines found only in the first or only in the second file:
`[am]king ~/a/tm$ sort first.csv > first.s
[am]king ~/a/tm$ sort second.csv > second.s
[am]king ~/a/tm$ join -v1 -t, first.s second.s
carrot,6
[am]king ~/a/tm$ join -v2 -t, first.s second.s
banana,yellow
pea,brown
spinach,green`
Update 2009 sep 2. See Re^2: Joining two files on common field for a list of other nodes where unix textutils is suggested to merge files.
Re^2: File comparison by Anonymous Monk on Nov 01, 2005 at 18:35 UTC
The only issue is that all the data manipulation, comparison and so on has to happen within the script (for automation purposes).
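If everything has to stay inside the script, the effect of `join -v1` and `join -v2` can be approximated with plain hashes. The sketch below is one possible way to do it, not a drop-in replacement for join(1): it assumes the first comma-separated field is the key and that lines are already chomped. The helper names `first_fields` and `only_in` are made up for illustration, and the data is copied from the first.csv and second.csv listings in the join(1) reply above.

```perl
use strict;
use warnings;

# Collect the set of first fields (keys) seen in a list of CSV lines.
sub first_fields {
    my %seen;
    for my $line (@_) {
        my ($key) = split /,/, $line, 2;
        $seen{$key} = 1 if defined $key && length $key;
    }
    return \%seen;
}

# Return the lines whose first field is absent from the other file's key set.
# Original line order is kept, so no pre-sorting is needed (unlike join(1)).
sub only_in {
    my ($lines, $other_keys) = @_;
    return grep {
        my ($key) = split /,/, $_, 2;
        defined $key && !$other_keys->{$key}
    } @$lines;
}

# Data taken from the first.csv / second.csv example; in practice these
# arrays would be filled by reading and chomping each file.
my @first  = ('apple,5', 'pear,4', 'ananas,6', 'watermelon,10',
              'salad,5', 'carrot,6', 'peach,5', 'apricot,7');
my @second = ('peach,orange', 'watermelon,green', 'ananas,yellow,expensive',
              'apple,red', 'banana,yellow', 'apple,red', 'pea,brown',
              'apricot,orange', 'pear,yellow', 'spinach,green', 'salad,green');

my @only_first  = only_in(\@first,  first_fields(@second));
my @only_second = only_in(\@second, first_fields(@first));

print "only in first:\n",  map { "  $_\n" } @only_first;
print "only in second:\n", map { "  $_\n" } @only_second;
```

On this data the script reports the same lines as the two join commands: `carrot,6` for the first file, and `banana,yellow`, `pea,brown` and `spinach,green` for the second.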