Compare csv file fields?

traveler has asked for the wisdom of the Perl Monks concerning the following question:

I sometimes help out a local school with its network. The other night we found out that username->uid mappings in /etc/passwd and smb.conf were out of sync. Bummer. I printed a list using awk, grep, sort and uniq (at least). Perl seemed like a better and more accurate approach.

I supersearched for csv and diff, but couldn't find anything. Here is what I have:

passwd has the name in field 1 and the uid in field 3
smbpasswd has the name in field 1 and the uid in field 2
both files separate fields with :
the files are unlikely to be sorted in any manner
I don't want to compare other fields
It might be nice to have some sort of generic compare for csv files.

I thought about reading the data into two hashes and doing a brute force compare, but that sounded ugly. Using the uid as an array subscript seems inappropriate as the array might be pretty sparse.

It seems as though someone else might have done this and I'd hate to reinvent this wheel.

--traveler


Re: Compare csv file fields? by dragonchild (Archbishop) on Oct 07, 2003 at 16:56 UTC
Good god, man! Think a little bit. This is a standard ETL action.

    # Assumes PASSWD and SMBPASSWD are filehandles already opened for reading.
    my %passwd;
    my %smbpasswd;

    while (<PASSWD>) {
        next if /^#/;
        chomp;
        my ($name, $uid) = (split ':', $_, 4)[0,2];
        $passwd{$uid} = $name;
    }

    # Do the same for SMBPASSWD, except use [0,1] instead of [0,2]

    foreach my $uid (keys %passwd) {
        unless (exists $smbpasswd{$uid}) {
            print "'$uid' in passwd, not in smbpasswd\n";
            next;
        }

        # Delete the entry so that only smbpasswd-only UIDs remain for the final loop
        my $smb_name = delete $smbpasswd{$uid};
        unless ($passwd{$uid} eq $smb_name) {
            print "$uid has $passwd{$uid} in passwd, but $smb_name in smbpasswd\n";
        }
    }

    # Whatever is left was never seen in passwd
    while (my ($k, $v) = each %smbpasswd) {
        print "$k in smbpasswd, not in passwd\n";
    }

------
We are the carpenters and bricklayers of the Information Age.

The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
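The hash approach can also be generalised in the direction the original question asks about (a generic compare for delimited files). What follows is only a minimal sketch: the helper names `load_map` and `compare_maps` are hypothetical and not from any module, the field positions are passed as zero-based indices, the sample records are made up, and it keys on the login name rather than the UID.

```perl
use strict;
use warnings;

# Build a key => value map from delimited lines.
# $key_idx and $val_idx are zero-based field positions.
sub load_map {
    my ($lines, $key_idx, $val_idx, $sep) = @_;
    $sep = ':' unless defined $sep;
    my %map;
    for my $line (@$lines) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        chomp(my $copy = $line);
        my @fields = split /\Q$sep\E/, $copy, -1;
        next unless defined $fields[$key_idx] && defined $fields[$val_idx];
        $map{ $fields[$key_idx] } = $fields[$val_idx];
    }
    return \%map;
}

# Return three sorted lists: keys only in A, keys only in B,
# and keys present in both whose values differ.
sub compare_maps {
    my ($map_a, $map_b) = @_;
    my @only_a   = sort grep { !exists $map_b->{$_} } keys %$map_a;
    my @only_b   = sort grep { !exists $map_a->{$_} } keys %$map_b;
    my @mismatch = sort grep {
        exists $map_b->{$_} && $map_a->{$_} ne $map_b->{$_}
    } keys %$map_a;
    return (\@only_a, \@only_b, \@mismatch);
}

# Sample data standing in for the two files (made up for illustration).
my @passwd    = ("alice:x:1001:100::/home/alice:/bin/sh\n",
                 "bob:x:1002:100::/home/bob:/bin/sh\n",
                 "carol:x:1003:100::/home/carol:/bin/sh\n");
my @smbpasswd = ("alice:1001:XXXX:XXXX\n",
                 "bob:2002:XXXX:XXXX\n",
                 "dave:1004:XXXX:XXXX\n");

my $p = load_map(\@passwd,    0, 2);   # name in field 1, uid in field 3
my $s = load_map(\@smbpasswd, 0, 1);   # name in field 1, uid in field 2
my ($only_p, $only_s, $diff) = compare_maps($p, $s);

print "only in passwd: @$only_p\n";
print "only in smbpasswd: @$only_s\n";
print "uid mismatch: $_ ($p->{$_} vs $s->{$_})\n" for @$diff;
```

Each lookup is a single hash probe, so the comparison is linear in the number of records rather than a brute-force pairwise scan. To use it on real files, read each file into an array of lines and pass the appropriate field indices.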
Re: Re: Compare csv file fields? by traveler (Parson) on Oct 07, 2003 at 17:21 UTC
Yes, dragonchild, this is indeed the usual way. That's why I said it's what I had considered. What I am looking to see is whether that is the only reasonable (or practical) way, or whether there is another, better way using some technique I did not know or some module(s) I had not found.
Re3: Compare csv file fields? by dragonchild (Archbishop) on Oct 07, 2003 at 17:47 UTC
Parsing can be done using tilly's Text::xSV. Array::Compare might be useful in building the comparisons, as might Set::Scalar. Set::Scalar will allow you to generate two lists of UIDs and get the differences between the two. Then, you can do the same thing, but with the names from the acceptable UIDs.

Personally, I think that all these modules are overkill for what is, essentially, a one-off. But, that's just me. (Do you really anticipate your passwd files getting out of sync again?? That sounds like a bigger issue, to me ...)
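The two-pass idea described here (first compare the sets of UIDs, then compare names only for the UIDs both files share) can be sketched without any CPAN module. This is a rough illustration that uses plain hashes as sets, not Set::Scalar itself; the helper names `difference` and `intersection` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Elements of @$x that are not in @$y, in numeric order.
sub difference {
    my ($x, $y) = @_;
    my %in_y = map { $_ => 1 } @$y;
    return sort { $a <=> $b } grep { !$in_y{$_} } @$x;
}

# Elements present in both lists, in numeric order.
sub intersection {
    my ($x, $y) = @_;
    my %in_y = map { $_ => 1 } @$y;
    return sort { $a <=> $b } grep { $in_y{$_} } @$x;
}

# uid => name maps, as a parsing loop would build them (sample data only).
my %passwd    = (1001 => 'alice', 1002 => 'bob',    1003 => 'carol');
my %smbpasswd = (1001 => 'alice', 1002 => 'robert', 1004 => 'dave');

my @p_uids = keys %passwd;
my @s_uids = keys %smbpasswd;

# Pass 1: UIDs that appear in only one file.
my @only_passwd = difference(\@p_uids, \@s_uids);
my @only_smb    = difference(\@s_uids, \@p_uids);

# Pass 2: for UIDs in both files, find those whose names disagree.
my @renamed = grep { $passwd{$_} ne $smbpasswd{$_} }
              intersection(\@p_uids, \@s_uids);

print "UIDs only in passwd: @only_passwd\n";
print "UIDs only in smbpasswd: @only_smb\n";
print "UID $_: $passwd{$_} (passwd) vs $smbpasswd{$_} (smbpasswd)\n"
    for @renamed;
```

With a set module the two helpers would presumably be replaced by the module's own difference and intersection operations; the overall two-pass structure stays the same.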
Re: Compare csv file fields? by Roger (Parson) on Oct 08, 2003 at 05:03 UTC
Yes, there is another way of doing this: using SQL SELECTs with the DBD::CSV module. I have constructed a little example below.

    use strict;
    use DBI;
    use DBD::CSV;
    use IO::File;

    # Prepare CSV file #1: prepend a header row so the columns have names
    my $p;
    {
        my $f = IO::File->new("passwd", "r") or die "Cannot read passwd: $!";
        local $/;    # slurp mode
        $p = <$f>;
    }
    {
        my $t = IO::File->new("passwd.txt", "w") or die "Cannot write passwd.txt: $!";
        print $t uc("login:x:uid:gid:desc:home:shell\n"), $p;
    }

    # Prepare CSV file #2 the same way
    {
        my $f = IO::File->new("passwd2", "r") or die "Cannot read passwd2: $!";
        local $/;
        $p = <$f>;
    }
    {
        my $t = IO::File->new("passwd2.txt", "w") or die "Cannot write passwd2.txt: $!";
        print $t uc("login:x:uid:gid:desc:home:shell\n"), $p;
    }

    # Connect to CSV database
    my $dbh = DBI->connect("DBI:CSV:csv_sep_char=\\:")
        or die "Cannot connect: " . $DBI::errstr;
    $dbh->{'csv_tables'}->{'passwd'}  = { 'file' => 'passwd.txt' };
    $dbh->{'csv_tables'}->{'passwd2'} = { 'file' => 'passwd2.txt' };

    my $sth = $dbh->prepare("SELECT p1.login FROM passwd p1, passwd2 p2
        WHERE (p1.login=p2.login) AND (p1.uid=p2.uid)");
    $sth->execute();

    # Get the matching id's
    my $matched;
    while (my $res = $sth->fetchrow_hashref()) {
        $matched .= ",'" . $res->{LOGIN} . "'";
    }
    $sth->finish;

    # Get the unmatched id's in passwd2
    $sth = $dbh->prepare("SELECT login FROM passwd2 WHERE login not in ("
        . substr($matched, 1) . ")");
    $sth->execute();
    while (my $res = $sth->fetchrow_hashref()) {
        print "Unmatched accounts in passwd2: $res->{LOGIN}\n";
    }
    $sth->finish;

    # Get the unmatched id's in passwd
    $sth = $dbh->prepare("SELECT login FROM passwd WHERE login not in ("
        . substr($matched, 1) . ")");
    $sth->execute();
    while (my $res = $sth->fetchrow_hashref()) {
        print "Unmatched accounts in passwd: $res->{LOGIN}\n";
    }
    $sth->finish;

    $dbh->disconnect;
    unlink("passwd.txt", "passwd2.txt");

It's not the most efficient method in this particular case, but the benefit becomes more significant for more complex kinds of comparison.