Reconcile one list against another

spartan has asked for the wisdom of the Perl Monks concerning the following question:

Greetings! It has been quite a while since I posted a question, but I've been lurking, always reading the wisdom contained herein.

My question is this: I have two text files. Each one looks like something you would see when you cat /etc/passwd. The first file (I'll call it file1) has a list of users that need accounts. The second file has a list (in /etc/passwd format) of current accounts.

I easily created a list of users that *need* accounts by simple shell looping and grep. I was also able to produce a list of duplicate users easily enough via the shell again.

Alas, the last thing I wanted to do was to show who was left. These would be the accounts that have to be deleted. When simple shell one-liners were of no avail, I turned to my Swiss Army knife of text processing, but I must admit that I am thwarted by this seemingly simple task.

Could I have just edited the files side by side? I could have, but I thought I would be cheating myself out of an opportunity to learn something. And here I am. I have pasted my code below, with copious comments about what I think I should do; the part that ultimately escapes me is how to do it.

In my defense I will say one thing. I am not a programmer by trade, nor have I taken any programming courses outside of basic programming, and Pascal in high school.

#!/usr/bin/perl

use strict;

# Grep a user line OUT of a file, given a list of names, from another file

# pseudo code:
# if file1 contains a name from file2, skip the line
# if file1 does NOT contain a name from file2, print the line.
#
# I am using this to determine a list of
# accounts that should be present, or not, on a server

my @users;
my $file1=$ARGV[0];
my $file2=$ARGV[1];

open FILE1, "<$file1" or die "Cannot open $file1: $!\n";
my @file1=<FILE1>;
close FILE1;

open FILE2, "<$file2" or die "Cannot open $file2: $!\n";
my @file2=<FILE2>;
close FILE2;
# Create @users array
my @out;
foreach my $line (@file2) {
  if ($line =~ /#/) {
    @out=split /\s+/,$line;
    push @users,$out[1];
  }
}

LINE: foreach my $line (@file1) {
  USER: foreach my $user (@users) {
    print "is $user on $line";
    if ($line =~ /$user/i) {
      print "YES $user is on $line ; next LINE\n";
      next LINE;
    } else {
      # This is where I get into trouble. I'm thinking I need to set
      # some kind of flag or something, because otherwise I will exhaust
      # the total list of users and will either get false positives
      # or skip them altogether...
      print "nope... next USER\n";
      next USER;
    }
  }
}
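The comment in the inner loop asks for a flag, but with the `LINE` label one is not strictly needed: if the inner loop finishes without executing `next LINE`, no user matched, so the line can simply be kept after the loop ends. A minimal sketch of that idea, assuming lines and names shaped like `@file1` and `@users` above; the sub name `unmatched_lines` and the sample data are hypothetical, and each name is wrapped in `\Q...\E` so regex metacharacters in a name are matched literally:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines that do not mention any user.
# Matching is still a case-insensitive substring match, as in the script above.
sub unmatched_lines {
    my ($lines, $users) = @_;
    my @unmatched;
    LINE: foreach my $line (@$lines) {
        foreach my $user (@$users) {
            # A match means this line is accounted for: skip to the next line.
            next LINE if $line =~ /\Q$user\E/i;
        }
        # Only reached when no user matched, so no flag is needed.
        push @unmatched, $line;
    }
    return @unmatched;
}

# Hypothetical sample data standing in for the two files.
my @file1 = ("alice:x:1001:1001::/home/alice:/bin/sh\n",
             "bob:x:1002:1002::/home/bob:/bin/sh\n",
             "carol:x:1003:1003::/home/carol:/bin/sh\n");
my @users = qw(alice carol);

my @left = unmatched_lines(\@file1, \@users);
print @left;    # only the bob line
```

An explicit `my $found = 0;`, set to 1 inside the inner loop (followed by `last`), works just as well; the label simply makes the flag unnecessary.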

Very funny Scotty... Now PLEASE beam down my PANTS!


Re: Reconcile one list against another by jpl (Monk) on May 04, 2011 at 18:23 UTC
Your life will improve measurably if you make `@users` a hash instead of an array. Not to discourage anyone from using Perl, but you can probably do this just as well with cut, sort and comm.
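To illustrate the hash suggestion: once both lists are reduced to bare user names, a hash keyed on name replaces the inner loop with a single lookup. A minimal sketch, assuming the names have already been isolated; the sample names and the variable names `%is_authorized` and `@to_delete` are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical inputs: names that should have accounts, and names that do.
my @authorized = qw(alice carol dave);
my @current    = qw(alice bob carol eve);

# Build a lookup table: one hash key per authorized user.
my %is_authorized = map { $_ => 1 } @authorized;

# Accounts that exist but are not authorized, i.e. candidates for deletion.
my @to_delete = grep { !$is_authorized{$_} } @current;

print "$_\n" for @to_delete;    # bob, then eve
```

Each check is one hash access instead of a scan of the whole array, and comparing whole keys avoids substring false positives such as `ed` matching `fred`.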
Re^2: Reconcile one list against another by spartan (Pilgrim) on May 04, 2011 at 18:40 UTC
Holy mackerel... So, if I make each line a hash key and iterate over the users, I should increment the hash value each time I get a user match against the line. Then I just print out the keys whose values are zero, or greater than zero, depending on whether I want to see a list of users that do not match or that do match, respectively! I think that will work. Stay tuned for code...
Re^3: Reconcile one list against another by samwyse (Scribe) on May 04, 2011 at 21:03 UTC
Here's a simple way to take the difference of sets.

use Data::Dumper;
%one = ( 'a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5, 'f' => 6,);
%two = ( 'a' => -1, 'e' => -2, 'i' => -3, 'o' => -4, 'u' => -5,);
print Dumper(\%one, \%two);
%tmp = %one;
delete @tmp{keys %two};
print Dumper(\%tmp);
%tmp = %two;
delete @tmp{keys %one};
print Dumper(\%tmp);

Running the above produces the following:

$VAR1 = { 'e' => 5, 'c' => 3, 'a' => 1, 'b' => 2, 'd' => 4, 'f' => 6 };
$VAR2 = { 'e' => -2, 'u' => -5, 'a' => -1, 'o' => -4, 'i' => -3 };
$VAR1 = { 'c' => 3, 'b' => 2, 'd' => 4, 'f' => 6 };
$VAR1 = { 'u' => -5, 'i' => -3, 'o' => -4 };
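The same hash-slice `delete` can be wrapped in a small sub that takes plain lists of names, which is closer to the original problem. A sketch under that assumption; the sub name `difference` and the sample lists are hypothetical, and the `my` declarations that `use strict` requires (omitted in the example above) are included:

```perl
use strict;
use warnings;

# Hypothetical helper: elements of @$left that do not appear in @$right.
sub difference {
    my ($left, $right) = @_;
    my %tmp = map { $_ => 1 } @$left;   # one key per element of the left list
    delete @tmp{@$right};               # hash slice: drop every right-hand name
    return sort keys %tmp;              # sorted, since hash order is arbitrary
}

# Hypothetical sample lists.
my @need    = qw(alice carol dave);
my @current = qw(alice bob carol eve);

my @to_delete = difference(\@current, \@need);
my @to_create = difference(\@need, \@current);
print "Delete: @to_delete\n";    # Delete: bob eve
print "Create: @to_create\n";    # Create: dave
```

Deleting a key that is not present is harmless, so no existence check is needed before the slice.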
Re^2: Reconcile one list against another by jaredor (Priest) on May 05, 2011 at 06:52 UTC
Ditto! However, instead of comm, I tend to use uniq in a pipeline. `cat ${file1} ${file2} ${file2} | cut -d: -f1 | sort | uniq -u` (Slacking off and just doing the easier case for user id in the first field as opposed to the harder case of name extraction from the fifth field.) Translating this to perl comes up with something like what's given in the following answers, but I find it's handy to test the rough idea on the command line if possible. (EDIT: Removed the accidental dot between the paragraph and code tags. Moral: don't do markup after midnight.)
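The pipeline relies on `uniq -u` keeping only lines that occur exactly once: because file2 is fed in twice, every name from file2 occurs at least twice, so what survives are names found only in file1. A rough Perl translation of that counting trick, assuming the name sits in the first colon-separated field as in the `cut -d: -f1` step; the sub names `first_field` and `only_in_first` and the sample lines are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: first colon-separated field (the `cut -d: -f1` step).
sub first_field { return (split /:/, $_[0])[0] }

# Names that occur exactly once after counting file1 once and file2 twice,
# mirroring `cat file1 file2 file2 | cut -d: -f1 | sort | uniq -u`.
sub only_in_first {
    my ($file1_lines, $file2_lines) = @_;
    my %count;
    $count{ first_field($_) }++ for @$file1_lines, @$file2_lines, @$file2_lines;
    my @once = grep { $count{$_} == 1 } keys %count;
    return sort @once;
}

# Hypothetical sample lines.
my @file1 = ("alice:x:1001\n", "dave:x:1004\n");
my @file2 = ("alice:x:1001\n", "bob:x:1002\n");

print "$_\n" for only_in_first(\@file1, \@file2);    # dave
```

As with `uniq -u`, a name listed twice in file1 itself would also be dropped.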
Re: Reconcile one list against another by Gulliver (Monk) on May 04, 2011 at 19:34 UTC
Check out the module List::Compare.

#copied from the documentation
use List::Compare;
my @Llist = qw(abel abel baker camera delta edward fargo golfer);
my @Rlist = qw(baker camera delta delta edward fargo golfer hilton);
my $lc = List::Compare->new(\@Llist, \@Rlist);

#get_unique()
#Get those items which appear (at least once) only in the first list.
@Lonly = $lc->get_unique;
@Lonly = $lc->get_Lonly; # alias

#get_complement()
#Get those items which appear (at least once) only in the second list.
@Ronly = $lc->get_complement;
@Ronly = $lc->get_Ronly; # alias
Re: Reconcile one list against another by wind (Priest) on May 04, 2011 at 18:41 UTC
Just use grep:

foreach my $line (@file1) {
  if (my @matched = grep {$line =~ /\Q$_\E/i} @users) {
    print "YES @matched is on $line \n";
  } else {
    print "nope...\n";
  }
}
Re^2: Reconcile one list against another by jpl (Monk) on May 04, 2011 at 18:53 UTC
This will do what the OP said he wanted to do, but should user `ed` match a line for user `ted` or `fred`? The OP will probably be better off isolating (just) the user names from both files, then doing exact matching.
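The `ed`/`fred` problem is easy to demonstrate: a case-insensitive substring match on the whole line reports `ed` as present on the line for `fred`, while comparing the isolated login name does not. A small demonstration, assuming passwd-style lines with the login name in the first colon-separated field; the sample line is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical passwd-style line and the name being checked.
my $line = "fred:x:1005:1005:Fred Smith:/home/fred:/bin/bash\n";
my $user = "ed";

# Substring match against the whole line: a false positive.
my $substring_hit = ($line =~ /\Q$user\E/i) ? 1 : 0;

# Isolate the login name (first field), then compare exactly.
my ($login) = split /:/, $line;
my $exact_hit = (lc $login eq lc $user) ? 1 : 0;

print "substring: $substring_hit, exact: $exact_hit\n";   # substring: 1, exact: 0
```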
Re^3: Reconcile one list against another by wind (Priest) on May 04, 2011 at 18:59 UTC
That is true; my code was just meant to solve his state-variable quandary. Without thinking too deeply about it, your noted problem might be solved by simply adding some word boundaries: `if (my @matched = grep {$line =~ /\b\Q$_\E\b/i} @users) {`
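Word boundaries do stop `ed` from matching inside `fred`, but on full passwd-format lines a case-insensitive match can still hit a different field, such as the comment (GECOS) field. A quick check, using a hypothetical line:

```perl
use strict;
use warnings;

# Hypothetical line whose comment field happens to contain the word "Ed".
my $line = "fred:x:1005:1005:Ed Fredericks:/home/fred:/bin/bash\n";
my $user = "ed";

# \b stops "ed" from matching inside the login name "fred" ...
my $in_login = ("fred" =~ /\b\Q$user\E\b/i) ? 1 : 0;

# ... but with /i the standalone word "Ed" in the comment field still matches.
my $in_line = ($line =~ /\b\Q$user\E\b/i) ? 1 : 0;

print "login: $in_login, whole line: $in_line\n";    # login: 0, whole line: 1
```

Anchoring the match to the first field (for example `/^\Q$user\E:/i`) or comparing isolated names avoids this.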
Re^2: Reconcile one list against another by spartan (Pilgrim) on May 04, 2011 at 19:52 UTC
Far out wind... That is exactly the problem I was trying to solve... How to iterate over the entire list in a (seemingly) atomic operation to see if a list of users could have possibly matched my list of accounts. I kept thinking a loop in a loop, and that was obviously wrong.
Re^2: Reconcile one list against another by spartan (Pilgrim) on May 04, 2011 at 20:23 UTC
Ok, so you can show what users have accounts; unfortunately I'm trying to show the exact opposite. Here is what I came up with:

#!/usr/bin/perl
use strict;
use Data::Dumper;

# Grep a user OUT of a file, given a list of names, likely from another file
# pseudo code:
# if file1 contains a name from file2, skip the line
# if file1 does NOT contain a name from file2, print the line.
#
# I am using this to determine a list of accounts on the partners server
# that, maybe, shouldn't be on there any more.

my @users;
my %lines;
my $file1=$ARGV[0];
my $file2=$ARGV[1];

open FILE1, "<$file1" or die "Cannot open $file1: $!\n";
# Hasherize it!!!
foreach my $line (<FILE1>) {
  chomp($line);
  $lines{$line}=0;
}
close FILE1;

# Create @users array
open FILE2, "<$file2" or die "Cannot open $file2: $!\n";
foreach my $line (<FILE2>) {
  if ($line =~ /#/) {
    my @out=split /\s+/,$line;
    push @users,$out[1];
  }
}
close FILE2;

foreach my $key (keys(%lines)) {
  foreach my $user (@users) {
    if ($key =~ /$user/i) {
      $lines{$key}++;
    }
  }
}

print "List of users not authorized to have an account\n";
foreach my $key (sort keys(%lines)) {
  if ($lines{$key} <= 0) {
    print "$key\n";
  }
}

print "\nList of users authorized to have an account\n";
foreach my $key (sort keys(%lines)) {
  if ($lines{$key} > 0) {
    print "$key\n";
  }
}

It handily prints out what should be there, and what should not. Now all I have to do is some manual verification, and I can call this a lesson learned. Many thanks to all those with quick replies, and mostly to jpl for the hint in the direction of hashes.

wind: Your code is excellent, but it only prints out those folks authorized to have accounts. How would I modify your code to print the inverse list? I tried negating the match (=~ to !=~), but it failed in quite spectacular fashion.

UPDATE: I added a sort into each foreach loop above to print out the lines in alphabetical order.
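On the inverse question: Perl's negated binding operator is `!~`, not `!=~`. Perl reads `!=~` as the numeric `!=` followed by the bitwise `~`, so the pattern ends up matched against `$_` rather than the line (with `use warnings`, Perl flags this as "!=~ should be !~"). For this task it is usually simpler to negate the result of the `grep` instead of the match inside it. A sketch adapting wind's loop, assuming `@file1` and `@users` shaped as in the original script; the sample data and the `@unauthorized` name are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the lines of file1 and the names from file2.
my @file1 = ("alice:x:1001\n", "bob:x:1002\n", "carol:x:1003\n");
my @users = qw(alice carol);

my @unauthorized;
foreach my $line (@file1) {
    # In scalar (boolean) context grep returns how many users matched the line;
    # zero means nobody in @users accounts for it.
    # Note: grep { $line !~ ... } would instead list the users NOT on the line,
    # which is almost never empty, so negate the grep, not the match.
    unless (grep { $line =~ /\b\Q$_\E\b/i } @users) {
        push @unauthorized, $line;
    }
}

print "Not authorized:\n", @unauthorized;    # only the bob line
```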
Re: Reconcile one list against another by anonymized user 468275 (Curate) on May 05, 2011 at 10:34 UTC
If you just want the lists of users to be added and deleted, it can be done in a few unix commands:

awk '{print $2}' < file1.txt | sort -u > file1.sor
awk '{print $2}' < file2.txt | sort -u > file2.sor
comm -13 file1.sor file2.sor > onlyIn2.sor
comm -23 file1.sor file2.sor > onlyIn1.sor

and if you want accounts present in both, use comm -12 instead.

One world, one people
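The three `comm` outputs can also be produced in a single pass in Perl by recording, per name, which list(s) it came from. A sketch under the assumption that the names have already been extracted (as the `awk '{print $2}'` step does); the sub name `compare_lists`, the sample lists, and the choice of bit values 1 and 2 to tag the two lists are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: classify names like comm (only in 1, only in 2, both).
sub compare_lists {
    my ($list1, $list2) = @_;
    my %seen;
    $seen{$_} |= 1 for @$list1;    # bit 1: present in the first list
    $seen{$_} |= 2 for @$list2;    # bit 2: present in the second list

    my (@only1, @only2, @both);
    foreach my $name (sort keys %seen) {
        if    ($seen{$name} == 1) { push @only1, $name }   # like comm -23
        elsif ($seen{$name} == 2) { push @only2, $name }   # like comm -13
        else                      { push @both,  $name }   # like comm -12
    }
    return (\@only1, \@only2, \@both);
}

# Hypothetical sample name lists.
my ($only1, $only2, $both) =
    compare_lists([qw(alice carol dave)], [qw(alice bob carol eve)]);

print "Only in 1: @$only1\n";    # dave
print "Only in 2: @$only2\n";    # bob eve
print "In both:   @$both\n";     # alice carol
```

Duplicate names within one list collapse into a single hash key, much like the `sort -u` step.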

