hash comparison & nested selections

silentc has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

This is the first time I've done this, so I hope I'm doing it properly. This should be fairly simple for anyone fairly experienced. Thanks in advance. :)

# I have a hash of usernames %postiniList.
# I have another hash of usernames %ldapList.
# I have a text file in the working dir called
# users.txt which contains one line per email
# address with other unnecessary crap also.
# We are traversing %postiniList to see if each key exists
# in %ldapList and take action if it does NOT.

foreach (keys %postiniList) {
   if(exists $ldapList{$_}) {
      print "$_ exists in ldap\n";
      next;
   } else {
      print "$_ does not exist in ldap\n";
      my $emailString = `cat users.txt |grep $_`;
      # these next three lines are messing it all up!
      if($emailString =~ /(\w+\@\w+\.\w+)/) {
         print "deleteuser $1\n";
      }
   }
}

janitored by ybiC: Moved introductory text out of <code> tags.


Replies are listed 'Best First'.
Re: hash comparison & nested selections by graff (Chancellor) on Sep 17, 2003 at 04:23 UTC
Whether the code is being done properly depends on how you are coming up with the keys for %postiniList. You are passing each key to a shell command without any sort of safeguard. Apart from being risky, using these key values in the shell command is unnecessary -- in fact, the shell command itself is unnecessary (and is probably wasting time).

As for the part that's messing you up, your pattern to check for an email address in $emailString is probably only capturing portions of some addresses: your regex, when given a perfectly acceptable address like "my.name@host.domain.net", will only capture "name@host.domain", which would be wrong. Is that the sort of problem you're asking about? (You didn't really indicate what sort of problem you're having.)

What sort of stuff is in users.txt? Is it a flat table file with some sort of regular delimiter between the fields on each line, and is the email address always in the same field position? If so, then you'd want to use split instead of a regex to grab the email address; e.g. suppose the fields are tab-delimited, and the email address is the second field:

    open( my $usrs, '<', 'users.txt' ) or die "no users.txt: $!";
    my @emailStrings = <$usrs>;   # let's just read this once
    close $usrs;

    foreach my $user ( keys %postiniList ) {
        if ( exists $ldapList{$user} ) {
            print "$user exists in ldap\n";   # don't need a "next;" here
        } else {
            print "$user does not exist in ldap\n";
            # the loop variable is named because grep sets $_ to each line
            my ( $match ) = grep /\Q$user\E/, @emailStrings;
            next unless defined $match;       # no line mentions this user
            my $emailString = ( split /\t/, $match )[1];
            print "deleteuser $emailString\n";
        }
    }

You're still likely to have some trouble with the grep, if your username keys include things like "rob", "robert", "latrobe", etc. When you hand the shortest one of these to grep (in perl or on the command line), it will return all three users. In your original code, the value of $emailString would contain three entries from users.txt, separated by newline characters within the one long scalar string. Your regex only captures the first address in that string, so if "rob" comes later in the file than "latrobe", you never get to see rob's email address.

As for the code I suggested above, $emailString will contain only one entry from users.txt, but if "latrobe" came first in the file, that will be the entry you get for "rob". (update: you can fix this easily enough, of course, just by using suitable anchors around the username in the grep regex, e.g. /\b\Q$user\E\b/ or whatever)

Let me make another guess, that the usernames (keys for %postiniList and %ldapList) happen to be another delimited field in users.txt, in which case you'd want to read users.txt once, and for each line therein, use split to get both the username and email fields -- e.g. suppose that they are actually the first and third fields, respectively:

    open( my $usrs, '<', 'users.txt' ) or die "no users.txt: $!";
    while ( <$usrs> ) {
        chomp;   # keep the trailing newline out of the last field
        my ( $keystring, $emailstring ) = ( split /\t/, $_ )[0,2];
        if ( exists $postiniList{$keystring} ) {
            my $report = "$keystring exists in ldap\n";
            unless ( exists $ldapList{$keystring} ) {
                $report =~ s/ exists / does not exist /;
                $report .= "deleteuser $emailstring\n" if ( $emailstring );
            }
            print $report;
        }
    }
    close $usrs;
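
To make the two matching problems in this reply concrete, here is a minimal sketch. The sample lines, the `example.org` addresses, and the looser `$wider` pattern are illustrative assumptions, not data or code from the original post, and `$wider` is not a complete email matcher.

```perl
use strict;
use warnings;

# Pattern from the question: \w matches neither '.' nor '-', so the capture
# starts late and stops after the first domain label.
my $narrow = qr/(\w+\@\w+\.\w+)/;

# Looser pattern (an illustrative assumption, not a full RFC 5322 matcher).
my $wider = qr/([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/;

my $address = 'my.name@host.domain.net';
my ($narrow_hit) = $address =~ $narrow;
my ($wider_hit)  = $address =~ $wider;
print "narrow: $narrow_hit\n";   # narrow: name@host.domain
print "wider:  $wider_hit\n";    # wider:  my.name@host.domain.net

# Hypothetical stand-in for users.txt: username, full name, email.
my @lines = (
    "latrobe\tLa Trobe\tlatrobe\@example.org\n",
    "robert\tRobert Roe\trobert\@example.org\n",
    "rob\tRob Doe\trob\@example.org\n",
);

my $user = 'rob';
my @loose    = grep { /\Q$user\E/ } @lines;       # substring match
my @anchored = grep { /\b\Q$user\E\b/ } @lines;   # whole-word match

print "loose matches:    ", scalar(@loose), "\n";      # loose matches:    3
print "anchored matches: ", scalar(@anchored), "\n";   # anchored matches: 1
```

Word anchors narrow the match but still search the whole line, so a username that also appears as the local part of someone else's address would match too. Comparing against a single split field is stricter.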
Re: hash comparison & nested selections by shenme (Priest) on Sep 17, 2003 at 03:17 UTC
It's late and I can't see your problem, I guess, but a couple notes.

You're liable to get the On Useless Cats award of the week from Merlyn. Why would you say "`cat file | grep 'thing'`" when you could just say "`grep 'thing' file`"?

And when you do search for a user name in users.txt, wouldn't you like to make sure you get the _right_ user? Suppose there's a user 'roo' - you wouldn't want to delete 'root' by mistake, would you? You should be more careful in your grep to limit how you match things. Even using '-w' isn't good enough. What if there were an email address "roo@where.ru" for some other user?

Could it be that some of your email addresses are more complex than just "user@domain.tld"? Maybe you should accept a larger set of email address patterns?

Any chance you could have casing problems, such that a user name in one of the three locations, %postiniList or %ldapList or users.txt, could be upper- vs. lowercase elsewhere?

You might have to show us more code and more of the actual data. And what does "messing it all up!" mean?
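
One way to address the 'roo' vs. 'root' and casing concerns together is to index users.txt once by exact, lowercased username and then look each key up in that index. The sketch below assumes a tab-delimited layout with the username in the first field and the email in the third. That layout, the sample data, and the names `%email_for`, `%ldap_lc`, and `@commands` are hypothetical, not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for users.txt: username, full name, email.
my @lines = (
    "root\tSuperuser\troot\@example.org\n",
    "Roo\tKanga Roo\troo\@example.org\n",
    "kanga\tKanga\troo\@where.ru\n",
);

# Index the file once by lowercased username, so lookups are exact
# (whole field, not substring) and case-insensitive.
my %email_for;
for my $line (@lines) {
    chomp $line;
    my ($name, $email) = (split /\t/, $line)[0, 2];
    next unless defined $name && defined $email;
    $email_for{ lc $name } = $email;
}

# Hypothetical contents of the two hashes from the question.
my %postiniList = (ROO => 1, root => 1);
my %ldapList    = (root => 1);

# Normalize the LDAP keys the same way before comparing.
my %ldap_lc = map { lc($_) => 1 } keys %ldapList;

my @commands;
for my $user (sort keys %postiniList) {
    my $key = lc $user;
    next if $ldap_lc{$key};                 # present in LDAP: nothing to do
    next unless exists $email_for{$key};    # no users.txt entry: skip
    push @commands, "deleteuser $email_for{$key}";
}
print "$_\n" for @commands;   # deleteuser roo@example.org
```

Because the lookup is on the whole username field, 'roo' cannot pick up the 'root' line or the unrelated "roo@where.ru" address.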