Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!
I have two files, in the following formats:
FILE1
peter	1234
nick	1111
john	4567
mike	3333
george	2222
antony	5632
migel	1209

FILE2
john	7559
mike	3333
george	2222
peter	5643
nick	1111
julia	3456
What I want to do is:
Read through FILE1 and store name and number in a hash, call it %hash1 [I have done it]
For each $key of %hash1, I must check whether it exists in FILE2. If it does, I must then check whether the number in FILE1 [$hash1{$key}] is the same as the number in FILE2 [$hash2{$key}]. If it is the same, I print $key."\t".'OK'."$hash1{$key}";
and if the numbers don't match, I print $key."\tWRONG".$hash1{$key}.
Also, if a $key of %hash1 does not exist in FILE2, I must print it.
To be more clear, I must have:
peter	WRONG	1234
nick	OK	1111
john	WRONG	4567
mike	OK	3333
george	OK	2222
antony	WRONG	5632
migel	WRONG	1209
I have written:
open ONE, $FILE1;
while (<ONE>) {
    chomp;
    if ($_=~/^(.*)\t(.*)/) {
        $hash1{$1} =$2;
    }
}
foreach $key(keys %hash1) {
    open TWO, $FILE2;
    while (<TWO>) {
        chomp;
        if ($_=~/^$key\t(.*)/) {
            if ($1 eq $hash1{$key}) {
                print $key."\t".'OK'."$hash1{$key}"."\n";
            }
            else {
                print $key."\t".'WRONG'."\t".$hash1{$key}."\n";
            }
        }
    }
}
What I cannot do is print all names of FILE1 that ARE NOT in FILE2.
Any hints?

Code tags added by GrandFather

Re: iterating through a hash?
by shmem (Chancellor) on Sep 17, 2006 at 10:08 UTC

    Why do you open file2 for every key in %hash1 and read it from start to end, only to look at the one line that holds that key?

    It suffices to open file2 once, read in key/value pairs (as you did with file1); whilst doing so you can find out whether the current key exists in file1:

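    open TWO, '<', $FILE2 or die "Can't open $FILE2: $!"; # better be explicit ;-)
    while (<TWO>) {
        chomp; # correct indent level is here.
        my ($key,$value) = split /\s+/,$_;
        # 'exists' rather than a truth test, so a stored value of 0 still counts
        print "$key NOT IN FILE 1\n" unless exists $hash1{$key};
        $hash2{$key} = $value;
    }
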
    Now you have %hash1 and %hash2 populated; you can iterate over the key lists and see if they exist in each other, and if so, whether their values are identical:

    foreach my $key (keys %hash1) {
        if (exists($hash2{$key})) {
            if ($hash1{$key} == $hash2{$key}) { # '==' if numeric, 'eq' if string
                ...
            }
            else {
                ...
            }
        }
        else { # this is the case you missed - $key not in file2
            ...
        }
    }
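
    The loop above leaves its three branches elided. As a rough sketch of how they might be filled in for the OP's spec, the helper below works on two in-memory hashes. The name compare_hashes and the sample data are made up for illustration, and treating a name missing from the second hash as WRONG is an assumption taken from the expected output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compare two name => number
# hashes and return one tab-separated report line per name in the first.
sub compare_hashes {
    my ( $h1, $h2 ) = @_;
    my @report;
    for my $key ( sort keys %$h1 ) {
        my $status;
        if ( exists $h2->{$key} ) {
            # 'eq' compares as strings; switch to '==' for numeric comparison
            $status = $h1->{$key} eq $h2->{$key} ? 'OK' : 'WRONG';
        }
        else {
            # Assumption: a name missing from the second hash counts as WRONG,
            # as in the expected output shown in the question
            $status = 'WRONG';
        }
        push @report, join "\t", $key, $status, $h1->{$key};
    }
    return @report;
}

# Sample data, a subset of the files in the question
my %hash1 = ( peter => 1234, nick => 1111, antony => 5632 );
my %hash2 = ( peter => 5643, nick => 1111, julia  => 3456 );
print "$_\n" for compare_hashes( \%hash1, \%hash2 );
```

    The keys are sorted only to make the output order predictable; keys on its own returns them in no particular order.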

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      The solution could even use just 1 hash if the whitespace separating the name and number is the same (same number of tabs or spaces).
      #!/usr/bin/perl
      use strict;
      use warnings;

      open FH2, "o44.txt" or die $!;
      my %file2 = map{ $_ => 1} <FH2>;
      close FH2 or die $!;

      open FH1, "o33.txt" or die $!;
      while (<FH1>) {
          s/\s+/$file2{ $_ } ? " OK " : " WRONG "/e;
          print;
      }
      close FH1 or die $!;

      Cristoforo is right about needing only one hash to do this, but it doesn't need to depend on having identical whitespace in the two files:
      use strict;

      my %hash;
      open( F, "<", "file1" ) or die "file1: $!";
      while (<F>) {
          my ( $name, $val ) = split;
          $hash{$name} = $val;
      }
      open( F, "<", "file2" ) or die "file2: $!";
      while (<F>) {
          my ( $name, $val ) = split;
          if ( exists( $hash{$name} ) ) {
              my $status = ( $hash{$name} eq $val ) ? 'OK':'WRONG';
              print "$name\t$status\t$hash{$name}\n";
              delete $hash{$name};  # don't need this anymore
          }
      }
      # any keys left in hash are cases where a file1 name was not found in file2
      # so print those now:
      if ( keys %hash ) {
          print "\nNames not found in file2:\n";
          print "$_\t$hash{$_}\n" for ( sort keys %hash );
      }

      (updated the code so that the names are always printed with the values from file1, as per the OP spec.)

      Nothing happens for names in file2 that don't exist in file1, but the OP didn't say that anything needed to be done for those. There was also nothing said about the same name occurring more than once in either file, but maybe the AM won't need to worry about that...
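
      If repeated names do turn out to matter, one way to spot them before building the hash is to count each name as it is read. This is only a sketch: duplicate_names is a made-up helper, and it assumes each line is a name followed by whitespace and a number.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given lines of "name number",
# return the names that occur more than once, in order of first repeat.
sub duplicate_names {
    my @lines = @_;
    my ( %seen, @dups );
    for my $line (@lines) {
        my ($name) = split ' ', $line;    # first whitespace-separated field
        next unless defined $name;        # skip blank lines
        push @dups, $name if ++$seen{$name} == 2;    # report each name once
    }
    return @dups;
}

# Sample lines for illustration only
my @sample = ( "peter\t1234\n", "nick\t1111\n", "peter\t5643\n" );
print "duplicate: $_\n" for duplicate_names(@sample);
```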

      Thank you very much! You are correct, I should have used 2 hashes instead, it's much easier that way...
Re: iterating through a hash?
by Anonymous Monk on Sep 17, 2006 at 10:42 UTC
    use List::Compare;

    for (List::Compare->new([keys %h1], [keys %h2])->get_Lonly) {
        print "$_ $h1{$_}\n";
    }
      This doesn't list each item from file1. It only lists those items in file1 not in the second file.
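
      For just the "in file1 but not in file2" part, much the same result can be had in core Perl with grep and exists, without loading a module. The helper name only_in_first and the sample hashes below are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): names that are keys of the
# first hash but not of the second, in sorted order.
sub only_in_first {
    my ( $h1, $h2 ) = @_;
    my @only = grep { !exists $h2->{$_} } sort keys %$h1;
    return @only;
}

# Sample data for illustration only
my %h1 = ( peter => 1234, antony => 5632, migel => 1209 );
my %h2 = ( peter => 5643, julia  => 3456 );
print "$_ $h1{$_}\n" for only_in_first( \%h1, \%h2 );
```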
Re: iterating through a hash?
by Not_a_Number (Prior) on Sep 17, 2006 at 23:12 UTC
    use strict;
    use warnings;

    open my $fh, '<', $FILE2 or die "Can't open file2: $!\n";
    my %hash = map split, <$fh>;

    open $fh, '<', $FILE1 or die "Can't open file1: $!\n";
    while ( <$fh> ) {
        my ( $k, $v ) = split;
        my $ok = ( defined $hash{$k} and $hash{$k} == $v ) ? 'OK' : 'WRONG';
        print join( "\t", $k, $ok, $v ), "\n";
    }

    Update: Output, as per spec (tabs being equal):

    peter	WRONG	1234
    nick	OK	1111
    john	WRONG	4567
    mike	OK	3333
    george	OK	2222
    antony	WRONG	5632
    migel	WRONG	1209
Re: iterating through a hash?
by graff (Chancellor) on Sep 17, 2006 at 21:40 UTC
    shmem is right about how you should open the second file only once, and as other sub-replies there point out, just one hash is enough to do everything you want.

    After you read the first file into a hash, you read each line of the second file, but instead of storing this, you check if each name exists in the hash; if so, print "OK" or "WRONG" according to whether the values match and then delete that hash element. When done reading the second file, any remaining hash elements are names in file1 that were not in file2 -- so that is where you iterate through the hash, to print those out at the end.