Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have to compare 3 files: main_file1, file 2 and file 3.
I am reading file 2 and file 3 into separate hashes (e.g. hash2, hash3). Then I read through main_file1 to check whether the entries in this file match entries in hash2 and hash3.
This is my logic: if the entry from main_file1 is in hash1, then write that line to output1, and also check whether this entry from main_file1 is in hash2; if so, write it to output2 before proceeding to the next line in the file. Entries that do not match hash2 or hash3 will all be put in output1.
What I have is something like this. I'm just wondering if this logic is correct, because I can't test this bit of code alone as it's within a larger script.

while ($line = <MAIN_FILE1>){
    chomp $line;
    if (exists $hash1{$entry}) {
        print OUTPUT1 "$filename\n";
        # I do not have a next statement here
        # because I want to continue
        # checking if this entry also exists
        # in hash2...
    }
    if (exists $hash2{$entry}) {
        print OUTPUT2 "$filename\n";
        next;
    }
    print OUTPUT1 "$filename\n";
    next;
}
close(MAIN_FILE1)

Re: next statement logic
by QM (Parson) on Feb 20, 2006 at 17:55 UTC
    Essentially, you're printing it to OUTPUT1 unless it's found in %hash2, in which case you print to OUTPUT2.

    Also, your description is confusing, as you mention %hash3, but %hash3 never shows up in the code.

    This is how I would change your script, given my understanding of your problem.

    while ($line = <MAIN_FILE1>){
        chomp $line;
        if (exists $hash2{$entry}) {
            print OUTPUT2 "$filename\n";
            next;
        }
        # either exists in %hash1,
        # or not in either,
        # so output to OUTPUT1
        print OUTPUT1 "$filename\n";
        next;
    }
    close(MAIN_FILE1)

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Thank you all for your prompt responses. A few comments on your questions: "izut: Can't figure out why you are printing in the end of the loop..." At the end, I want to print everything that doesn't exist in hash1 or hash2 into OUTPUT1.
      "QM: Also, your description is confusing, as you mention %hash3, but %hash3 never shows up in the code." I apologize for the confusion...it's actually just hash1 and hash2 (I think I mentioned these as hash2 and hash3 elsewhere, but ignore this)
      Here's an illustration of what I'm trying to achieve...I hope this helps.
      DATA:
      hash1:
      ABC123
      ABC456

      hash2:
      ABC456

      Main File1:
      ABC123
      ABC456
      XYZ123
      XYZ456

      OUTPUT1:
      ABC123
      ABC456
      XYZ123 ###values not found in hash1 or hash2 will go to OUTPUT1
      XYZ456

      OUTPUT2:
      ABC456

        If that is an example of what you want to achieve, here is the logic to follow:

        1) Everything is printed to OUTPUT1, including anything that matches in either %hash1 or %hash2.

        2) Anything found in %hash2 should also be printed to OUTPUT2.

        If I have that correct, then my example should be changed to the following:

        while ($line = <MAIN_FILE1>){
            chomp $line;
            print OUTPUT1 "$filename\n";
            if (exists $hash2{$entry}) {
                print OUTPUT2 "$filename\n";
            }
        }
        close(MAIN_FILE1)

        Note that no next statement is needed, regardless of the order of print/if.
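
        A self-contained sketch of that loop, assuming each line of the main file is itself the entry (the thread never shows how $entry and $filename are derived). The helper name split_entries and the in-memory filehandle are hypothetical stand-ins for the real files:

```perl
use strict;
use warnings;

# Hypothetical helper: reads entries from $in and returns what would be
# written to OUTPUT1 and OUTPUT2 as two array references.
sub split_entries {
    my ( $in, $hash2 ) = @_;
    my ( @out1, @out2 );
    while ( my $entry = <$in> ) {
        chomp $entry;
        push @out1, $entry;                               # everything goes to OUTPUT1
        push @out2, $entry if exists $hash2->{$entry};    # %hash2 matches also go to OUTPUT2
    }
    return ( \@out1, \@out2 );
}

# Sample data from the poster's illustration.
my %hash2  = ( ABC456 => 1 );
my $sample = "ABC123\nABC456\nXYZ123\nXYZ456\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my ( $out1, $out2 ) = split_entries( $in, \%hash2 );
close $in;

print "OUTPUT1: @$out1\n";
print "OUTPUT2: @$out2\n";
```

        With the sample data from the illustration, OUTPUT1 receives ABC123, ABC456, XYZ123 and XYZ456, and OUTPUT2 receives ABC456. %hash1 is never consulted under this reading of the requirements.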

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: next statement logic
by Not_a_Number (Prior) on Feb 20, 2006 at 19:12 UTC

    Frankly, your request is not very intelligible. Apart from the confusion over %hash1, %hash2 and %hash3, you use two variables in your code snippet that seem to come from nowhere ($entry and $filename).

    However, following your update, which appears to clarify things somewhat, I will give it a stab. If I have still misunderstood your requirements, then please try to make them even clearer.

    To begin with:

    "I can't test this bit of code alone as it's within a larger script."

    In that case, write a smaller script enabling you to test your logic. For example, you could push your lines into different arrays instead of printing them to different filehandles, and you could use __DATA__ to simulate input rather than opening a filehandle. Here is an example of how you could do this:
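
    A minimal sketch of such a test script, assuming the loop from the question is kept as posted, except that each line is read straight into $entry and the two print statements become pushes onto the hypothetical arrays @out1 and @out2. The sample data is taken from the poster's illustration:

```perl
use strict;
use warnings;

# Sample lookup tables, taken from the poster's illustration.
my %hash1 = ( ABC123 => '', ABC456 => '' );
my %hash2 = ( ABC456 => '' );

# Arrays stand in for the OUTPUT1 and OUTPUT2 filehandles.
my ( @out1, @out2 );

while ( my $entry = <DATA> ) {
    chomp $entry;

    if ( exists $hash1{$entry} ) {
        push @out1, $entry;    # no next here, as in the question
    }
    if ( exists $hash2{$entry} ) {
        push @out2, $entry;
        next;
    }
    push @out1, $entry;        # reached by every entry not in %hash2
}

print "OUTPUT1:\n";
print "$_\n" for @out1;
print "\nOUTPUT2:\n";
print "$_\n" for @out2;

__DATA__
ABC123
ABC456
XYZ123
XYZ456
```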

    Having run this, you will see that your logic is indeed flawed: any item in %hash1 (ABC123 in this case) will be printed to OUTPUT1 twice.

    Here's a workaround that can no doubt be improved upon:

    use strict;
    use warnings;

    my %hash1 = ( ABC123 => '', ABC456 => '' );
    my %hash2 = ( ABC456 => '');

    my ( @out1, @out2 );

    while ( my $entry = <DATA> ) {
        chomp $entry;
        my $seen;
        if ( exists $hash1{$entry} ) {
            push @out1, $entry;
            $seen = 1;
        }
        if ( exists $hash2{$entry} ) {
            push @out2, $entry;
            next;
        }
        push @out1, $entry unless $seen;
    }

    print "\nOUTPUT1:\n";
    print "$_\n" for @out1;
    print "\nOUTPUT2:\n";
    print "$_\n" for @out2;

    __DATA__
    ABC123
    ABC456
    XYZ123
    XYZ456

Re: next statement logic
by izut (Chaplain) on Feb 20, 2006 at 17:54 UTC

    You don't need the last next statement. Your code could be like this:

    while ($line = <MAIN_FILE1>) {
        print OUTPUT1 "$filename\n" if (exists($hash1{$entry}));
        print OUTPUT2 "$filename\n" if (exists($hash2{$entry}));
    }

    Can't figure out why you are printing in the end of the loop...

    Hope this helps.

    Igor 'izut' Sutton
    your code, your rules.

Re: next statement logic
by pKai (Priest) on Feb 20, 2006 at 17:54 UTC

    AFAICS (and provided $entry is sensibly derived from $line in your real script), the snippet fits your description.

    The next in the 2nd if will take care that you don't write to OUTPUT1 again (in case you already did in the 1st if).

    The last next isn't necessary, since you are already at the end of the loop body there.
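
    A small sketch of that last point, assuming a hypothetical helper collect that runs the same loop with or without a next as the final statement of the body. Both calls should visit every item, since nothing remains to be skipped after the next:

```perl
use strict;
use warnings;

# Hypothetical helper: collects the items, optionally ending the loop
# body with a next statement.
sub collect {
    my ( $trailing_next, @items ) = @_;
    my @seen;
    for my $item (@items) {
        push @seen, $item;
        next if $trailing_next;    # last statement in the body: nothing left to skip
    }
    return @seen;
}

my @with    = collect( 1, qw(ABC123 ABC456 XYZ123) );
my @without = collect( 0, qw(ABC123 ABC456 XYZ123) );

print "with next:    @with\n";
print "without next: @without\n";
```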