Re: Comparing lines of multiple files

Your statement of the problem is a little confusing. You said:

I am trying to compare all of the lines from three files and then print the result to a final file.

But your code and data samples involve only two input files, not three. Next, you said:

If the ids are not the same, I don't want it to write anything to the final file unless one of the ids was blank, but I do want to write them if they are the same, such as:

But you show an example for "Final" output that has one line where the two inputs were identical (no diffs), followed by two lines whose index values exist only in "file 1". (And what do you mean, exactly, by "unless one of the ids was blank"?)

Maybe part of the problem is that you don't have an accurate and coherent spec for what the script is supposed to do? If there really are just two inputs, and those three lines you show under "Final:" are really the correct desired output, then it looks like the spec would be something like this:

For each line in File 1, print it to Final if: (a) the ID/Key value and data are identical to a line in File 2, or (b) the ID/Key value is not found in File 2.

For that, the following is one way to do it:

use strict;

my ( $file1, $file2 ) = @ARGV;
# (getting file names from command line is better than hard-coding them)

# read file2 first, to get the keys and data to test against

my %refdata;

open( F, $file2 ) or die "$file2: $!";
while (<F>) {
    my ( $key, $data ) = split( /,/, $_, 2 );  # (in case key is not 4 digits)
    $refdata{$key} = $data;
}

# now read file1, and output lines that meet the spec

open( F, $file1 ) or die "$file1: $!";
while (<F>) {
    my ( $key, $data ) = split( /,/, $_, 2 );
    print if ( !exists( $refdata{$key} ) or $data eq $refdata{$key} );
}
# (use the command line to redirect output to a "final" file -- e.g.:
#
#    shell>  perl your_script file1 file2 > final
#
# again, it's better than hard-coding another file name)

Replies are listed 'Best First'.
Re^2: Comparing lines of multiple files by Zed_Lopez (Chaplain) on Oct 09, 2005 at 19:52 UTC
After much head-scratching (I originally wrote a "what are you asking here?" response, too), I decided that what the OP meant was: If an ID occurs in only one file, print the corresponding line. If an ID occurs in multiple files, and all the corresponding lines have the exact same text, print the line. This does correspond to the sample output. (I'm still puzzled by 'unless one of the IDS was blank.')
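For what it's worth, that reading of the spec could be sketched roughly as below for any number of files. This is only an illustration, not the OP's code: the sub name merge_agreeing_lines and the sample data are made up, and it assumes every line has the "id,data" form that graff's code splits on. It compares whole lines, so a final line with no trailing newline would count as different text.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch).
# Takes one array ref per input file, each holding that file's lines
# in "id,data" form, and returns the lines to print.
sub merge_agreeing_lines {
    my @files = @_;
    my ( %seen, @order );

    for my $lines (@files) {
        for my $line (@$lines) {
            my ($id) = split /,/, $line, 2;
            push @order, $id unless exists $seen{$id};   # remember first-seen order
            $seen{$id}{$line}++;                         # distinct line texts per id
        }
    }

    # Keep an id only if every line carrying it has exactly the same text.
    my @keep;
    for my $id (@order) {
        my @texts = keys %{ $seen{$id} };
        push @keep, $texts[0] if @texts == 1;
    }
    return @keep;
}

# Made-up sample data standing in for three files.
my @final = merge_agreeing_lines(
    [ "0001,apple\n", "0002,pear\n" ],
    [ "0001,apple\n", "0002,plum\n", "0003,fig\n" ],
    [ "0003,fig\n" ],
);
print @final;
```

With that sample data the 0001 and 0003 lines are printed and 0002 is dropped, because two files disagree about it.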
Re^3: Comparing lines of multiple files by oomwrtu (Novice) on Oct 09, 2005 at 23:06 UTC
Thank you to everyone for your patience. I stumbled on this site and was so excited about the possibility of solving this problem that I didn't take as much time rereading what I posted as I should have (I know that's not a good thing). One thing I would like to clear up is that I am using this on a webpage, so many of the errors that you guys might be seeing aren't shown (unless I check the logs, which I should do). graff's code and Zed_Lopez's rewording had it almost entirely correct for two files. I actually have 3 files that I would like to combine, but I reduced it to 2 when I was working on it to try to simplify it. -:-:- I deleted the rest of what I said because GrandFather posted code that I was able to use and adapt for three files. I am pretty sure it works as I want it to. It isn't nearly as efficient as graff's code, but it works. :D Again, thank you to everyone for your help. -:-:-
Re^4: Comparing lines of multiple files by Tortue (Scribe) on Oct 11, 2005 at 11:02 UTC
Here's a first pass at cleaning up the main loop of your code. It's not tested, so don't trust it, but it ought to do exactly the same thing, faster. The code is easier to read this way. So easy that I can see a BUG! (I left it in with a comment.) The program could be made even clearer and further optimized, but this is a start. For example, you can replace every `print DAT $c;` with `$all .= $c;`, wait until the end to open the file for append, `print DAT $all;`, and close. By the way, if you can, you should test this in a standalone program on your computer, not just on the web.

# 1. Only open/close the file once to append, instead of $maxid times.
# 2. Use temporary values.
# 3. Delete stuff you don't need immediately, not next time around loop.
# 4. In this case, ($c) is same as (defined $c) (cosmetic).
open(DAT,">>data/parsed-all.txt");
for(my $i = 1; $i <= $maxid; $i++) {
    my $currid = changeID($i);
    my ($c1,$c2,$c3) = ($compare1{$currid}, $compare2{$currid}, $compare3{$currid});
    # (delete takes one hash element at a time)
    delete $compare1{$currid};
    delete $compare2{$currid};
    delete $compare3{$currid};
    next if( $c1 && $c2 && $c3 && $c1 ne $c2 && $c1 ne $c3 && $c2 ne $c3 );
    if( $c1 && !$c2 && !$c3 ) {
        print DAT $c1;
        next;
    }
    if( $c2 && !$c1 && !$c3 ) {
        print DAT $c2;
        next;
    }
    if( $c3 && !$c1 && !$c2 ) {
        print DAT $c2; # <-- BUG HERE! (should be $c3)
        next;
    }
    if( $c1 && $c2 ) {
        if( $c1 eq $c2 ) {
            print DAT $c1;
            next;
        }
    }
    if( $c1 && $c3 ) {
        if( $c1 eq $c3 ) {
            print DAT $c1;
            next;
        }
    }
    if( $c2 && $c3 ) {
        if( $c2 eq $c3 ) {
            print DAT $c2;
            next;
        }
    }
}
close(DAT);
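The `$all .= $c;` suggestion amounts to building the whole output in memory and touching the file only once. A minimal sketch of that pattern follows. The helper name append_all is hypothetical, and the lexical filehandle, three-argument open and error checks are my own choices here rather than anything taken from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch).
# Appends all the given lines to $path with a single open/print/close,
# instead of one print per line inside the comparison loop.
sub append_all {
    my ( $path, @lines ) = @_;

    my $all = '';
    $all .= $_ for @lines;    # collect everything first

    open( my $out, '>>', $path ) or die "$path: $!";
    print {$out} $all;
    close($out) or die "$path: $!";

    return length $all;       # number of characters written
}
```

In the loop you would push each chosen line onto an array (or append to `$all` directly) and make one call such as `append_all( "data/parsed-all.txt", @chosen )` after the loop ends, where `@chosen` is whatever you collected.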