Compare 2 files and get data

darrengan has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Compare 2 files and get data by jpeg (Chaplain) on Sep 05, 2005 at 08:49 UTC
Yes, the logic is at fault. The while() loops are executed sequentially, and the if block is evaluated only after both have completed, so the script compares elements from the last line of each file. You'll need to nest those while loops or, better yet, read the elements into arrays and compare them.

I don't mean to cramp your style, but there are a few points I feel I should mention:

- You'll learn a lot from `use strict; use warnings;` at the beginning of each of your scripts.
- chomp (perldoc -f chomp) is a bit safer to use for stripping newlines from a file you've read. chop is guaranteed to strip the last character, which theoretically could be only one part of a system's newline convention.
- Most people use all caps for constants and filehandles (e.g., `open(FILE, $file)`) and avoid all caps for variables. Not a big deal unless you're sharing code with someone else.
- It's a good idea to close() filehandles that you open().
- It's a good idea to check for errors when you interact with your system (like when you're opening files). Try using `open(FILE, $file) or die "open $file failed due to $!"` in one script and `open(FILE, $file)` in another, and run both with a nonexistent file.

Again, I don't mean to cramp your style or make you uncomfortable, but these tips help a lot of people.

-- jpg
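To make the chop/chomp point and the checked open concrete, here is a minimal sketch. The helper names `strip_with_chop` and `strip_with_chomp` and the file name `no_such_file.txt` are made up for illustration; they are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: each works on a copy of the line it is given.
sub strip_with_chop  { my ($line) = @_; chop $line;  return $line }
sub strip_with_chomp { my ($line) = @_; chomp $line; return $line }

# The last line of a file often has no trailing newline.
for my $line ("apple,1\n", "pear,2") {
    printf "chop: '%s'  chomp: '%s'\n",
        strip_with_chop($line), strip_with_chomp($line);
}

# Checked open: report the reason when the file cannot be opened.
my $file = 'no_such_file.txt';    # assumed not to exist
if (open my $fh, '<', $file) {
    close $fh;
}
else {
    warn "open $file failed due to $!\n";
}
```

On the second sample line, chop removes the `2` because there is no newline to remove, while chomp leaves the data intact.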
Re: Compare 2 files and get data by TedPride (Priest) on Sep 05, 2005 at 09:19 UTC
You should also be using a hash, since this drastically decreases the number of comparisons required. The following works (tested it using sample data):

```perl
use strict;
use warnings;

my (%h, $handle);

open($handle, 'file1.txt');
while (<$handle>) {
    # Creates hash using first field of each line as keys
    $h{(split /,/)[0]} = ();
}
close($handle);

open($handle, 'file2.txt');
while (<$handle>) {
    # Prints second field of each line if it exists in the hash
    chomp;
    print "$_\n" if exists $h{$_ = (split /,/)[1]};
}
close($handle);
```
Re: Compare 2 files and get data by reneeb (Chaplain) on Sep 05, 2005 at 08:41 UTC
One way to do it is using Tie::File:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Tie::File;

my $file1 = '/path/to/file.txt';
my $file2 = '/path/to/file2.txt';

tie my @lines1, 'Tie::File', $file1 or die $!;
tie my @lines2, 'Tie::File', $file2 or die $!;

my @both = grep { my $i = $_; grep { $_ =~ $i } map { (split(/,/, $_))[1] } @lines2 }
           map { (split(/,/, $_))[0] } @lines1;

untie @lines2;
untie @lines1;

print $_, "\n" for (@both);
```

It's untested...
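One thing to watch in a grep like the one above: `$_ =~ $i` is a pattern match, not a string comparison, so a key such as `10` also matches `100`. A small sketch of the difference, using made-up sample values and hypothetical helper names (`match_regex`, `match_exact`):

```perl
use strict;
use warnings;

# Hypothetical helpers, named here for illustration only.
sub match_regex { my ($key, @values) = @_; return grep { $_ =~ $key } @values }
sub match_exact { my ($key, @values) = @_; return grep { $_ eq $key } @values }

my @values = qw(100 10 210);    # assumed sample data

print "regex: @{[ match_regex('10', @values) ]}\n";    # regex: 100 10 210
print "exact: @{[ match_exact('10', @values) ]}\n";    # exact: 10
```

If the fields are meant to be compared as whole strings, `eq` (or a hash lookup) is probably the safer choice.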
Re: Compare 2 files and get data by blazar (Canon) on Sep 05, 2005 at 10:08 UTC
> `open(file1,"file1.txt");`

```perl
open my $file1, '<', "file1.txt" or
  die "Can't open `file1.txt': $!\n";
```

> `while (<file1>) { chop();`

chomp.

> `$REC = $_; @LINEREC = split(/\,/,$REC);`

(Do a big favour to yourself and) `use strict; use warnings;` at the top of your program. Then the above would have to become `my $REC = $_; my @LINEREC = split(/\,/,$REC);`. It's not that much more typing and it won't hurt your fingers. OTOH it will help you immensely to avoid common mistakes...

> `open(file2,"file2.txt"); while (<file1>)`

Huh?!? Did you paste your script or did you retype it? Hint: the former is a much more reliable option...

> `if ( $LINEREC[0] eq $LINEREC[1]) { print $LINEREC[0]; {`

Ditto as above wrt retyping. It is common sense to post code that at least compiles.

> i have tried the following but doesn't work:

Indeed. How 'bout (something along the lines of:)

```perl
#!/usr/bin/perl

use strict;
use warnings;

die "Usage: $0 <file1> <file2>\n" unless @ARGV == 2;

my ($file2, %have) = pop;
while (<>) {
    chomp;
    $have{ (split /,/)[0] } = 1;
}

@ARGV = $file2;
my %saw;
$\ = "\n";
while (<>) {
    chomp;
    local $_ = (split /,/)[1];
    next if $saw{$_}++;
    print if $have{$_};
}

__END__
```

Note: this is meant as a minimal example. If your actual code is more complex, of course you'd better open your filehandles explicitly. I also made some assumptions: for example, that you don't want to output duplicate entries, that the format of 'file1' is different from that of 'file2', and that both formats are fixed and correspond to what one could infer from your examples.
Re: Compare 2 files and get data by blazar (Canon) on Sep 05, 2005 at 14:04 UTC
Also, a slightly more general example solution taking n>=2 files on the cmd line and listing, for each, those entries which are in the other ones too, along with the actual files these entries are in. Mostly self explanatory, I suppose:

```perl
#!/usr/bin/perl

use strict;
use warnings;

die "Usage: $0 <file1> <file2> [<files>]\n" unless
  +(my @files=@ARGV) >= 2;

my %have;
chomp, $have{$ARGV}{ (split /,/)[0] }=1 while <>;

$\="\n";
for (@files) {
    {
        my $sep=':' x length;
        print "$sep\n$_\n$sep\n";
    }
    my @rest=do {
        my $ex=$_;
        grep $_ ne $ex, @files;
    };
    for my $k (sort keys %{ $have{$_} }) {
        my @dups=grep $have{$_}{$k}, @rest;
        print "$k => @dups" if @dups;
    }
    print '';
}

__END__
```

(This assumes all of the files it is given are in the same format as your "file1". Modify at will!)
Re: Compare 2 files and get data by josera (Beadle) on Sep 05, 2005 at 13:02 UTC
Hi: Perhaps you would have to use two nested while loops. In pseudocode, what I think you could use is:

```
while (<file1>) {
    $line1 = $_
    # file2 has to be reopened (or rewound) for each line of file1
    while (<file2>) {
        $line2 = $_
        foreach $field1 in $line1 {
            foreach $field2 in $line2 {
                if $field1 eq $field2 {
                    print $field1
                }
            }
        }
    }
}
```

It compares each field in file1 with each field in file2. If what you want is for the two fields to be on the same line, then you could use a single while:

```
while (<file1> or <file2>) {    # the longest of the two files
    $line1 = read($file1);
    $line2 = read($file2);
    foreach $field1 in $line1 {
        foreach $field2 in $line2 {
            if $field1 eq $field2 {
                print $field1
            }
        }
    }
}
```

I hope that this pseudocode will help you.

Yours sincerely, José Ramón Martínez
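The same-line variant of the pseudocode above could be sketched in runnable Perl roughly as follows. This is only an illustration: `common_fields` is a hypothetical helper, the two "files" are in-memory strings with made-up data, and the loop stops at the end of the shorter file, since there is nothing left to compare beyond that point.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fields of $line1 that also occur in $line2.
sub common_fields {
    my ($line1, $line2) = @_;
    my %in_line2 = map { $_ => 1 } split /,/, $line2;
    return grep { $in_line2{$_} } split /,/, $line1;
}

# In-memory stand-ins for the two files (assumed sample data).
my $data1 = "a,b,c\nd,e,f\n";
my $data2 = "x,b,y\nf,d,z\nq,r,s\n";
open my $fh1, '<', \$data1 or die "Can't open in-memory file 1: $!\n";
open my $fh2, '<', \$data2 or die "Can't open in-memory file 2: $!\n";

while (1) {
    my $line1 = <$fh1>;
    my $line2 = <$fh2>;
    last unless defined $line1 and defined $line2;    # shorter file ended
    chomp($line1, $line2);
    print "$_\n" for common_fields($line1, $line2);
}

close $fh1;
close $fh2;
```

Using a hash for the fields of the second line avoids the inner foreach of the pseudocode while giving the same matches.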
Re: Compare 2 files and get data by darrengan (Sexton) on Sep 07, 2005 at 01:35 UTC
Hi everyone,

Thanks for all the tips and guide. I finally manage to work around the code and solved my problem. I am using a "IF" and "While" within a "While". Below are my code and it works well. I am not too sure if by running in this flow will it take up resources or not as it will compare 60,000 of records in the file. Hope that this discussion and codes helps other who seek wisdom.

```perl
open(file1,"file1.txt") || die ("cannot open file");
while (<file1>) {
    chop();
    $REC = $_;
    @LINEREC = split(/\,/,$REC);
    $data1 = @LINEREC[0];

    open(file2,"file2.txt") || die ("cannot open file");
    while (<file2>) {
        chop();
        $REC = $_;
        @LINEREC = split(/\,/,$REC);
        $data2 = @LINEREC[1];

        if ($data1 eq $data2) {
            print "$data1\n";
        }
    }
}
close file1;
close file2;
```

Cheers, Darren

Florist In Malaysia
Re^2: Compare 2 files and get data by blazar (Canon) on Sep 07, 2005 at 13:02 UTC
> Thanks for all the tips and guide. I finally manage to work around the code and solved my problem. I am using a "IF" and "While" within a "While".

I'm glad you solved your problem. Incidentally, however, I'd like to point out that there's no such thing as "IF" in Perl, nor "While". If you want to visually mark such keywords in a distinctive manner, you may put them between `<c>` or `<code>` tags. For example, this: `<c>split</c>` is rendered like this: `split`. But for functions, you can also use `[doc://split]`, which is rendered as a hyperlink like this: split

> Below are my code and it works well. I am not too sure if by running in this flow will it take up resources or not as it will compare 60,000 of records in the file.

Well, let's say that it doesn't seem a very smart way to do what you want. I think you should take another look at the other suggestions that were given to you, e.g. in terms of using a hash, which seems most reasonable for such a task. (If you didn't understand some of the replies you can ask for further clarification, of course.) Basically you're re-opening and re-reading your second file for every line of the first one, and this makes your program IO intensive. However, I will add a few further comments about your code as is.

> Hope that this discussion and codes helps other who seek wisdom.

Of course it will, just as much as just about every discussion here does...

First of all, and most importantly (although you may not see why it is, ATM -- but then please trust us!), more than one monk already recommended putting the following two lines at the top of your script: `use strict; use warnings;`. You'll notice that, with one or two exceptions, even those who didn't tell you to do so did include them in their own code examples.

> `open(file1,"file1.txt") || die ("cannot open file");`

There's nothing strictly wrong with this. But it's better to use "lexical filehandles" and the three-args form of open; also, as a general rule you should use the high precedence (short circuiting) logical operators to operate on values and the low precedence ones for flow control; last, it's advisable to include in your error message a clue about what went wrong, thus put `$!` there. Thus I would have written the above like this:

```perl
open my $file1, '<', "file1.txt" or
  die "Can't open `file1.txt': $!\n";
```

Notice that I also put a `\n` at the end of the die error message, because I prefer it like that for this kind of error (I don't think the final user is interested in the additional details that get printed if you omit it), though YMMV.

> `while (<file1>) { chop();`

Nowadays no one ever uses chop to do this. They use chomp instead. Please check the documentation for both.

> `$REC = $_; @LINEREC = split(/\,/,$REC); $data1 = @LINEREC[0];`

No need to copy `$_` to `$REC` just to pass it to split. The former is even the implicit second arg to it, if none is given!! No need to use a temporary array (why all those uppercase letters, BTW?) just to slice it, either. You can slice a list as well. Thus the above may have been simply

```perl
while (<$file1>) {
    chomp;
    my $data1 = (split /,/)[0];
    # ...
```

Incidentally, also note that it's not necessary to escape the comma in the regex, as it has no special meaning.

> `open(file2,"file2.txt") || die ("cannot open file");`

Hmmm, here you're opening the same file over and over again inside the outer cycle. But you explicitly close it only outside the outer cycle, at the end of your script along with `file1` (which is not strictly necessary after all, since open filehandles get closed on program exit anyway). Here you could either use a lexical handle as recommended above, which gets automatically closed on exiting the lexical scope it's defined in, or else you may just open it once at the top (but also then, use a lexical in any case!), at the same time as `file1`, and use seek to "roll it back".

Update: a possible rewrite of your code (same logic!) along the lines of the hints given above:

```perl
#!/usr/bin/perl -l

use strict;
use warnings;

my ($fh1, $fh2) = map {
    open my $fh, '<', $_ or
      die "Can't open `$_': $!\n";
    $fh } qw/file1.txt file2.txt/;

while (<$fh1>) {
    chomp;
    my $data1 = (split /,/)[0];
    seek $fh2, 0, 0;
    while (<$fh2>) {
        chomp;
        print $data1 if $data1 eq (split /,/)[1];
    }
}

__END__
```

HTH
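For comparison, here is a hedged sketch of the hash-based approach recommended earlier in the thread, written so that it prints what the nested loops print for well-formed lines (including repeats) while reading each file only once. The sub name `matches` and the in-memory sample data are assumptions made for illustration; real code would open `file1.txt` and `file2.txt` instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given handles for file1 and file2, return the values
# the nested-loop version would print, in the same order.
sub matches {
    my ($fh1, $fh2) = @_;

    # Count how often each value occurs in the second field of file2.
    my %count;
    while (my $line = <$fh2>) {
        chomp $line;
        my $key = (split /,/, $line)[1];
        $count{$key}++ if defined $key;
    }

    # One pass over file1: emit its first field once per match in file2.
    my @out;
    while (my $line = <$fh1>) {
        chomp $line;
        my $data1 = (split /,/, $line)[0];
        next unless defined $data1;
        push @out, ($data1) x ($count{$data1} || 0);
    }
    return @out;
}

# In-memory stand-ins for file1.txt and file2.txt (assumed sample data).
my $file1 = "a,1\nb,2\nc,3\n";
my $file2 = "x,b\ny,a\nz,b\n";
open my $fh1, '<', \$file1 or die "Can't open in-memory file1: $!\n";
open my $fh2, '<', \$file2 or die "Can't open in-memory file2: $!\n";

print "$_\n" for matches($fh1, $fh2);    # prints a, b, b (one per line)
```

With the nested loops, the second file is re-read once per line of the first, so the work grows with the product of the two file sizes; with the hash it grows with their sum.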