Looping two files differently

noob_mas has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have created a script, part of which searches through a file line by line and compares it against an array. The goal of this part is to flag an "error message" if an element from the array does not appear in the file I am searching.

I tried two different ways of looping through the files, and only the first method gives the result I want.

My question: is there another, preferred way to solve this?

I have pasted a snippet of my code and the generated output below for reference.

The first method

use strict;
use warnings;
use Getopt::Long;

my ($cfg, $files);
my $output = 'Output.summary';
GetOptions('cfg=s' => \$cfg, 'rtl=s' => \$files);
open my $fh1, '<', $cfg    or die "Can't open $cfg: $!";
open my $fh2, '<', $files  or die "Can't open $files: $!";
open my $fh3, '>', $output or die "Can't open $output: $!";

my @array_1 = <$fh2>;    # every line of the RTL file, held in memory
my ($element, $data);
my %hash;

# Outer loop: config entries. Inner loop: RTL lines, stopping at the first match.
while (<$fh1>) {
    $data = $_;
    chomp $data;
    next unless /\S/;                    # skip blank config lines
    foreach $element (@array_1) {
        next unless $element =~ /\S/;    # skip blank RTL lines
        chomp $element;
        if ($element =~ quotemeta($data)) {
            $hash{$data} = "$data exist_1";
            last;
        }
        else {
            $hash{$data} = "$data doesn't exist";
        }
    }
}

while (my ($key, $value) = each(%hash)) {
    print $fh3 "$key => $value\n";
}
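For comparison, the first method's logic can be sketched more compactly with `first` from the core List::Util module. This is an illustrative rewrite, not the poster's code: it uses `index` for the plain-substring test in place of the `quotemeta` regex, and the helper name `check_entries` and the sample lines are hypothetical. What it shows is that each config entry's verdict is decided once, after its search over the RTL lines, so nothing later can overwrite it.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: for each config entry, record whether it occurs as a
# plain substring in any of the RTL lines (same idea as the first method).
sub check_entries {
    my ($entries, $lines) = @_;
    my %result;
    for my $entry (@$entries) {
        # first() stops at the first matching line, like "last" in the loop above
        my $hit = first { index($_, $entry) >= 0 } @$lines;
        $result{$entry} = defined $hit ? "$entry exist_1" : "$entry doesn't exist";
    }
    return %result;
}

# Hypothetical sample data
my @rtl_lines = ('.hlkout(hi_lkout[15:0]),', '.x106_in(a),');
my @cfg_lines = ('hlkout', 'x106_in', 'xdrvo[95]');

my %result = check_entries(\@cfg_lines, \@rtl_lines);
print "$_ => $result{$_}\n" for sort keys %result;
```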

Second method (replaces the corresponding lines of the first method)

 
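my @array_1 = <$fh1>;    # the config entries are now held in memory

# Outer loop: RTL lines. Inner loop: config entries.
while (<$fh2>) {
    $data = $_;                       # unchanged from the first method
    chomp $data;                      # unchanged from the first method
    next unless /\S/;                 # unchanged from the first method
    foreach $element (@array_1) {     # unchanged from the first method
        chomp $element;
        if ($data =~ quotemeta($element)) {
            $hash{$element} = "$element exist_1";
            last;
        }
        else {
            $hash{$element} = "$element doesn't exist";
        }
    }
}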
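A likely reason the second method reports every entry as missing, assuming its elided lines mirror the first method, is loop order. With the RTL file as the outer loop, `$hash{$element}` is reassigned for every RTL line, so a match found on an early line is overwritten with "doesn't exist" by any later line that lacks the entry; effectively only the last non-blank RTL line decides the result. The sketch below reproduces this with hypothetical data and shows one possible fix: start every entry as missing and only ever upgrade it to found.

```perl
use strict;
use warnings;

# Hypothetical data: two config entries and three RTL lines
my @config = ('hlkout', 'x106_in');
my @file   = ('.hlkout(hi_lkout[15:0])', '.x106_in(a)', 'endmodule');

# Second-method loop order: outer loop over RTL lines, inner over entries
my %buggy;
for my $data (@file) {
    for my $element (@config) {
        if (index($data, $element) >= 0) {
            $buggy{$element} = "$element exist_1";
            last;
        }
        else {
            # This overwrites a match recorded for an earlier line
            $buggy{$element} = "$element doesn't exist";
        }
    }
}

# Possible fix: mark everything missing first, then only upgrade to "exists"
my %fixed = map { ($_ => "$_ doesn't exist") } @config;
for my $data (@file) {
    for my $element (@config) {
        $fixed{$element} = "$element exist_1" if index($data, $element) >= 0;
    }
}

print "buggy: $buggy{$_}\n" for @config;
print "fixed: $fixed{$_}\n" for @config;
```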

Output of the first method:

 
xdrvo[93] => xdrvo[93] exist_1
x106_in => x106_in exist_1
hlkout => hlkout exist_1
xdrvo[63] => xdrvo[63] exist_1
v7drvo0 => v7drvo0 exist_1
xdrvo[1002] => xdrvo[1002] doesn't exist
xdrvo[95] => xdrvo[95] doesn't exist
x95_in => x95_in exist_1

Output of the second method:

xdrvo[93] => xdrvo[93] doesn't exist
x106_in => x106_in doesn't exist
hlkout => hlkout doesn't exist
xdrvo[63] => xdrvo[63] doesn't exist
v7drvo0 => v7drvo0 doesn't exist
xdrvo[1002] => xdrvo[1002] doesn't exist
xdrvo[95] => xdrvo[95] doesn't exist
x95_in => x95_in doesn't exist

Thank you in advance to everyone who replies and helps explain this. In my opinion, one key point to consider is that the config file and the main file (the one searched line by line) have different numbers of lines.
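On the question of a preferred approach, one common pattern (a sketch, not taken from the thread) is to keep the short config list in memory, stream the RTL file once, and mark each entry as found in a hash; entries never marked are the ones to flag. Because matching is by content, the different line counts of the two files don't matter. The sub name `missing_entries` and the sample data are hypothetical, and blank config lines are assumed to have been filtered out already, since `index` treats an empty string as present in every line.

```perl
use strict;
use warnings;

# Hypothetical sub: return the config entries that never appear (as a plain
# substring) in any line read from $fh. Only the config list is kept in
# memory; the larger file is streamed once.
sub missing_entries {
    my ($entries, $fh) = @_;
    my %found;
    while (my $line = <$fh>) {
        for my $entry (@$entries) {
            next if $found{$entry};    # already seen, no need to test again
            $found{$entry} = 1 if index($line, $entry) >= 0;
        }
    }
    return grep { !$found{$_} } @$entries;
}

# Stand-in for the RTL file, using an in-memory filehandle
my $rtl = ".hlkout(hi_lkout[15:0]),\n.x106_in(a),\n";
open my $fh, '<', \$rtl or die "Can't open in-memory file: $!";

my @config  = ('hlkout', 'x106_in', 'xdrvo[1002]');
my @missing = missing_entries(\@config, $fh);
print "ERROR: $_ not found\n" for @missing;
```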

Re: Looping two files differently by VincentK (Beadle) on Jan 02, 2014 at 17:03 UTC
Hi noob_mas,

It looks like you are comparing each line in your first file to every line in your second file. In this case, a match on, say, line two of the first file could match line 50 of the second file (or multiple lines, if the files contain duplicates). Is this what you want?

If you want a line-by-line comparison, I would compare the lines of the two files in order and note the differences. Here is a sub that, given the filehandles for the two input files and the output file, writes the results of the comparison to the output file. I am sure there is a more efficient way to do this, perhaps using Array::Compare (http://search.cpan.org/~davecross/Array-Compare-2.02/lib/Array/Compare.pm), but this does seem to work.

The sub uses the larger file for the comparison. It stops processing if it reaches the end of the smaller file before the end of the larger one.

sub compare_files($$$) {
    my $FH1       = shift;
    my $FH2       = shift;
    my $OUTPUT_FH = shift;

    my @FILE1_ARRY = <$FH1>;
    my @FILE2_ARRY = <$FH2>;

    # Compare against the larger of the two files
    my $CMPR_ARRY_REF    = \@FILE1_ARRY;
    my @CMPR_ARRY        = @{$CMPR_ARRY_REF};
    my $CMPR_ARRY_NUMBER = 1;

    if ($#FILE1_ARRY > $#FILE2_ARRY) {
        print "FILE 1 is larger than FILE 2\n";
        print $OUTPUT_FH "FILE 1 is larger than FILE 2\n";
    }
    elsif ($#FILE1_ARRY < $#FILE2_ARRY) {
        print "FILE 1 is smaller than FILE 2\n";
        print $OUTPUT_FH "FILE 1 is smaller than FILE 2\n";
        $CMPR_ARRY_REF    = \@FILE2_ARRY;
        @CMPR_ARRY        = @{$CMPR_ARRY_REF};
        $CMPR_ARRY_NUMBER = 2;
    }
    else {
        print "FILE 1 is the same size as FILE 2\n";
        print $OUTPUT_FH "FILE 1 is the same size as FILE 2\n";
    }
    print "-" x 70, "\n";
    print $OUTPUT_FH "-" x 70, "\n";

    for (my $iter = 0; $iter <= $#CMPR_ARRY; $iter++) {
        if ($CMPR_ARRY_NUMBER == 1) {
            # Stop once the smaller file (file 2) runs out of lines
            if ($iter > $#FILE2_ARRY) {
                my $msg = "END OF FILE 2, BUT NOT FILE 1. STOPPING COMPARE AT LINE "
                        . ($iter + 1) . " OF FILE 1\n";
                print $msg;
                print $OUTPUT_FH $msg;
                last;
            }
            if ($CMPR_ARRY[$iter] eq $FILE2_ARRY[$iter]) {
                print $OUTPUT_FH "Line " . ($iter + 1) . " matches between both files\n";
            }
            else {
                print $OUTPUT_FH "Line " . ($iter + 1) . " does NOT match between both files\n";
            }
        }
        else {
            # Stop once the smaller file (file 1) runs out of lines
            if ($iter > $#FILE1_ARRY) {
                my $msg = "END OF FILE 1, BUT NOT FILE 2. STOPPING COMPARE AT LINE "
                        . ($iter + 1) . " OF FILE 2\n";
                print $msg;
                print $OUTPUT_FH $msg;
                last;
            }
            if ($CMPR_ARRY[$iter] eq $FILE1_ARRY[$iter]) {
                print $OUTPUT_FH "Line " . ($iter + 1) . " matches between both files\n";
            }
            else {
                print $OUTPUT_FH "Line " . ($iter + 1) . " does NOT match between both files\n";
            }
        }
    }
    print "-" x 70, "\nCompare Complete.";
    print $OUTPUT_FH "-" x 70, "\nCompare Complete.";
}

Once it is in place, you can call the sub with:

`compare_files($FH1,$FH2,$OUTPUT_FH);`

I hope this helps. If not, maybe you can modify this sub to suit your needs.
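If a strict position-by-position comparison is what's needed, the same idea can be expressed more compactly. This is a hypothetical sketch (`differing_lines` is not from the reply): it compares up to the length of the shorter list using `min` from the core List::Util module and returns the 1-based numbers of the lines that differ, leaving file reading and reporting to the caller.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: compare two lists of lines position by position, up to
# the length of the shorter list; return 1-based numbers of the differing lines.
sub differing_lines {
    my ($left, $right) = @_;
    my $n = min(scalar @$left, scalar @$right);
    return grep { $left->[$_ - 1] ne $right->[$_ - 1] } 1 .. $n;
}

# Hypothetical sample data (in the reply these would come from <$FH1> and <$FH2>)
my @file1 = ("a\n", "b\n", "c\n");
my @file2 = ("a\n", "x\n");

my @diff = differing_lines(\@file1, \@file2);
print "Line $_ does NOT match between both files\n" for @diff;
print "Files differ in length: ", scalar @file1, " vs ", scalar @file2, " lines\n"
    if @file1 != @file2;
```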
Re^2: Looping two files differently by noob_mas (Novice) on Jan 03, 2014 at 02:11 UTC
Hi VincentK,

"you are comparing each line in your first file to every line in your second file"

Yes. I have a "config" file whose content I can set up with element names, etc. Each element from this config file is then searched for, line by line, in another file (the second file) to check whether it exists. If it does, the line is processed further; if not, an error message is flagged.

"In this case, a match in the first file say on line two could match line 50 in file two ( or multiple lines if files contain duplicates ). Is this what you want?"

Yes, exactly. Duplicates don't matter, because they are simply overwritten when I assign them to a hash.

Thank you for your help and for the sub as well. I can use it with a bit of tweaking. Thank you
Re: Looping two files differently by Random_Walk (Prior) on Jan 02, 2014 at 15:38 UTC
Do you have some example data from the files you are working with? Are you looking for exact matches, or will you want to be able to specify regex-type patterns to match? Must the entire line match, or do you just need to find the target anywhere in the line?

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!
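The distinctions asked about here can be sketched as four tests on one line. This is illustrative only; `match_modes` and the sample strings are hypothetical. The last part shows why escaping matters for names like `xdrvo[95]`: unescaped, `[95]` is a regex character class that matches a single `9` or `5`.

```perl
use strict;
use warnings;

# Hypothetical helper: four different meanings of "the line matches the target"
sub match_modes {
    my ($line, $target) = @_;
    return (
        whole    => ($line eq $target) ? 1 : 0,                     # entire line equals target
        anywhere => (index($line, $target) >= 0) ? 1 : 0,           # plain substring, no regex
        literal  => ($line =~ /\Q$target\E/) ? 1 : 0,               # regex, metacharacters escaped
        word     => ($line =~ /(?<!\w)\Q$target\E(?!\w)/) ? 1 : 0,  # not part of a longer identifier
    );
}

my $line = '.hlkout(hi_lkout[15:0])';
for my $target ('hlkout', 'lkout') {
    my %m = match_modes($line, $target);
    print "$target: ", join(', ', map { $_ . '=' . $m{$_} } qw(whole anywhere literal word)), "\n";
}

# Without \Q...\E, "[95]" is a character class, so this unrelated name matches
my $name = 'xdrvo[95]';
print "unescaped pattern matches xdrvo9\n" if 'xdrvo9' =~ /$name/;
print "escaped pattern does not\n"     unless 'xdrvo9' =~ /\Q$name\E/;
```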
Re^2: Looping two files differently by noob_mas (Novice) on Jan 03, 2014 at 00:51 UTC
Hi Random_Walk, yes, you are right: I want exact matches. I am just looking for a word that matches within a line of the file.

Example of an element I am looking for in a file: `hlkout`

Example of a line that matches that element: `.hlkout(hi_lkout[15:0])`

From the line that matches my regex, I do some further processing to get something else; in this example, I would use hi_lkout for another purpose, which I don't describe here...
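For the follow-up step of pulling `hi_lkout` out of `.hlkout(hi_lkout[15:0])`, a capture group after the escaped port name is one option. This sketch assumes the lines follow the Verilog-style named-connection form `.port(signal[range])` seen in the example; `connected_signal` is a hypothetical name, and connections split across several lines would need more work.

```perl
use strict;
use warnings;

# Hypothetical helper: given a port name and a Verilog-style connection line
# such as ".hlkout(hi_lkout[15:0])", return the connected signal name without
# its bit range, or undef if this line doesn't connect that port.
sub connected_signal {
    my ($port, $line) = @_;
    # \. is a literal dot, \Q..\E escapes the port name, (\w+) stops at "[" or ")"
    return $line =~ /\.\Q$port\E\s*\(\s*(\w+)/ ? $1 : undef;
}

for my $line ('.hlkout(hi_lkout[15:0])', '.x106_in(a)') {
    my $signal = connected_signal('hlkout', $line);
    if (defined $signal) {
        print "hlkout -> $signal\n";
    }
    else {
        print "hlkout not connected on: $line\n";
    }
}
```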
Re: Looping two files differently by Random_Walk (Prior) on Jan 03, 2014 at 10:09 UTC
Does this (untested) code help?

use strict;
use warnings;
use Getopt::Long;

my ($cfg, $files);
my $output = 'Output.summary';
GetOptions('cfg=s' => \$cfg, 'rtl=s' => \$files);
open my $fh1, '<', $cfg    or die "Can't open $cfg: $!";
open my $fh2, '<', $files  or die "Can't open $files: $!";
open my $fh3, '>', $output or die "Can't open $output: $!";

# Read the config file first
# (less memory required, assuming it is the smaller file)
my @patterns;
while (<$fh1>) {
    next unless /\S/;
    chomp;
    push @patterns, $_;
}

my %results;
# Now look through the other file
while (my $data = <$fh2>) {
    next unless $data =~ /\S/;
    chomp $data;
    for my $pattern (@patterns) {
        # Don't need a regex for an exact match; index() returns -1 when
        # the pattern is absent and 0 when it is found at the start
        if (index($data, $pattern) >= 0) {
            $results{$data} = "matches $pattern";
            last;
        }
        else {
            $results{$data} = "doesn't match";
        }
    }
}

while (my ($key, $value) = each(%results)) {
    print $fh3 "$key => $value\n";
}

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!
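As noted in this reply, no regex is needed for an exact match, and `index` is the simple choice. When the config list is long, another option is to join all entries into one escaped alternation and scan each RTL line once, reporting per config entry as the original question needs. This is a hypothetical, unbenchmarked sketch (`found_entries` is not from the thread). One caveat: the scan resumes after each match, so an entry that only ever overlaps another entry's match (e.g. `lkout` inside `hlkout`) is not reported as found.

```perl
use strict;
use warnings;

# Hypothetical sketch: mark which config entries occur in any RTL line by
# scanning each line once with a single alternation regex.
# Caveat: the scan resumes after each match, so an entry that only ever
# overlaps another entry's match (e.g. "lkout" inside "hlkout") is not marked.
sub found_entries {
    my ($entries, $lines) = @_;
    my %found;
    return \%found unless @$entries;
    # Longest entries first, each escaped so "[" and "]" are literal
    my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } @$entries;
    my $re  = qr/($alt)/;
    for my $line (@$lines) {
        $found{$1} = 1 while $line =~ /$re/g;
    }
    return \%found;
}

# Hypothetical sample data
my @cfg = ('hlkout', 'x106_in', 'xdrvo[1002]');
my @rtl = ('.hlkout(hi_lkout[15:0]),', '.x106_in(a),');

my $found = found_entries(\@cfg, \@rtl);
print "ERROR: $_ not found\n" for grep { !$found->{$_} } @cfg;
```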