Re: File handles in regular expressions
by aitap (Curate) on Oct 18, 2012 at 18:08 UTC
Re: File handles in regular expressions
by 2teez (Vicar) on Oct 18, 2012 at 18:53 UTC
Hi Vikasdawar,
For your open calls, check whether they succeed and print an error message on failure, or use autodie qw(open close).
Please try the following:
#!/usr/bin/perl
use warnings;
use strict;
use Tie::File;

tie my @array_file, 'Tie::File', "file1.txt" or die "can't tie file: $!";

my $matched_lines = '';
open my $fh,  '>', "file3.txt" or die "can't open file: $!";
open my $fh2, '<', "file2.txt" or die "can't open file: $!";

while ( defined( my $line = <$fh2> ) ) {
    chomp $line;
    foreach my $match (@array_file) {
        if ( $match eq $line and $match ne "" ) {
            $matched_lines .= $match . $/;
            last;    # stop after the first match to avoid duplicate output
        }
    }
}

print {$fh} $matched_lines;
close $fh2 or die "can't close file: $!";
close $fh  or die "can't close file: $!";
untie @array_file;
NOTE: The code above does work (or should work), but it may not be very efficient when comparing VERY VERY LARGE files.
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
Hi Lotus1, if so, there is a problem with concatenating all the output into a scalar: $matched_lines could end up holding the whole huge file. One possible solution is to replace $matched_lines .= $match.$/; with print $fh $match.$/; and simply print incrementally.
Not so, I'm afraid: your suggestion would further hurt the performance of the script, because print would be called once for every matching string, whereas with the scalar only a single call is made. Using a profiler (NYTProf) made that very clear. Try it.
Re: File handles in regular expressions
by tobyink (Canon) on Oct 18, 2012 at 19:06 UTC
What do you mean by "if they match"? Do you mean, "if they are identical strings"? If so, string comparison (using the eq operator) is almost certainly a better idea than using regexes.
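A small sketch of why this matters (the strings are my own example): regex metacharacters such as parentheses can make an otherwise identical pair of strings fail to match, while eq compares them literally:

```perl
use strict;
use warnings;

my $line   = 'price (USD)';
my $wanted = 'price (USD)';

# eq compares the two strings literally:
print "eq matches\n" if $line eq $wanted;

# A raw regex treats ( and ) as metacharacters (a capture group here),
# so this silently fails to match even though the strings are identical:
print "raw regex matches\n" if $line =~ /$wanted/;

# If a regex is really needed, quote the metacharacters with \Q...\E:
print "quoted regex matches\n" if $line =~ /\A\Q$wanted\E\z/;
```
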
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: File handles in regular expressions
by Kenosis (Priest) on Oct 19, 2012 at 07:04 UTC
Hi, Vikas, and welcome to PerlMonks!
You've enclosed one loop within another, so you're attempting to compare the first $comp1 value against all elements of @var2, and so on. And tobyink's point about "if they match" is well made, so I suspect you want $comp1 eq $comp2.
Given this, consider the following:
use strict;
use warnings;

my %matchingLines;

open my $fh1, '<', 'File1.txt' or die $!;
chomp( my @file1Lines = <$fh1> );
close $fh1;

open my $fh2, '<', 'File2.txt' or die $!;
chomp( my @file2Lines = <$fh2> );
close $fh2;

for my $file1Line (@file1Lines) {
    $matchingLines{"$file1Line\n"}++
      if $file1Line ~~ @file2Lines;
}

open my $fh3, '>', 'FileA.txt' or die $!;
print $fh3 $_ for keys %matchingLines;
close $fh3;
If a line in @file1Lines is found in @file2Lines via the smart match operator (works as equality), it's added to the hash %matchingLines for later printing to a file (the hash is used to avoid the possibility of writing multiple instances of the same line to the file).
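(A caveat from later Perls, not in the original post: smart match was marked experimental in 5.18 and later removed, so on a modern Perl a plain hash lookup is the safer way to express the same membership test. A sketch with sample data standing in for the files:)

```perl
use strict;
use warnings;

# Same matching logic without ~~: build a lookup hash from file2's
# lines, then test membership directly.
my @file1Lines = qw(item1 item2 item3 item4);
my @file2Lines = qw(item1 abc item2 item3 item4);

my %inFile2 = map { $_ => 1 } @file2Lines;

my %matchingLines;
for my $file1Line (@file1Lines) {
    $matchingLines{"$file1Line\n"}++ if $inFile2{$file1Line};
}

print for sort keys %matchingLines;    # item1 .. item4, one per line
```
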
Hope this helps!
Update: Lotus1 correctly brought to my attention that I misunderstood the OP. Have revised the script.
File1.txt:
item1
item2
item3
item4
File2.txt:
item1
abc
item2
item3
item4
File3.txt:
item1
Re: File handles in regular expressions
by Laurent_R (Canon) on Oct 21, 2012 at 13:02 UTC
First, three comments:
1. check the return status of your open calls;
2. chomp the lines you are reading to remove the newline characters;
3. use the eq operator instead of a regex, unless you have a good reason to use regexes.
If the files are not too large (or, rather, if at least one of them is not too large), read one of the files and store its lines in a hash (using the full chomped line as the key). Once this is done, go through the other file and check whether each line exists in the hash; if it does, just print it to your output file. This will be much faster than your nested foreach loops.
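The approach described above can be sketched like this (common_lines is a hypothetical helper name of my own, and the file names in the comments are placeholders):

```perl
use strict;
use warnings;

# Store the chomped lines of one file as hash keys, then stream the
# other file's lines through the hash -- one O(1) lookup per line
# instead of a nested loop over both files.
sub common_lines {
    my ( $ref_lines, $other_lines ) = @_;    # array refs of chomped lines
    my %seen = map { $_ => 1 } @$ref_lines;
    return grep { exists $seen{$_} } @$other_lines;
}

# Typical use with files (names are placeholders):
# open my $fh1, '<', 'file1.txt' or die "open file1.txt: $!";
# chomp( my @lines1 = <$fh1> );
# close $fh1;
# ... read file2.txt into @lines2 the same way, then:
# print "$_\n" for common_lines( \@lines1, \@lines2 );
```
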
Re: File handles in regular expressions
by locked_user sundialsvc4 (Abbot) on Oct 22, 2012 at 13:36 UTC
Of course, on a Unix/Linux system you can do this with the diff command, with appropriate options (that might be system specific).
I say this because this is an extremely common requirement and yet it is also very common to build one-off custom programs to satisfy such requirements. I say that without specific reference to this particular case or person. “I need to write a program to do this” is a conclusion that is quickly and easily jumped-to, especially when the prospect of doing so seems daunting. TMTOWTDI™, and sometimes TOWTDI isn’t Perl or a custom program at all.
Yes, diff can be useful, and on Windows you can use WinMerge, a free open-source utility, to compare files (there are most probably others). But these utilities require the files to be sorted in the same order, which might not be the case. And if you have to sort each file before comparing them, then a simple Perl one-liner might do the job faster.
At my work, we use all kinds of combinations of Unix "power tools" daily, including pipes and redirections connecting diff, sort, wc, cat, grep, find, cut, sed, awk, and other commands, but Perl very often offers a better, simpler and faster way to do things.
And when I have to work on VMS or on Windows, where you don't have sed, cut or awk, Perl shows its superiority even more blatantly.