Re: File handles in regular expressions
by aitap (Curate) on Oct 18, 2012 at 18:08 UTC
Re: File handles in regular expressions
by 2teez (Vicar) on Oct 18, 2012 at 18:53 UTC
Hi Vikasdawar,
For your open calls, check whether they succeed and print an error message on failure, or use autodie qw(open close).
Please try the following:
#!/usr/bin/perl
use warnings;
use strict;
use Tie::File;

tie my @array_file, 'Tie::File', "file1.txt" or die "can't tie file: $!";

my $matched_lines = '';
open my $fh,  '>', "file3.txt" or die "can't open file: $!";
open my $fh2, '<', "file2.txt" or die "can't open file: $!";

while ( defined( my $line = <$fh2> ) ) {
    chomp $line;
    foreach my $match (@array_file) {
        if ( $match eq $line and $match ne "" ) {
            $matched_lines .= $match . $/;
            last;    # stop after the first match to avoid duplicate output
        }
    }
}

print {$fh} $matched_lines;
close $fh2 or die "can't close file: $!";
close $fh  or die "can't close file: $!";
untie @array_file;
NOTE: The code above does work (or should work), but it may not be very efficient when comparing VERY VERY LARGE files.
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
Hi Lotus1, if so, there is a problem with concatenating all the output into a scalar: $matched_lines could end up holding the whole huge file. One possible solution is to replace $matched_lines .= $match.$/; with print $fh $match.$/; and simply print incrementally.
Not so, I'm afraid: your suggestion would further hurt the performance of the script, because print would be called once for every matching string, whereas with the scalar only a single call is made. Using a profiler (NYTProf) made that very clear. Try it.
Re: File handles in regular expressions
by tobyink (Canon) on Oct 18, 2012 at 19:06 UTC
What do you mean by "if they match"? Do you mean, "if they are identical strings"? If so, string comparison (using the eq operator) is almost certainly a better idea than using regexes.
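A small sketch of why this matters (the strings are my own example): regex metacharacters such as parentheses can make an otherwise identical pair of strings fail to match, while eq compares them literally:

```perl
use strict;
use warnings;

my $line   = 'price (USD)';
my $wanted = 'price (USD)';

# eq compares the two strings literally:
print "eq matches\n" if $line eq $wanted;

# A raw regex treats ( and ) as metacharacters (a capture group here),
# so this silently fails to match even though the strings are identical:
print "raw regex matches\n" if $line =~ /$wanted/;

# If a regex is really needed, quote the metacharacters with \Q...\E:
print "quoted regex matches\n" if $line =~ /\A\Q$wanted\E\z/;
```
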
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: File handles in regular expressions
by Kenosis (Priest) on Oct 19, 2012 at 07:04 UTC
Hi, Vikas, and welcome to PerlMonks!
You've enclosed one loop within another, so you're attempting to compare the first $comp1 value against all elements of @var2, and so on. And tobyink's point about "if they match" is well made, so I suspect you want $comp1 eq $comp2.
Given this, consider the following:
use strict;
use warnings;

my %matchingLines;

open my $fh1, '<', 'File1.txt' or die $!;
chomp( my @file1Lines = <$fh1> );
close $fh1;

open my $fh2, '<', 'File2.txt' or die $!;
chomp( my @file2Lines = <$fh2> );
close $fh2;

for my $file1Line (@file1Lines) {
    $matchingLines{"$file1Line\n"}++
      if $file1Line ~~ @file2Lines;
}

open my $fh3, '>', 'FileA.txt' or die $!;
print $fh3 $_ for keys %matchingLines;
close $fh3;
If a line in @file1Lines is found in @file2Lines via the smart match operator (works as equality), it's added to the hash %matchingLines for later printing to a file (the hash is used to avoid the possibility of writing multiple instances of the same line to the file).
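(A caveat from later Perls, not in the original post: smart match was marked experimental in 5.18 and later removed, so on a modern Perl a plain hash lookup is the safer way to express the same membership test. A sketch with sample data standing in for the files:)

```perl
use strict;
use warnings;

# Same matching logic without ~~: build a lookup hash from file2's
# lines, then test membership directly.
my @file1Lines = qw(item1 item2 item3 item4);
my @file2Lines = qw(item1 abc item2 item3 item4);

my %inFile2 = map { $_ => 1 } @file2Lines;

my %matchingLines;
for my $file1Line (@file1Lines) {
    $matchingLines{"$file1Line\n"}++ if $inFile2{$file1Line};
}

print for sort keys %matchingLines;    # item1 .. item4, one per line
```
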
Hope this helps!
Update: Lotus1 correctly brought to my attention that I misunderstood the OP. Have revised the script.
File1.txt:
item1
item2
item3
item4
File2.txt:
item1
abc
item2
item3
item4
File3.txt:
item1
Re: File handles in regular expressions
by Laurent_R (Canon) on Oct 21, 2012 at 13:02 UTC
First, three comments:
1. check the return status of your open calls;
2. chomp the lines you are reading to remove the newline characters;
3. use the eq operator instead of a regex, unless you have a good reason to use regexes.
If the files are not too large (or, rather, if at least one of them is not too large), read one of the files and store its lines in a hash (using the full chomped line as the key). Once this is done, go through the other file and check whether each line exists in the hash; if it does, just print it to your output file. This will be much faster than your nested foreach loops.
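The approach described above can be sketched like this (common_lines is a hypothetical helper name of my own, and the file names in the comments are placeholders):

```perl
use strict;
use warnings;

# Store the chomped lines of one file as hash keys, then stream the
# other file's lines through the hash -- one O(1) lookup per line
# instead of a nested loop over both files.
sub common_lines {
    my ( $ref_lines, $other_lines ) = @_;    # array refs of chomped lines
    my %seen = map { $_ => 1 } @$ref_lines;
    return grep { exists $seen{$_} } @$other_lines;
}

# Typical use with files (names are placeholders):
# open my $fh1, '<', 'file1.txt' or die "open file1.txt: $!";
# chomp( my @lines1 = <$fh1> );
# close $fh1;
# ... read file2.txt into @lines2 the same way, then:
# print "$_\n" for common_lines( \@lines1, \@lines2 );
```
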
Re: File handles in regular expressions
by locked_user sundialsvc4 (Abbot) on Oct 22, 2012 at 13:36 UTC
Of course, on a Unix/Linux system you can do this with the diff command, with appropriate options (that might be system specific).
I say this because this is an extremely common requirement and yet it is also very common to build one-off custom programs to satisfy such requirements. I say that without specific reference to this particular case or person. “I need to write a program to do this” is a conclusion that is quickly and easily jumped-to, especially when the prospect of doing so seems daunting. TMTOWTDI™, and sometimes TOWTDI isn’t Perl or a custom program at all.
Yes, diff can be useful, and on Windows you can use WinMerge, a free open-source utility, to compare files (there are most probably others). But these utilities require the files to be sorted in the same order, which might not be the case. And if you have to sort each file before comparing them, then a simple Perl one-liner might do the job faster.
At my work, we use all kinds of combinations of Unix "power tools" daily, including pipes and redirections connecting diff, sort, wc, cat, grep, find, cut, sed, awk, and other commands, but Perl very often offers a better, simpler and faster way to do things.
And when I have to work on VMS or on Windows, where you don't have sed, cut or awk, Perl shows its superiority even more blatantly.