Re: Compare 2 files and create a new one if it matches

Pure Perl solutions are no doubt best, but one could read the smaller file into an array, add appropriate markers (e.g. escaped pipe symbols at both ends of each element), eliminate duplicates, write the resulting array to a new file, and then run fgrep -f with that file against the larger one, capturing its output with backticks (`).
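A minimal sketch of that fgrep approach, under some loud assumptions: the records are pipe-delimited, the key sits in a middle field (so it has a pipe on both sides), and a grep that supports -F (the current spelling of fgrep) is on the PATH. The helper name fgrep_matches is made up for illustration. Since fgrep treats patterns as fixed strings, the pipes need no escaping here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: returns the lines of $big_file that contain
# "|key|" for any key listed (one per line) in $key_file.
sub fgrep_matches {
    my ($key_file, $big_file) = @_;

    open(my $fh_keys, '<', $key_file)
        or die("Can't open key file \"$key_file\": $!\n");

    my %seen;
    my @patterns;
    while (my $key = <$fh_keys>) {
        chomp($key);
        next if $key eq '' || $seen{$key}++;   # skip blanks and duplicates
        push @patterns, '|' . $key . '|';      # marker at both ends
    }
    close($fh_keys);

    # Write the marked-up, de-duplicated keys to a temporary pattern file.
    my ($fh_pat, $pat_file) = tempfile(UNLINK => 1);
    print {$fh_pat} "$_\n" for @patterns;
    close($fh_pat)
        or die("Can't write pattern file \"$pat_file\": $!\n");

    # \Q...\E backslash-escapes the file names for the shell.
    my @lines = `grep -F -f \Q$pat_file\E \Q$big_file\E`;
    return @lines;
}
```

Calling fgrep_matches('keys.txt', 'big.txt') (both file names are placeholders) returns the matching records, which can then be printed to the new file.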

Being a brute-force-and-ignorance sort of guy, my first pure Perl attempt would be to read both files into arrays, generate a really, really long regex from the search criteria (smaller) file, and use grep. Given my Perl-mojo, this would not work, and I'd have to loop through the individual records of the larger file.
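The fallback of looping through the records of the larger file might look something like the sketch below. It swaps the long regex for a hash lookup, which is a substitution on my part, not something proposed above. It also assumes the records are pipe-delimited with the key in the second field; the helper name copy_matching_records is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copies to $out_file every record of $big_file whose
# second pipe-delimited field appears in $key_file (one key per line).
# Returns the number of records copied.
sub copy_matching_records {
    my ($key_file, $big_file, $out_file) = @_;

    # Only the smaller file is held in memory, as hash keys.
    open(my $fh_keys, '<', $key_file)
        or die("Can't open key file \"$key_file\": $!\n");
    my %keep;
    while (my $key = <$fh_keys>) {
        chomp($key);
        $keep{$key} = 1;
    }
    close($fh_keys);

    open(my $fh_in, '<', $big_file)
        or die("Can't open input file \"$big_file\": $!\n");
    open(my $fh_out, '>', $out_file)
        or die("Can't create output file \"$out_file\": $!\n");

    # The larger file is read one record at a time.
    my $copied = 0;
    while (my $record = <$fh_in>) {
        my $line = $record;
        chomp($line);
        my (undef, $field) = split(/\|/, $line, 3);
        next unless defined($field) && $keep{$field};
        print {$fh_out} $record;
        $copied++;
    }

    close($fh_out)
        or die("Can't finish writing \"$out_file\": $!\n");
    return $copied;
}
```

Because the larger file is never slurped into an array, memory use depends mostly on the size of the key file.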

Information about American English usage here and here. Floating point issues? Please read this before posting. — emc

Re^2: Compare 2 files and create a new one if it matches by ikegami (Patriarch) on Sep 22, 2008 at 20:15 UTC
It would require tons of memory.

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Regexp::List qw( );

    my $File1 = '...';
    my $File2 = '...';
    my $File3 = '...';

    # Build one regex that matches any key from the key file.
    my $keep_re;
    {
        open(my $fh_keys, '<', $File1)
            or die("Can't open key file \"$File1\": $!\n");

        $keep_re = Regexp::List
            ->new()
            ->list2re(
                map { my $s = $_; chomp($s); $s }
                    <$fh_keys>
            );
    }

    # Copy the records whose second pipe-delimited field is one of the keys.
    {
        open(my $fh_in, '<', $File2)
            or die("Can't open input file \"$File2\": $!\n");
        open(my $fh_out, '>', $File3)
            or die("Can't create output file \"$File3\": $!\n");

        print $fh_out grep /^[^\|]*\|$keep_re\|/, <$fh_in>;
    }
Re^3: Compare 2 files and create a new one if it matches by swampyankee (Parson) on Sep 23, 2008 at 01:50 UTC
In addition to the raw storage of a large and a not-so-large file, I've no idea how much memory processing the regex would take. I am also a bit doubtful of the likelihood of a regex with several thousand alternatives actually working. The Brute Force & Ignorance method does have its downsides.
Re^4: Compare 2 files and create a new one if it matches by ikegami (Patriarch) on Sep 23, 2008 at 02:05 UTC
"I am also a bit doubtful of the likelihood of a regex with several thousand alternatives actually working."

I used Regexp::List, so there are at most {character set size} alternatives in each alternation.