Re: Compare Partial Lines of 2 Text Files

Parse the file containing the smaller number of lines and build a hash. Then parse the larger file and match lines using a hash lookup:

use strict;
use warnings;

my $file1 = <<FILE;
applebananapearcarrotcarrotbeardeerdeer
goatcowduckswanchickenmouseratbirdmouse
chocolatedogdogfishmousecatdeerbird
newyorkcalifornianewjerseymousecatdeerbird
FILE

my $file2 = <<FILE;
monksbicyclewindbikecars
computercomputerprinters
hellicopterairplaneshelf
chocolatedogdogfishmouse
printerprintermousecouch
FILE

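# Hash of arrays: 24-character key => list of [tail, line number] pairs
my %f1Lines;

open my $in1, '<', \$file1 or die "Can't open file1: $!";
while (<$in1>) {
    # Split the line into its 24-character key and whatever follows;
    # lines shorter than 24 characters cannot match and are skipped
    my ($key, $tail) = m/(.{24})(.*)/ or next;

    push @{$f1Lines{$key}}, [$tail, $.];
}
close $in1;

open my $in2, '<', \$file2 or die "Can't open file2: $!";
while (<$in2>) {
    my ($key, $tail) = m/(.{24})(.*)/ or next;

    # Only lines whose key was seen in file1 are of interest
    next unless exists $f1Lines{$key};

    my @matches = @{$f1Lines{$key}};

    print "Line $. of file2 ($key$tail) matches:\n";
    print "   line $_->[1] of file1 ($key$_->[0])\n" for @matches;
}
close $in2;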

Prints:

Line 4 of file2 (chocolatedogdogfishmouse) matches:
   line 3 of file1 (chocolatedogdogfishmousecatdeerbird)

DWIM is Perl's answer to Gödel

Re^2: Compare Partial Lines of 2 Text Files by Knoperl (Acolyte) on Jul 31, 2007 at 00:33 UTC
Thank you very much, GrandFather, but I think I did not give a clear example of the output I wanted.

File#1: `abcd efgh ijkl mnop`
File#2: `qq rr ij st mn jj rr`
Output I would want: `kl op`

In this example the key is the first 2 characters of each entry. I want to match entries between the 2 files on just those first 2 characters, and the part after them is what I want printed. In the original question I wrote 24 characters, which is what I actually want, but I realize that is confusing. I would also like the program to read external files rather than have the data embedded inside the Perl program.

Again, I would greatly appreciate any further assistance, either from you or from other people who would love to join in the fun here at PerlMonks.com!
Re^3: Compare Partial Lines of 2 Text Files by GrandFather (Saint) on Jul 31, 2007 at 01:14 UTC
Reread my sample and use that thing on your shoulders that prevents your hair falling down your throat and forming a hair ball.

My sample isn't intended to be a complete answer to your problem. It is intended to show you some tools and an approach using those tools to solve your problem. It is also intended to be self-contained so that you can easily reproduce the output I indicated that it generates. It should be pretty obvious how you plug in your own local files in place of the "internal files" used in the sample.

The sample prints out more information than you asked for because that demonstrates how to store information (such as the line number) for the data in the hash and how to access that ancillary information. Because you don't tell us the back story and don't provide context details such as "duplicate keys can/can't exist", the sample code assumes not only that duplicate keys may exist, but that their context is important.

You may wish to consult perllol to gain some insight into how the hash of arrays (HoA) works if you've not encountered it before (or take a trip to the Tutorials section).

DWIM is Perl's answer to Gödel
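
For readers following along, here is a minimal sketch of how the sample might be adapted to read external files and print only the tails. It is not GrandFather's code. The helper names `index_by_prefix` and `matching_tails` are hypothetical, and the sketch assumes one record per line, a key length of 24 (use 2 for the short example above), and that the wanted tail is the one stored in the first file.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of arrays mapping each line's first
# $key_len characters to the list of tails that follow that key.
sub index_by_prefix {
    my ($fh, $key_len) = @_;
    my %tails_for;
    while (my $line = <$fh>) {
        chomp $line;
        next if length($line) < $key_len;    # too short to carry a key
        my $key = substr($line, 0, $key_len);
        push @{ $tails_for{$key} }, substr($line, $key_len);
    }
    return \%tails_for;
}

# Hypothetical helper: for each line read from $fh whose key is in the
# index, collect the tails stored for that key (in index order).
sub matching_tails {
    my ($tails_for, $fh, $key_len) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        next if length($line) < $key_len;
        my $key = substr($line, 0, $key_len);
        push @found, @{ $tails_for->{$key} } if exists $tails_for->{$key};
    }
    return @found;
}

# Runs only when two file names are given on the command line.
if (@ARGV == 2) {
    my ($name1, $name2) = @ARGV;
    my $key_len = 24;    # assumed key length; 2 for the short example

    open my $in1, '<', $name1 or die "Can't open $name1: $!";
    my $tails_for = index_by_prefix($in1, $key_len);
    close $in1;

    open my $in2, '<', $name2 or die "Can't open $name2: $!";
    print "$_\n" for matching_tails($tails_for, $in2, $key_len);
    close $in2;
}
```

Saved under a name of your choosing (say `compare.pl`, a made-up name), it would be run as `perl compare.pl file1.txt file2.txt`. Duplicate keys in the first file produce one output line per stored tail, which follows the sample's assumption that duplicates may exist.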