in reply to Compare Partial Lines of 2 Text Files

Read the file with fewer lines and build a hash keyed on the first 24 characters of each line. Then read the larger file and look up each line's first 24 characters in that hash:

```perl
use strict;
use warnings;

# Two in-memory "files" so the sample is self-contained.
my $file1 = <<FILE;
applebananapearcarrotcarrotbeardeerdeer
goatcowduckswanchickenmouseratbirdmouse
chocolatedogdogfishmousecatdeerbird
newyorkcalifornianewjerseymousecatdeerbird
FILE

my $file2 = <<FILE;
monksbicyclewindbikecars
computercomputerprinters
hellicopterairplaneshelf
chocolatedogdogfishmouse
printerprintermousecouch
FILE

# Hash of arrays: 24-character key => list of [tail, line number] pairs.
my %f1Lines;

open my $in1, '<', \$file1 or die "Can't open file1: $!";
while (<$in1>) {
    # Skip lines shorter than 24 characters; they have no key.
    my ($key, $tail) = /^(.{24})(.*)/ or next;
    push @{$f1Lines{$key}}, [$tail, $.];
}
close $in1;

open my $in2, '<', \$file2 or die "Can't open file2: $!";
while (<$in2>) {
    my ($key, $tail) = /^(.{24})(.*)/ or next;
    next unless exists $f1Lines{$key};

    # Report every file1 line that shares this key.
    print "Line $. of file2 ($key$tail) matches:\n";
    print "    line $_->[1] of file1 ($key$_->[0])\n" for @{$f1Lines{$key}};
}
close $in2;
```

Prints:

```
Line 4 of file2 (chocolatedogdogfishmouse) matches:
    line 3 of file1 (chocolatedogdogfishmousecatdeerbird)
```

DWIM is Perl's answer to Gödel

Re^2: Compare Partial Lines of 2 Text Files
by Knoperl (Acolyte) on Jul 31, 2007 at 00:33 UTC
    Thank you very much, Grandfather, but I don't think I gave a clear example of the output I wanted:
    File #1:
    abcd efgh ijkl mnop
    File #2:
    qq rr ij st mn jj rr
    Output I want:
    kl op
    In this example the key is the first 2 characters of each entry. I want to match entries between the 2 files on just those first 2 characters, and output the part that comes after them.

    In the original question I wrote 24 characters, which is what I actually want, but I realize that was confusing. I would also like the program to read external files rather than have the data embedded inside the Perl program. Again, I greatly appreciate any further help, from you or from anyone else who would like to join in the fun here at PerlMonks.com!
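A minimal sketch of the 2-character matching in the example above, assuming each whitespace-separated entry is one record and that the key is the first `$keyLen` characters of an entry. The names `$keyLen`, `@file1`, `@file2`, `%tailFor` and `@out` are hypothetical; it uses the same build-a-hash-then-look-up approach as the sample.

```perl
use strict;
use warnings;

# Hypothetical key length: 2 for this example, 24 for the real data.
my $keyLen = 2;

my @file1 = qw(abcd efgh ijkl mnop);   # entries from File #1
my @file2 = qw(qq rr ij st mn jj rr);  # entries from File #2

# Map each File #1 key to the text that follows it (hash of arrays,
# so duplicate keys keep all of their tails).
my %tailFor;
for my $entry (@file1) {
    next if length($entry) < $keyLen;
    push @{ $tailFor{ substr($entry, 0, $keyLen) } }, substr($entry, $keyLen);
}

# Look up the key of each File #2 entry, keeping File #2 order.
my @out;
for my $entry (@file2) {
    my $key = substr($entry, 0, $keyLen);
    push @out, @{ $tailFor{$key} } if exists $tailFor{$key};
}

print "@out\n";
```

With this data it prints `kl op`, the output asked for above: only `ij` and `mn` from File #2 are keys in File #1.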

      Reread my sample and use that thing on your shoulders that prevents your hair from falling down your throat and forming a hair ball. My sample isn't intended to be a complete answer to your problem. It is intended to show you some tools and an approach using those tools to solve your problem. It is also intended to be self contained so that you can easily reproduce the output I showed. It should be pretty obvious how you plug in your own local files in place of the "internal files" used in the sample.
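For reference, one way to plug in real files is to open paths given on the command line instead of the in-memory heredocs. This sketch assumes one record per line and returns only the tails (the part after the key) rather than the line-number report; `compare_files`, `compare.pl` and the file names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tails of File #1 lines whose first
# $keyLen characters also start a line in File #2, in File #2 order.
sub compare_files {
    my ($path1, $path2, $keyLen) = @_;

    # Load File #1 into a hash of arrays: key => [tails].
    my %tailsFor;
    open my $in1, '<', $path1 or die "Can't open '$path1': $!";
    while (my $line = <$in1>) {
        chomp $line;
        my ($key, $tail) = $line =~ /^(.{$keyLen})(.*)/ or next;
        push @{ $tailsFor{$key} }, $tail;
    }
    close $in1;

    # Stream File #2 and look up each line's key.
    my @results;
    open my $in2, '<', $path2 or die "Can't open '$path2': $!";
    while (my $line = <$in2>) {
        chomp $line;
        my ($key) = $line =~ /^(.{$keyLen})/ or next;
        push @results, @{ $tailsFor{$key} } if exists $tailsFor{$key};
    }
    close $in2;

    return @results;
}

# Usage: perl compare.pl file1.txt file2.txt
if (@ARGV == 2) {
    print "$_\n" for compare_files(@ARGV, 24);
}
```

As in the sample, the first file is held in memory and the second is read a line at a time, so passing the file with fewer lines first keeps memory use down.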

      The sample prints out more information than you asked for because that demonstrates how to store information (such as the line number) alongside the data in the hash and how to access that ancillary information.

      Because you don't tell us the back story and don't provide context details such as whether duplicate keys can exist, the sample code assumes not only that duplicate keys may exist, but that their context is important. You may wish to consult perllol to gain some insight into how the hash of arrays (HoA) works if you've not encountered it before (or take a trip to the Tutorials section).
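To see why the sample uses a hash of arrays rather than a plain hash, here is a small sketch with a duplicated key (hypothetical data and names). Each `push` appends a `[tail, line number]` pair, so neither `ab` line overwrites the other, as a plain `$hoa{$key} = $tail` assignment would.

```perl
use strict;
use warnings;

# Hypothetical data: lines 1 and 3 share the key "ab".
my @lines  = ('abcd', 'efgh', 'abxy');
my $keyLen = 2;

my %hoa;   # key => list of [tail, line number] pairs
for my $i (0 .. $#lines) {
    my ($key, $tail) = $lines[$i] =~ /^(.{$keyLen})(.*)/ or next;
    push @{ $hoa{$key} }, [$tail, $i + 1];
}

# Every entry for "ab" survives, each with its own line number.
for my $pair (@{ $hoa{ab} }) {
    my ($tail, $lineNo) = @$pair;
    print "line $lineNo: tail '$tail'\n";
}
# prints:
# line 1: tail 'cd'
# line 3: tail 'xy'
```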

