Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,
I am trying to compare values from two files. The code works, but the files are different in size and the code reads them side by side, line for line, so the matches come out of order. How can I ignore where the items sit in each file and still match them?
while(<DB>){
    my $rec = $_;
    chomp($rec);
    ($num,$_name)=split(/,/,$rec);
    $num=~s/(\d)(\d\d\d\d\d)/$2/g;
    print "First mach $num<br>";
    if($num=~/0\d{4}/g){
        $num=~s/(\d)(\d\d\d\d)/$2/g;
        print "Second mach = $num_pal<br>";
    }
    # exit;
    my $pal = <PROCPAL>;
    chomp($pal);
    ($pal_num,$appr,$e_mail,$terr)=split(/\#/,$pal);

Thanks

Replies are listed 'Best First'.
Re: Text Files Matching Items
by matija (Priest) on Mar 24, 2004 at 16:53 UTC
    That really, really depends on what is in the files, and how they are organized. Are the lines in the files sorted alphabetically, by any chance? If so, you can skip lines from one or the other file (whichever currently has the smaller key) until you find a pair that matches.
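    A minimal sketch of that skipping approach, assuming both inputs are already sorted by key under string comparison and that keys are unique within each file. The sample data, the in-memory filehandles, and the next_record helper are hypothetical stand-ins for illustration; the separators (comma and #) are taken from the code in the question.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files.
# Both are assumed to be sorted by key (string order).
my $db_data = <<'END';
00123,alpha
00456,beta
00789,gamma
END

my $pal_data = <<'END';
00456#yes#beta@example.com#north
00789#no#gamma@example.com#south
00999#yes#zeta@example.com#east
END

# In-memory filehandles so the sketch runs without real files.
open my $db,  '<', \$db_data  or die "Cannot open DB data: $!";
open my $pal, '<', \$pal_data or die "Cannot open PROCPAL data: $!";

# Hypothetical helper: returns (key, whole line) for the next record,
# or an empty list at end of input.
sub next_record {
    my ($fh, $sep) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    my ($key) = split /\Q$sep\E/, $line, 2;
    $key = '' unless defined $key;    # an empty line yields an empty key
    return ($key, $line);
}

my @matches;
my ($k1, $l1) = next_record($db,  ',');
my ($k2, $l2) = next_record($pal, '#');

# Advance whichever side has the smaller key until the keys agree.
while (defined $k1 && defined $k2) {
    if ($k1 lt $k2) {
        ($k1, $l1) = next_record($db, ',');
    }
    elsif ($k1 gt $k2) {
        ($k2, $l2) = next_record($pal, '#');
    }
    else {
        push @matches, $k1;
        print "Match: $l1 <=> $l2\n";
        ($k1, $l1) = next_record($db,  ',');
        ($k2, $l2) = next_record($pal, '#');
    }
}
```

    Each file is read only once and nothing but the current pair of lines is held in memory, which is why this is attractive when both files are large. It gives wrong results if either file is not sorted the same way the comparison (lt/gt here) orders keys.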

    If the records are in no particular order, then you have two options:

    • Sort both files, then proceed as above.
    • Read the smaller of the two files into memory, convert the keys into a hash, and then for each line of the larger file, check the contents of the hash.
    If all else fails, you could look through Algorithm::Diff to see if you get any ideas.
Re: Text Files Matching Items
by Happy-the-monk (Canon) on Mar 24, 2004 at 16:53 UTC

    If at least one of the files isn't extremely big,
    read the values to be looked up into a hash, then do a hash lookup on those values.

Re: Text Files Matching Items
by McMahon (Chaplain) on Mar 24, 2004 at 17:25 UTC
    I had a similar problem just last week. matija recommends Algorithm::Diff, but it didn't do the Right Thing for me.
    matija and Happy-the-monk recommend putting the values into a hash, which is excellent advice. The Perl Cookbook shows how, recipes 4.7 and 4.8.
    Or (ta-da!) use List::Compare (http://search.cpan.org/~jkeenan/List-Compare-0.22/Compare.pm) which implements the hash-y code in a really, really usable way. I think you'll find that it will solve your problem in an elegant way. (It did mine!)
    -Chris
Re: Text Files Matching Items
by graff (Chancellor) on Mar 25, 2004 at 03:12 UTC
    <nitpick> When you post code, you should (a) use logical and consistent indentation, rather than random indentation, and (b) post enough of it so that it's syntactically well formed (unless your post involves a question like "why do I get this specific syntax error on the following code?"). Not only is your indentation very misleading, your code is missing some amount of logic that is central to your problem, and is also missing a closing curly bracket. </nitpick>

    If I understand your question, you have lines of data being read from two distinct inputs, and some of these lines are expected to contain values that are common to both inputs. Looking at the code and comments you've posted, there's not much else that can be known for sure.

    Happy-the-monk's reply probably points to the best approach. Read all the lines from one input first, storing the presumed common strings as keys in a hash. If you need the whole line (or some other portion of the line) later on when reporting matches, save that as the value of the hash for the given key.

    After you've read all of the first input, start reading the second one; for each input line, isolate the string (if any) that should match the first file and see if a hash element exists with that string as the hash key (see "perldoc -f exists"). If so, you now have the full current line from the second input file, and you have whatever portion of the matching line you needed from the first file (if any). Print the match, and continue reading from the second file.
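    The two passes described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample data and in-memory filehandles are hypothetical, the first field of each line is assumed to be the key, and the keys are assumed to be directly comparable already (the digit-stripping substitutions from the original code are left out). The separators (comma and #) come from the code in the question.

```perl
use strict;
use warnings;

# Hypothetical sample data, deliberately in different orders and sizes.
my $db_data = <<'END';
00789,gamma
00123,alpha
00456,beta
END

my $pal_data = <<'END';
00456#yes#beta@example.com#north
00999#yes#zeta@example.com#east
00789#no#gamma@example.com#south
END

# Pass 1: read all of the first input, storing key => whole line.
my %db_line_for;
open my $db, '<', \$db_data or die "Cannot open DB data: $!";
while (my $line = <$db>) {
    chomp $line;
    my ($num) = split /,/, $line, 2;
    next unless defined $num;          # skip empty lines
    $db_line_for{$num} = $line;
}
close $db;

# Pass 2: for each line of the second input, test whether its key
# exists in the hash built from the first input.
my @matched;
open my $pal, '<', \$pal_data or die "Cannot open PROCPAL data: $!";
while (my $line = <$pal>) {
    chomp $line;
    my ($pal_num) = split /\#/, $line, 2;
    next unless defined $pal_num && exists $db_line_for{$pal_num};
    push @matched, $pal_num;
    print "Match: $db_line_for{$pal_num} <=> $line\n";
}
close $pal;
```

    Matches are reported in the order of the second input, regardless of where the key appeared in the first. If a key occurs more than once in the first input, this version keeps only the last line seen for it.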