One way of approaching a problem of this kind is to find an effective way to represent the data. First, I'll assume your files are small enough to hold in memory. Putting files into an array of arrays:
my $ifs = '|'; # field separator, or whatever
open FH, "<$file1path" or die $!;
my @file1data = map { chomp; [ split /\Q$ifs\E/, $_ ] } <FH>;
close(FH) or die $!;
open FH, "<$file2path" or die $!;
my @file2data = map { chomp; [ split /\Q$ifs\E/, $_ ] } <FH>;
close(FH) or die $!;

Now we introduce a third array of arrays, for scores. It will have a row for each record in file1 and a column for each record in file2; each cell holds the number of matching fields:
my $lastrec = scalar( @{$file1data[0]} ) - 1;
my @scores = map {
    my $d1 = $_;          # arrayref of a record from file1
    [ map {               # an arrayref with a score for each record from file2
        my $d2 = $_;      # arrayref of a record from file2
        scalar grep {     # count stringy matches
            $d1->[$_] eq $d2->[$_];
        } 0 .. $lastrec;
    } @file2data ];
} @file1data;

At this point you can scan a row of @scores for the index of the best match, or gather whatever statistics you want. This could have been done just as well with nested for loops, but map is fun to play with. Warning: it compiles, but it's untested code.
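That final scan over a row of @scores can be sketched as follows. The small @scores matrix here is made up for illustration; its layout matches the code above (rows for file1 records, columns for file2 records):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up score matrix for illustration: rows correspond to file1
# records, columns to file2 records, cells to counts of matching fields.
my @scores = (
    [ 2, 0, 3 ],
    [ 1, 1, 0 ],
);

# For each file1 record, find the index of the file2 record with the
# highest score (ties go to the earlier index).
my @best;
for my $row (@scores) {
    my $best = 0;
    for my $j ( 1 .. $#$row ) {
        $best = $j if $row->[$j] > $row->[$best];
    }
    push @best, $best;
}

print "file1 record $_ best matches file2 record $best[$_]\n"
    for 0 .. $#best;
```

With the sample matrix, record 0's best match is column 2 (score 3) and record 1's is column 0 (score 1, winning the tie by coming first).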
After Compline,
Zaxo
In reply to (Mappy Approach) Re: Comparing fields in 1 database with fields in another
by Zaxo
in thread Comparing fields in 1 database with fields in another
by rline