Comparing fields in 1 database with fields in another

rline has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have two flatfile databases (let's call them file1 and file2), each with a number of lines. Say, file1 has 2 lines, and file 2 has 4 lines. Each line in each database has exactly the same delimited fields. I need to keep the 2 files separate, though.

I want to take the first line in file1, and then compare it with each of the 4 lines in file2, to see which fields are the same. So out of 10 possible fields, line 1 in file1 might have 6 matches with line 1 file2, 3 matches with line 2 file2, 7 matches with line 3 file2, and 1 match with line4 file2. I then need to be able to list the number of matches for each line of file 2 with line 1 file 1, and state what the matches are.

Once that's done, I want to repeat the above paragraph for the second line in file1. So, I need to treat each line in file 1 separately, but I need to compare it with every line in file 2.

The output might be:

Line 1 of File 1 has 6 matches with file2line1. These are...
Line 1 of File 1 has 3 matches with file2line2. These are...
etc etc etc

Line 2 of File 1 has 8 matches with file2line1. These are...
etc etc etc

Is this possible? Is it easy? I have been trying to do it using foreach loops, but my brain can't handle the logic at this stage.

Any ideas?

Thanks

edit - Petruchio Tue Oct 16 11:37:04 UTC 2001: Added markup.


Replies are listed 'Best First'.
Re: Comparing fields in 1 database with fields in another by George_Sherston (Vicar) on Oct 16, 2001 at 15:50 UTC
If I were approaching this problem from a standing start, I would write out all the things I wanted to do in English, but break it up a bit so it reads clearly. So I might get something like:

```
for each line in File1,
    look at each line in File2
    and for each of the fields in the current line
        if the field has the same stuff in it as the corresponding field in the current line of File 2
            - remember it,
        otherwise
            - fuggedaboudit
print out a report of all the stuff I remembered
```

In practice this might be a mental rather than a manual operation. The advantage of writing it down, with all those suggestive indentations, would be that one could then take each line and slowly transmogrify it into executable perl. To start with you might rewrite the whole lot, but with the looping operators and accompanying brackets etc written in:

```
foreach (line in File1) {
    look at each line in File2;
    foreach (field in the current line) {
        if (the field has the same stuff in it as the corresponding field in the current line of File 2) {
            remember it;
        }
        else {
            fuggedaboudit;
        }
    }
}
print out a report of all the stuff I remembered;
```

This obviously still won't execute, but it's a framework that you can then colour in, as it were, by translating each of the non-perl bits into perl. So the first line might become

```
my @files = ('File1','File2');
my $file;
foreach $file (@files) {
    do stuff with $file ...
```

In fact you might boil that line down to

```
foreach my $file ('File1'..'File2') {
    do stuff with $file ...
```

(which, because perl is wonderful, would also work for more than two files). In fact you might boil it down further to

```
for ('File1'..'File2') {
    do stuff with $_ ...
```

Then you would go on to the next line of your original outline and re-write that too.
And when you get it all translated into perl you should put a `-w` after the shebang line at the top, and on the next line write `use strict;`. Then when you run it, either it works and you go on your way rejoicing, or it fails but it tells you why. If that's any help at all, I suggest you write some stuff, fix it up as much as you can and then, if it works, post it in triumph, and if it doesn't work, post it to see if anyone can tell you why.

§ George Sherston
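For reference, here is a minimal sketch of where that outline might end up once every line has been translated. It assumes pipe-delimited fields and that a "match" means the same value in the same position; the sample records and the helper name `matching_indexes` are made up for illustration, and `use warnings` is used as the lexical counterpart of the `-w` switch.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files,
# already split into fields on an assumed '|' delimiter.
my @file1 = map { [ split /\|/ ] } ('red|small|round|apple', 'green|large|long|cucumber');
my @file2 = map { [ split /\|/ ] } ('red|small|round|cherry', 'green|small|round|pea', 'red|large|long|pepper');

# Return the positions at which two records hold the same value.
sub matching_indexes {
    my ($rec1, $rec2) = @_;
    # Only compare positions that exist in both records.
    my $last = $#$rec1 < $#$rec2 ? $#$rec1 : $#$rec2;
    return grep { $rec1->[$_] eq $rec2->[$_] } 0 .. $last;
}

# Each line of file 1 is compared with every line of file 2.
for my $i (0 .. $#file1) {
    for my $j (0 .. $#file2) {
        my @same = matching_indexes($file1[$i], $file2[$j]);
        printf "Line %d of File 1 has %d matches with file2line%d. These are: %s\n",
            $i + 1, scalar @same, $j + 1, join(', ', @{ $file1[$i] }[@same]);
    }
    print "\n";
}
```

Keeping the comparison in its own subroutine means the "remember it" step is just the list the subroutine returns, so the report can be printed inside the loops instead of being collected for later.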
Re: Re: Comparing fields in 1 database with fields in another by rline (Initiate) on Oct 17, 2001 at 18:08 UTC
To everyone who replied to my post, thank you. Particularly to George, many thanks, as it was your thoughts that enabled me to solve this one. I'm pretty chuffed with the code I did, even though for most of you this will be very basic and easy perl. Nevertheless, in case anyone can use it, here it is. Of course, if anyone wants to offer improvements on it, please feel free...

```
#!/usr/bin/perl

print "Content-type: text/html\n\n";

$file1 = "file1.txt";
$file2 = "file2.txt";

open(FILE1, "<$file1") or die "Cannot open $file1: $!";
@file1 = <FILE1>;
close(FILE1);

open(FILE2, "<$file2") or die "Cannot open $file2: $!";
@file2 = <FILE2>;
close(FILE2);

$matches = 0;

foreach $line ( @file1 ) {
    foreach $line2 ( @file2 ) {
        # split both lines on the pipe delimiter
        @array1 = split (/\|/, $line);
        @array2 = split (/\|/, $line2);
        $numavail = @array1;
        $limit = $numavail - 3;

        # count the fields, from index 12 up to the limit, that are identical
        for ( $i=12; $i < $limit; $i++ ) {
            if ( $array1[$i] eq $array2[$i] ) {
                $matches++;
            }
        }

        if ( $matches >= 1 && $array1[14] ge $array2[14] && $array1[12] le $array2[12]) {
            print "Here is where I print the data I want to retrieve, using things like $array1[12] etc";
        }
        $matches = 0;   # reset for the next pair of lines
    }
}
```
Re: Comparing fields in 1 database with fields in another by Hofmator (Curate) on Oct 16, 2001 at 15:47 UTC
OK, this doesn't sound so difficult ... if possible I would read file2 completely into memory, already split up into a 2D array. So `$file2[0][1]`, e.g., would be the 2nd field in line 1 of file2. Having done that, proceed along the lines of the following (pseudo-)code:

```
# process file1 line by line
while ($line = <FILE1>) {
    # split up the current line in file1
    @fields = split /!/, $line;
    # for each line in the 2nd file
    foreach $lineref (@file2) {
        $number_of_matches = 0;
        # compare the two lines, field by field
        for $index (0..$#fields) {
            $number_of_matches++ if ($lineref->[$index] eq $fields[$index]);
        }
        print "Matchinfo: ...";
    }
}
```

-- Hofmator
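The step of reading file2 into a 2D array is not shown above, so here is one way it might look. This is only a sketch: the file contents are invented, an in-memory filehandle stands in for the real file so the example is self-contained, and the `!` delimiter simply follows the `split` in the pseudo-code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of file2.
my $file2_text = "a!b!c\nd!e!f\n";
open my $fh2, '<', \$file2_text or die "Cannot open in-memory file: $!";

my @file2;
while (my $line = <$fh2>) {
    chomp $line;                          # drop the newline so the last field compares cleanly
    push @file2, [ split /!/, $line ];    # store one array reference per line
}
close $fh2;

print $file2[0][1], "\n";    # 2nd field in line 1 of file2
```

With the sample data this prints `b`. Each element of `@file2` is a reference to an array of fields, which is what the `$lineref->[$index]` lookups expect.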
Re (tilly) 1: Comparing fields in 1 database with fields in another by tilly (Archbishop) on Oct 16, 2001 at 16:52 UTC
I find the spec unclear. What do you mean by matches? Do you mean a field in line 1 of file 1 happened to be the same as a field in line 2 of file 2? Does it matter if the match is between field 1 and field 5, or do the fields in question have to match up? Without being able to state a clear spec, it is hard to meet it.

Anyways, some tips.

However you solve the problem, it will probably help to have nested data structures. And for that it will help to read through References quick reference.

To see if an actual data structure looks the way you wanted it to look, you can try Data::Dumper.

Any problems of the form "This matches something in that list" are usually best handled in Perl by taking that list, and turning it into a hash, then doing a hash lookup.

Take things one step at a time. If the full problem is too much for you, just find a piece you can handle. For instance figure out what internal data structure you want to use, and then inline some sample data in your script and try to work with that. Once you can get an answer from the parsed data, then worry about building a piece that reads the file into that structure...
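To illustrate the hash tip, here is a small sketch with inlined sample data. It assumes the looser reading of "match", where a field counts if its value appears anywhere in the other line; the records and variable names are invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical records, already split into fields.
my @rec1 = qw(red small round apple);
my @rec2 = qw(round red tiny cherry);

# Turn the fields of the second record into a hash for quick lookup.
my %in_rec2 = map { $_ => 1 } @rec2;

# Fields of the first record that appear anywhere in the second.
my @shared = grep { exists $in_rec2{$_} } @rec1;

print scalar(@shared), " shared: @shared\n";
print Dumper(\%in_rec2);    # inspect the lookup structure
```

If the fields have to line up position by position instead, the hash is not needed and a plain index-by-index comparison does the job.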
(Mappy Approach) Re: Comparing fields in 1 database with fields in another by Zaxo (Archbishop) on Oct 16, 2001 at 17:06 UTC
One way of approaching a problem of this kind is to find an effective way to represent the data. First, I'll assume your files are small enough to hold in memory. Putting files into an array of arrays:

```
my $ifs = '\|';   # or whatever
open FH, "<$file1path" or die $!;
my @file1data = map {[split $ifs, $_]} <FH>;
close(FH) or die $!;
open FH, "<$file2path" or die $!;
my @file2data = map {[split $ifs, $_]} <FH>;
close(FH) or die $!;
```

Now we introduce a third array of arrays, for scores. It will have a row for each record in file1, and a column for each in file2. The number of matches will be stored in each:

```
my $lastrec = scalar( @{$file1data[0]}) - 1;
my @scores = map {
    my $d1 = $_;           # arrayref of a record from file1
    [ map {                # an arrayref with score for each record from file2
        my $d2 = $_;       # arrayref of a record from file2
        scalar grep {      # count stringy matches
            $d1->[$_] eq $d2->[$_];
        } 0..$lastrec;
    } @file2data ];
} @file1data;
```

At this point you can scan a row of `@scores` for the indexes of best match, or whatever statistics you want. This could have been done as well in nested for loops, but map is fun to play with. Warning, it compiles, but it's untested code.

After Compline, Zaxo
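Scanning a row of `@scores` for the best match might look like the sketch below. The score table is filled in by hand here (the first row reuses the counts from the question, the rest is invented) and `best_index` is a hypothetical helper, not part of the code above.

```perl
use strict;
use warnings;

# Hand-filled score table: one row per file1 record, one column per file2 record.
my @scores = ( [6, 3, 7, 1], [8, 2, 2, 5] );

# Return the index of the highest value in an array reference (first one wins ties).
sub best_index {
    my ($row) = @_;
    my $best = 0;
    for my $i (1 .. $#$row) {
        $best = $i if $row->[$i] > $row->[$best];
    }
    return $best;
}

for my $r (0 .. $#scores) {
    my $col = best_index($scores[$r]);
    printf "Line %d of File 1 best matches file2line%d (%d matches)\n",
        $r + 1, $col + 1, $scores[$r][$col];
}
```

With this table the first line of file 1 is paired with the third line of file 2 (7 matches) and the second with the first (8 matches).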