castaway has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I'm not exactly sure what to call this one. It's really about comparing two arrays, or better, calling a sub with pairs of values (one from each array) and only doing something if all the answers are the same. Hmm, that's not a good explanation either ;)

I'm maintaining some tables in a database that contain an ID field and a varying number of other fields. I'm trying to write a sub which takes values for each of the fields as arguments, compares them with what's already in the table, and either returns an ID (because they are already there) or inserts them and then returns the new ID.
So, I've got the contents of the table in an array of arrays (rows, then fields) and want to compare these using String::Approx. It should return the ID only when all fields in a row are similar to the given values.

The rows-of-fields are in '$list', and the values in @vals:
    if(@{$list}) {
        foreach $row (@{$list}) {
            my $Found = 1;
            for(my $i = 0; $i < @vals; $i++) {
                if($vals[$i] && $row->[$i+1] && String::Approx::amatch($row->[$i+1], $vals[$i])) {
                    print "Found: " . $row->[0] . "\n";
                    $Found = $row->[0];
                }
                else {
                    $Found = 0;
                }
            }
            if($Found) {
                return $Found;
            }
        }
    }
This doesn't work, as $Found has a value even when only one field/value pair matches, and I only need an ID when all fields in a row match.
jdporter suggested something like this, but I'd neglected to mention that I need the end result:
    my $result = 1;
    for my $i ( 0 .. $#vals ) {
        $result &&= foo( $vals[$i], $arr[$i] );
    }
    if ( $result ) { ..

(This started out as a sub which assumed there were exactly two values to compare, and just called String::Approx::amatch twice for each row, and somehow I can't get it to be generic. There's probably a simple solution.)

C.

Re: Comparing array contents to DB contents
by bart (Canon) on Jan 30, 2003 at 10:37 UTC
    Invert the logic. You want it to match only if all test cases match? Then make it fail if one doesn't match. It's an application of one of the two De Morgan's laws: case (r) in the table in section 2.3.1.1, here. You can only get past the end of the inner loop if none of the tests failed.

    The code below shows the idea. I've not actually run it (lack of data prevents that), but at least it compiles. It looks simple enough that it might work.

    ROW: foreach my $row (@$list) {
        for my $i (0 .. $#vals) {
            unless ($vals[$i] && $row->[$i+1] && String::Approx::amatch($row->[$i+1], $vals[$i])) {
                next ROW;
            }
        }
        # All successful!
        return $row->[0];
    }
    # Boohoo... no match.
    return;
      Works, super :) (I'll use a Label just this once... ;)

      C.
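
The same "fail on the first mismatch" idea can also be written without a label, using `all` from List::Util (version 1.33 or later). This is only a sketch: `find_id` and the `$similar` callback are hypothetical names, and the case-insensitive comparison merely stands in for String::Approx::amatch so that the example runs without extra modules.

```perl
use strict;
use warnings;
use List::Util qw(all);

# Hypothetical helper: return the ID (column 0) of the first row whose
# remaining columns all match @$vals according to $similar, else nothing.
sub find_id {
    my ($list, $vals, $similar) = @_;
    for my $row (@$list) {
        # all() stops at the first pair that fails, much like "next ROW".
        return $row->[0]
            if all { $similar->($row->[$_ + 1], $vals->[$_]) } 0 .. $#{$vals};
    }
    return;    # no row matched; the caller would do the insert here
}

# Stand-in for String::Approx::amatch (an assumption for this sketch):
# true when both values are non-empty and equal ignoring case.
my $similar = sub {
    my ($x, $y) = @_;
    return $x && $y && lc($x) eq lc($y);
};

my $list = [ [ 1, 'MudOS', 'LPC' ], [ 2, 'Diku', 'C' ] ];
my $id = find_id($list, [ 'diku', 'c' ], $similar);
print defined($id) ? "Found: $id\n" : "Not found\n";
```

Passing the comparison in as a code reference keeps the row loop independent of whichever fuzzy-matching module ends up being used.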

Re: Comparing array contents to DB contents
by Gilimanjaro (Hermit) on Jan 30, 2003 at 10:31 UTC
    Untested, but I think it should work:

    foreach my $row (@{$list}) {
        my $matched = 0;
        for (my $i = 0; $i < @vals; $i++) {
            if ( $vals[$i] && $row->[$i+1] && String::Approx::amatch($row->[$i+1], $vals[$i]) ) {
                $matched++;
            }
            else {
                last;
            }
        }
        # only return when every field of this row matched
        return $row->[0] if $matched == @vals;
    }
    # none found; do the insert, get the new id and return it
      Or, slower (no short-circuit) but shorter:

      foreach my $row (@{$list}) {
          my @matching = map { String::Approx::amatch($row->[$_+1], $vals[$_]) ? $_ : () } (0..$#vals);
          # only return when every field of this row matched
          return $row->[0] if @matching == @vals;
      }
      # none found; do the insert, get the new id and return it

      (no check for undef columns here either...)

Re: Comparing array contents to DB contents
by Gilimanjaro (Hermit) on Jan 30, 2003 at 13:19 UTC
    By the way, I always prefer accessing my database rows using hashrefs for the records, to ensure I'm accessing the right fields. If you were to use this method, a partial hash comparison would suffice:

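    my @keylist = qw(field1 field2 field3);
    my $id;
    for my $row (@$list) {
        next unless hashcmp($row, $vals, @keylist);
        $id = $row->{id};    # first matching row wins
        last;
    }
    $id = insert_row($vals) unless defined $id;    # no row matched

    sub hashcmp {
        # $x and $y rather than $a and $b, which sort() uses
        my ($x, $y, @keylist) = @_;
        for (@keylist) {
            next unless $x->{$_} || $y->{$_};    # both undef
            return 0 unless String::Approx::amatch($x->{$_}, $y->{$_});
        }
        return 1;
    }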
    This assumes the named fields are always present in both the $vals and $row hashref...
Re: Comparing array contents to DB contents
by CountZero (Bishop) on Jan 30, 2003 at 12:00 UTC

    What would happen if on a per record basis you concatenate all fields in the Database and do the same for the fields in your table and then compare both strings against each other?
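
A minimal sketch of the concatenation idea, assuming exact string comparison for now: join the fields of each record (skipping the ID in column 0) and the incoming values into one string each, then compare once per row. The `row_key` helper and the separator character are assumptions made for the example, and undefined fields are treated as empty strings.

```perl
use strict;
use warnings;

# Hypothetical helper: build one comparison string from a list of fields.
# "\x1F" (ASCII unit separator) is assumed not to occur in the data; it
# keeps ('ab', 'c') from colliding with ('a', 'bc').
sub row_key {
    return join "\x1F", map { defined $_ ? $_ : '' } @_;
}

my $list = [ [ 1, 'MudOS', 'LPC' ], [ 2, 'Diku', 'C' ] ];
my @vals = ('Diku', 'C');

my $wanted = row_key(@vals);
# Compare everything after the ID column against the wanted key.
my ($match) = grep { row_key(@{$_}[ 1 .. $#{$_} ]) eq $wanted } @$list;
print $match ? "Found: $match->[0]\n" : "Not found\n";
```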

    I may have missed something, though. Why are you using the String::Approx module?

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

    Update: Changed "String::Compare" to "String::Approx"
      Hmm, might be an idea..
      I'm using it because I'm trying to clean up the data as I go. There's quite a big possibility of similar names being written differently or mistyped (e.g. 'MudOS', 'MudOD', 'Mudos' are probably all the same thing).

      Though I've not figured out yet how tolerant it is, this is just playing around at the mo..

      C.

        If it is just a matter of UPPERCASE v. lowercase, you can force both strings to lowercase and then do the comparison.
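
For instance, a case-insensitive comparison needs nothing beyond the built-in `lc`. A small sketch (`same_ignoring_case` is a hypothetical name); note that it treats 'MudOS' and 'Mudos' as equal but not the typo 'MudOD'.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two strings ignoring case.
sub same_ignoring_case {
    my ($x, $y) = @_;
    return lc($x) eq lc($y);
}

for my $name ('Mudos', 'MudOD') {
    printf "%-5s vs MudOS: %s\n", $name,
        same_ignoring_case($name, 'MudOS') ? 'same' : 'different';
}
```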

        Did you already have a look at Array::Compare? Incidentally, the simple compare method also uses "my" concatenation trick.

        Having had a quick peek at String::Approx I think it can be used with this concatenation trick: rather than test on eq you test on adist or adistr and see if both strings are sufficiently equal to be accepted as such or must be checked further.
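
A rough sketch of that combination: concatenate the fields, then accept two records when the edit distance between the strings is small relative to their length. To stay self-contained it uses a hand-written Levenshtein distance as a stand-in for String::Approx's `adist`; the `rows_similar` helper, the '|' separator and the 0.2 tolerance are all assumptions chosen for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Plain Levenshtein edit distance; a stand-in for String::Approx::adist
# so the sketch runs without extra modules.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = 0 .. length($t);
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: join and lowercase the fields, then accept the pair
# when the distance is at most $tolerance times the longer string's length.
sub rows_similar {
    my ($fields_a, $fields_b, $tolerance) = @_;
    my $x = lc(join('|', @$fields_a));
    my $y = lc(join('|', @$fields_b));
    my $longest = max(length($x), length($y)) || 1;
    return edit_distance($x, $y) / $longest <= $tolerance;
}

for my $candidate ([ 'MudOD', 'LPC' ], [ 'Diku', 'C' ]) {
    printf "%s: %s\n", join('/', @$candidate),
        rows_similar([ 'MudOS', 'LPC' ], $candidate, 0.2) ? 'similar' : 'different';
}
```

One caveat of comparing concatenated strings is that a large difference in one short field can be masked by several long fields that agree, so a per-field check may still be needed for borderline cases.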

        String::Similarity might also be a good candidate: it seems simpler than String::Approx.

        CountZero

Re: Comparing array contents to DB contents
by castaway (Parson) on Jan 30, 2003 at 11:17 UTC
    Oops, I forgot to mention: when a field/value pair are both undef, that's also a match.

    C.
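
One way to fold that rule into the per-field test, as a sketch only: `fields_match` is a hypothetical helper, it checks definedness rather than truthiness (so an empty string or 0 is still compared), and the case-insensitive comparison again stands in for String::Approx::amatch.

```perl
use strict;
use warnings;

# Hypothetical per-field test: two undefined values match, a single
# undefined value never matches, otherwise defer to the comparison.
sub fields_match {
    my ($x, $y, $similar) = @_;
    return 1 if !defined($x) && !defined($y);
    return 0 if !defined($x) || !defined($y);
    return $similar->($x, $y) ? 1 : 0;
}

# Stand-in for String::Approx::amatch (an assumption for this sketch).
my $similar = sub { lc($_[0]) eq lc($_[1]) };

my @pairs = ([ undef, undef ], [ 'MudOS', undef ], [ 'MudOS', 'mudos' ]);
for my $pair (@pairs) {
    my ($x, $y) = map { defined $_ ? $_ : 'undef' } @$pair;
    printf "%s / %s => %d\n", $x, $y, fields_match(@$pair, $similar);
}
```

A sub like this could replace the three-part condition inside the inner loop of any of the row-matching versions above.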