Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Exalted ones,
Put simply, why doesn't this work!?
# compare each row in the first set of data to every row
# in the second set of data. If it does not occur, then
# they are different, so print.
for my $a ( @ds1_rows ) {
    my $record1 = "@$a";
    my $bool = '';
    for $aref ( @ds2_rows ) {
        my $record2 = "@$a";
        $found = $record1 eq $record2;
        $bool = $found;   # set bool variable to value of found
        last if $found;
    }
    # if the boolean variable is equal to nothing, the for loop was
    # not exited prematurely, thus no record was found matching.
    if ($bool eq '') {
        $record_count++;
    }
}

# assign field names to values for each record that
# is found to be different
for (my $x = 0; $x < $record_count; $x++) {
    for (my $y = 0; $y < $field_count; $y++) {
        print $column_names[$y] . ": " . $ds1_rows[$x][$y] . "\n" unless $found;
    }
}
I've noticed there have been a number of articles quite recently that offer help related to what I am doing, but I can't find an answer that solves this problem.

The code above looks at database records stored in arrays (hence array of arrays) and compares one record at a time to all records in a second set to see if there's a match. If there isn't, then it's a unique record and the values of that record must be printed out with field names as labels.
@ds1_rows and @ds2_rows are arrays of records of values. The number of values in each record is the same in both sets; however, the number of records can vary. @column_names is a list of field names that matches the number of values.

At the moment I think I can see if a record matches. I can also record the number of non-matches. I can print the column name as a label next to record values; however, the records printed are not the ones without a match but simply the first records in the array.

Please excuse the fact that my code isn't as concise as it probably could be. I'm fairly new to programming and Perl, so I've only used what I know and understand.

Cheers for any help that is given, Steve.


Replies are listed 'Best First'.
Re: Printing the values of unique database records from comparing arrays of records
by punkish (Priest) on Mar 12, 2005 at 05:29 UTC
    The way I understand your problem, you have several errors in your code...
    # if @ds1_rows is an AofA holding a dataset, then...
    for my $a ( @ds1_rows ) {

        # $record1 is an array, not a scalar... you are deref-ing
        # an arrayref and assigning it to a scalar, which doesn't
        # make any sense
        my $record1 = "@$a";
        my $bool = '';

        # then, further on...
        # you are assigning each element to $aref
        # (where is the 'my'?)...
        for $aref ( @ds2_rows ) {

            # and then, not using $aref...
            # instead, you are using $a again
            my $record2 = "@$a";
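
A minimal sketch of the two loops with those points addressed, assuming small made-up sample data (the rows and column names below are only for illustration). It uses $row1 and $row2 rather than $a, since $a and $b are special to sort, and it collects the unmatched rows themselves instead of only counting them, so that the printing loop shows the right records. The tab used as a join separator is an assumption: it must not occur inside any field value.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the original poster's arrays.
my @column_names = ('Hardware', 'Manufacturer');
my @ds1_rows = ( ['cpu', 'intel'], ['cpu', 'amd'], ['disk', 'seagate'] );
my @ds2_rows = ( ['cpu', 'intel'], ['disk', 'seagate'] );

my @unmatched;    # rows of @ds1_rows with no identical row in @ds2_rows
for my $row1 (@ds1_rows) {
    my $record1 = join "\t", @$row1;
    my $found   = 0;
    for my $row2 (@ds2_rows) {
        # Build the second string from $row2, not from the outer loop variable.
        my $record2 = join "\t", @$row2;
        if ($record1 eq $record2) {
            $found = 1;
            last;
        }
    }
    push @unmatched, $row1 unless $found;
}

# Print each unmatched row with its field names as labels.
for my $row (@unmatched) {
    for my $y (0 .. $#column_names) {
        print "$column_names[$y]: $row->[$y]\n";
    }
}
```

With this sample data only the ('cpu', 'amd') row is left in @unmatched.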

    That said, let's ponder what you mean by "one record at a time to all records in a second set to see if there's a match." Do you mean to compare an entire record, by which you seem to imply an entire row, with each entire row in the second set? I mean, if each row has 10 columns, do all 10 columns have to be identical to all 10 columns in another row for the two to be identical? That is a bit confusing.

    In any case, you want to compare two arrays (well, AofAs in this case, but still arrays nonetheless) and compute their intersection. From the Cookbook, you get

    # Simple solution for union and intersection
    foreach $e (@a) { $union{$e} = 1 }

    foreach $e (@b) {
        if ( $union{$e} ) { $isect{$e} = 1 }
        $union{$e} = 1;
    }
    @union = keys %union;
    @isect = keys %isect;
    Apply the above logic. Or loop over one array and, for each element, grep through the other array. Check out the usage of grep to search for a value in an array.
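
The grep approach might look roughly like this for an array of arrays. This is only a sketch under assumptions: the sample rows are made up, row_key is a hypothetical helper, and the tab separator is assumed never to appear inside a field value.

```perl
use strict;
use warnings;

# Hypothetical sample data; each row is an array reference.
my @ds1_rows = ( ['cpu', 'intel'], ['cpu', 'amd'] );
my @ds2_rows = ( ['cpu', 'intel'] );

# Hypothetical helper: flatten a row into one comparable string.
sub row_key { return join "\t", @{ $_[0] } }

# Keep every row of @ds1_rows for which grep finds no equal row in @ds2_rows.
my @only_in_ds1 = grep {
    my $key     = row_key($_);
    my $matches = grep { row_key($_) eq $key } @ds2_rows;   # count in scalar context
    $matches == 0;
} @ds1_rows;

print join(' ', @$_), "\n" for @only_in_ds1;
```

This still compares every row against every other row, so it is best suited to small data sets.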

    Hope all this helps.

    --

    when small people start casting long shadows, it is time to go to bed
      Yes, I want to compare the whole row. The reason being, each record contains configuration data. While some values may be the same as in another record, others can be different. e.g. ("cpu","intel") and ("cpu","amd"), the cpu values are the same but the processor manufacturers are different. So this is why I want to compare the whole row and output all the record values with labels e.g. Hardware, Manufacturer.
Re: Printing the values of unique database records from comparing arrays of records
by Popcorn Dave (Abbot) on Mar 12, 2005 at 03:29 UTC
    Have you tried using Data::Dumper to see what you're actually pulling out?
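
For instance, a quick way to inspect one of the data sets (the rows here are made up; substitute whatever your database fetch returns):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical rows standing in for one of the data sets.
my @ds1_rows = ( ['cpu', 'intel'], ['cpu', 'amd'] );

# Pass a reference so the whole structure is dumped as one value.
my $dump = Dumper(\@ds1_rows);
print $dump;
```

Dumping the intermediate strings built inside the loops in the same way would show whether the two sides of the comparison are really coming from different rows.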

    In addition, you might want to look at ptkdb - the Perl Graphical debugger so that you can see what your code is doing step by step.

    Those are usually the first things I do when I have code that is working the way it wants, not the way I expected it to.

    Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.
Re: Printing the values of unique database records from comparing arrays of records
by TedPride (Priest) on Mar 12, 2005 at 08:52 UTC
    Seems to me that this is an inefficient way to do this. Wouldn't it be better to make a hash with an MD5 key for each record in the second set, then for each record in the first set, create an MD5 key and check whether it exists in the second-set hash? Given a large number of records to match (not to mention a large number of fields in each record), this should speed things up significantly. And if you read the records one at a time and hash each as you go, you only have to keep the digests in memory, so it will also be much easier on memory usage.

    Code to follow...

    use strict;
    use warnings;
    use Digest::MD5 qw(md5 md5_hex md5_base64);

    my (@data, %data, $key, @record);

    # Create nested array of data for testing purposes.
    for (<DATA>) {
        chomp;
        push @data, [split / /];
    }

    # Join each record and create hash key from contents.
    # Note: You have to include field separators (in this
    # case tabs), or you could end up with a situation
    # where non-identical records match.
    for (@data) {
        $key = md5 join "\t", @$_;
        $data{$key} = 1;
    }

    # Now you can check any record you want by creating
    # a key and seeing if it exists in the hash.
    @record = qw/aa aa aa aa aa aa aa aa aa/;
    $key = md5 join "\t", @record;
    print join " ", @record if !$data{$key};

    @record = qw/tt ii mm ee tt hh ee rr ee/;
    $key = md5 join "\t", @record;
    print join " ", @record if !$data{$key};

    # You'll still need to match up the field names,
    # and you will of course be looping through the
    # second set of records instead of doing one at a
    # time, but this should serve as an example of
    # how to use hashes to drastically cut down on
    # the number of comparisons.

    __DATA__
    oo nn cc ee uu pp oo nn aa
    tt ii mm ee tt hh ee rr ee
    ww aa ss aa gg oo bb ll ii
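
Applied to the two data sets from the question, the same idea might be sketched as follows. The sample rows and column names are made up for illustration, and the tab separator is again assumed not to occur inside any field.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sample data; the variable names follow the original question.
my @column_names = ('Hardware', 'Manufacturer');
my @ds1_rows = ( ['cpu', 'intel'], ['cpu', 'amd'] );
my @ds2_rows = ( ['cpu', 'intel'], ['disk', 'seagate'] );

# Index the second set once: one digest per record.
my %seen;
$seen{ md5_hex(join "\t", @$_) } = 1 for @ds2_rows;

# A single hash lookup per first-set record replaces the inner loop.
my @unmatched = grep { !$seen{ md5_hex(join "\t", @$_) } } @ds1_rows;

# Print each unmatched record with its field names as labels.
for my $row (@unmatched) {
    print "$column_names[$_]: $row->[$_]\n" for 0 .. $#column_names;
    print "\n";
}
```

If the records are short, using the joined string itself as the hash key should work just as well and avoids computing a digest at all; the digest mainly helps when the records are long and memory is tight.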