Hello Monks,

I am trying to compare the records from two sets of data, for differences. I have several fields in the table, one of which is called data_set. In this field, the name of the set of data is noted, which identifies the source of the data. The same data_set value can be present in a number of records in the table.

Firstly, I want to be able to retrieve every record associated with set1 and put them into an array. I want to do the same for set2. (if this is the best way to do it)
I then want to be able to compare each record (whole, not individual fields) in set1_array with every record in set2_array and print the record if the record does not appear in set2. The same process should be done to compare set2 to set1. During these processes the data_set field should be ignored as it will obviously differ.

like this...

TABLE data_set | field1 | field2 | field3 ------------------------------------ set1 | yes | 0 | HELLO set1 | yes | 1 | HELLO set1 | no | 1 | HI set2 | yes | 0 | HELLO set2 | yes | 0 | HELLO set2 | no | 0 | HEY set2 | no | 1 | HELLO set3 | no | 1 | HALO SET1 field1 | field2 | field3 ------------------------- yes | 0 | HELLO yes | 1 | HELLO no | 1 | HI SET2 field1 | field2 | field3 ------------------------- yes | 0 | HELLO yes | 0 | HELLO no | 0 | HEY no | 1 | HELLO

So here, compare (set1, row1) to (set2, row1), (set2, row2), (set2, row3) and (set2, row4), then compare the next row in set1 to every row in set2 and so on...
then go through the exact same process but comparing set2 to set1.
Eventually (set1, row2), (set1, row3), (set2, row3) and (set2, row4) will be printed.

Heres my code:

my ($data_set_01, $data_set_02, $format, @data_cat) = @_; my ($dbh, $sth, $sql_01, $sql_02); my ($row1, $row2); my @ds1_rows = (); my @ds2_rows = (); # Process each table foreach (@data_cat) { #------------------ # Data Set 1 #------------------ # Create the query statment $sql_01 = "SELECT * FROM " . $_; # Insert the where clause to the query. # This narrows the data chosen to the data_set the user is + currently looking at. insert_where_clause(\$sql_01, $where_clause_ds1); # Connect to the database. if (! open_database_connection ( \$dbh, $program_config{database}, $program_config{admin_login}, $program_config{admin_pass} ) ) { # Indicate that a problem occured. return 0; } # Execute the query if (! execute_query(\$dbh, \$sth, $sql_01)) { return 0; } # get all records in table for the data set while ($row1 = $sth->fetchrow_array) { push @ds1_rows, $row1; } # Close the database connection. if (! close_database_connection (\$dbh)) { # Indicate that a problem occured. return 0; } #------------------ # Data Set 2 #------------------ # Create the query statment $sql_02 = "SELECT * FROM " . $_; # Insert the where clause to the query. # This narrows the data chosen to the data_set the user is + currently looking at. insert_where_clause(\$sql_02, $where_clause_ds2); # Connect to the database. if (! open_database_connection ( \$dbh, $program_config{database}, $program_config{admin_login}, $program_config{admin_pass} ) ) { # Indicate that a problem occured. return 0; } # Execute the query if (! execute_query(\$dbh, \$sth, $sql_02)) { return 0; } # get all records in table for the data set while ($row2 = $sth->fetchrow_array) { push @ds2_rows, $row2; } print $ds2_rows[0]."\n"; # Close the database connection. if (! close_database_connection (\$dbh)) { # Indicate that a problem occured. return 0; } #------------------------------------ # Compare rows & output differences #------------------------------------ # compare each row in the first set of data to every row # in the second set of data. If it does not occur, then # they are different, so print. foreach (@ds1_rows) { my $record1 = $_; foreach (@ds2_rows) { my $record2 = $_; # if the record cannot be found if ($record1 ne $record2) { print $record1."\n".$record2."\n\n"; } } } # empty arrays for next table @ds1_rows = (); @ds2_rows = (); }
Originally I thought $sth->fetchrow_array would simply retrieve the whole record (row) and i could work from there and do a couple of nested foreach loops and compare the rows. Then repeat the process for the other way round. It seems to only retrieve the last field value?

Sorry for the very long question but I thought i'd better explain it properly to make sure you guys understood what i meant! Also please note, the db functions used here definately work. Hope there is someone who can point me in the right direction!

thanks, Jason


In reply to Compare records from different data sets by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.