Perl is ideal for comparing strings. That being said, I would either do the comparisons on strings, or I'd do all of the work in the database, if I wanted to deal with it as records.

We assume we have a function get_records( $sql ) that returns a single scalar for each record, given the sql string, you'd use something like the following general logic: (you may have to adjust to deal with your database's concatenation operator, and select a deliminator that doesn't occur in your fields)

my $fields = join "||'|'||", qw( field1, field2, field3); my %set1 = map { $_ => 1 } get_records ( "SELECT $fields FROM table WHERE data_set = 'set1'"); my %set2 = map { $_ => 1 } get_records ( "SELECT $fields FROM table WHERE data_set = 'set2'"); my @set2not1 = grep { ! exists($set1{$_}) } keys %set2; my @set1not2 = grep { ! exists($set2{$_}) } keys %set1;

You can then split the records back out into their individual fields. But there's no reason to go and load all of the records into memory, when you already have them in the database, which is made specifically to handle dealing with records. If you're using a database that can take subqueries, you can do the relationship in one statement. If you're not, you need some temporary tables or views --

CREATE VIEW set1 AS SELECT field1,field2,field3 FROM table WHERE data_set = 'set1'; CREATE VIEW set2 AS SELECT field1,field2,field3 FROM table WHERE data_set = 'set2';

You can then use a non-equijoin in Oracle, or a right join in mySQL, or whatever you have in your particular database to get a list of records that don't have a correlation.

Oh -- and you might want to look over the documentation for DBI, because fetchrow_array() returns a list for each element, which you're then evaluating in scalar context, and would get the count of the fields for each item. You probably wanted fetchrow_arrayref()


In reply to Re: Compare records from different data sets by jhourcle
in thread Compare records from different data sets by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.