Scully4Ever has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm somewhat new to Perl and inexperienced with SQL. I've been using DBIx::Simple to pull information from various tables and it has worked great until I have an array of about 40,000 values that I need to match in @all_ids. I've tried a few other methods but run out of memory or it takes too long to complete. After I pull these values, I then need to get the number of unique parts. What is the most efficient way to query an sql table involving a large array?

my @data = $dbh->select( 'table', '*', { ww => [@ww], id => [@all_ids], } )->hashes; my $counter = 0; for (my $i = 0; $i < @data; $i++) { my $part = $data[$i]{'part'}; my $n_value = $data[$i]{'n'}; if ($part !~ m/^0000000.*/) { if (!$seen{$part}) { $seen{$part} = 1; $counter++; } } elsif (!$seen{$n_value}) { $seen{$n_value} = 1; $counter++; } } print "The uin count is $counter\n"; }

Replies are listed 'Best First'.
Re: SQL Select
by Mr. Muskrat (Canon) on Feb 03, 2016 at 22:40 UTC

    When you use the select object method, DBIx::Simple uses SQL::Abstract to create the query and bind arguments.

    When SQL::Abstract sees a construct like thing => [...] in a where clause it builds SQL like 'thing = ? OR thing = ? ...' and places the values in the bind arguments (for everything in the arrayref).

    It may or may not help you but you could try thing => { -in => [...] } which tells it to build a more concise piece of SQL like 'thing IN (?,?...)' (but still places the values into the bind args).

    my @data = $dbh->select( 'table', '*', { ww => { -in => [@ww] }, id => { -in => [@all_ids] }, } )->hashes;

    NetWallah has also raised some valid points about GROUP BY and COUNT.

      That did the trick. Thank you so much for the help! You guys are awesome!

Re: SQL Select
by NetWallah (Canon) on Feb 03, 2016 at 22:15 UTC
    Can you do a "GROUP BY" in SQL, to get uniqueness ?

    Also, append to the "WHERE" clause to filter out parts you do not want to scan.

    If all you are doing is counting, SQL can again do that efficiently:

    SELECT count(*) as nbr_of_parts_with_leading_zeros FROM table WHERE part like '0000000%';

            ...The words of the Prophets are written on the Facebook wall.

      I have two groups of parts, one with valid part numbers and one that contains zeros. In the case with valid part numbers I need a unique count using the part numbers. In the case where they contain zeros I need to use the 'n' value and count how many unique 'n' values I have.

Re: SQL Select
by poj (Abbot) on Feb 03, 2016 at 20:35 UTC

    Is @all_id created by another SQL query ?

    poj

      Yes, @all_id is pulled from a different table. I have to use those values to find matching parts in the current table.

        If the ids are in another table (eg table_id) then you could use a join (inner) to filter the results. Here's an example using SQL::Abstract::More

        #!perl use strict; use DBIx::Simple; use SQL::Abstract::More; my $dbh = '....'; # connect $dbh->abstract = SQL::Abstract::More->new(); my @ww = qw/a b c/; my @data = $dbh->select( -from => [ -join => qw/table id=id table_id/], -columns => ['part'], -where => { ww => {-in => [@ww] }, part => {-not_like => '0000000%'} }, -union => [ -columns => ['n'], -where => { ww => {-in => [@ww] }, part => {-like => '0000000%'} } ], -group_by => ['part'], )->flat; print 'The uin count is '.scalar @data."\n";
        poj

        Finding records in one table that exist in another is what IN() and EXISTS() are for. In general, if the column(s) are indexed, EXISTS() will be more efficient.

        SELECT Id, col2, col3 FROM Data WHERE EXISTS ( SELECT * FROM All_Id WHERE Data.Id = All_Id.Id );