SQL Select

Scully4Ever has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm somewhat new to Perl and inexperienced with SQL. I've been using DBIx::Simple to pull information from various tables and it has worked great until I have an array of about 40,000 values that I need to match in @all_ids. I've tried a few other methods but run out of memory or it takes too long to complete. After I pull these values, I then need to get the number of unique parts. What is the most efficient way to query an sql table involving a large array?

my @data = $dbh->select(
    'table', '*', {
    ww    => [@ww],
    id => [@all_ids],
    }
)->hashes;

my $counter = 0;
for (my $i = 0; $i < @data; $i++) {
    my $part = $data[$i]{'part'};
    my $n_value = $data[$i]{'n'};
    if ($part !~ m/^0000000.*/) {
        if (!$seen{$part}) {
            $seen{$part} = 1;
            $counter++;
        }
    } elsif (!$seen{$n_value}) {
        $seen{$n_value} = 1;
        $counter++;
    }
}
print "The uin count is $counter\n";
}
[download]

Comment on SQL Select Download Code

Replies are listed 'Best First'.
Re: SQL Select by Mr. Muskrat (Canon) on Feb 03, 2016 at 22:40 UTC
When you use the select object method, DBIx::Simple uses SQL::Abstract to create the query and bind arguments. When SQL::Abstract sees a construct like `thing => [...]` in a where clause it builds SQL like 'thing = ? OR thing = ? ...' and places the values in the bind arguments (for everything in the arrayref). It may or may not help you but you could try `thing => { -in => [...] }` which tells it to build a more concise piece of SQL like 'thing IN (?,?...)' (but still places the values into the bind args). `my @data = $dbh->select( 'table', '*', { ww => { -in => [@ww] }, id => { -in => [@all_ids] }, } )->hashes;` [download] NetWallah has also raised some valid points about GROUP BY and COUNT.	[reply] [d/l] [select]
Re^2: SQL Select by Scully4Ever (Novice) on Feb 03, 2016 at 23:40 UTC
That did the trick. Thank you so much for the help! You guys are awesome!	[reply]
Re: SQL Select by NetWallah (Canon) on Feb 03, 2016 at 22:15 UTC
Can you do a "GROUP BY" in SQL, to get uniqueness ? Also, append to the "WHERE" clause to filter out parts you do not want to scan. If all you are doing is counting, SQL can again do that efficiently: `SELECT count(*) as nbr_of_parts_with_leading_zeros FROM table WHERE part like '0000000%';` [download] ...The words of the Prophets are written on the Facebook wall.	[reply] [d/l]
Re^2: SQL Select by Scully4Ever (Novice) on Feb 03, 2016 at 23:28 UTC
I have two groups of parts, one with valid part numbers and one that contains zeros. In the case with valid part numbers I need a unique count using the part numbers. In the case where they contain zeros I need to use the 'n' value and count how many unique 'n' values I have.	[reply]
Re: SQL Select by poj (Abbot) on Feb 03, 2016 at 20:35 UTC
Is `@all_id` created by another SQL query ? poj	[reply] [d/l]
Re^2: SQL Select by Scully4Ever (Novice) on Feb 03, 2016 at 23:24 UTC
Yes, @all_id is pulled from a different table. I have to use those values to find matching parts in the current table.	[reply]
Re^3: SQL Select by poj (Abbot) on Feb 04, 2016 at 14:14 UTC
If the ids are in another table (eg table_id) then you could use a join (inner) to filter the results. Here's an example using `SQL::Abstract::More` `#!perl use strict; use DBIx::Simple; use SQL::Abstract::More; my $dbh = '....'; # connect $dbh->abstract = SQL::Abstract::More->new(); my @ww = qw/a b c/; my @data = $dbh->select( -from => [ -join => qw/table id=id table_id/], -columns => ['part'], -where => { ww => {-in => [@ww] }, part => {-not_like => '0000000%'} }, -union => [ -columns => ['n'], -where => { ww => {-in => [@ww] }, part => {-like => '0000000%'} } ], -group_by => ['part'], )->flat; print 'The uin count is '.scalar @data."\n";` [download] poj	[reply] [d/l] [select]
Re^3: SQL Select by chacham (Prior) on Feb 04, 2016 at 15:23 UTC
Finding records in one table that exist in another is what IN() and EXISTS() are for. In general, if the column(s) are indexed, EXISTS() will be more efficient. `SELECT Id, col2, col3 FROM Data WHERE EXISTS ( SELECT * FROM All_Id WHERE Data.Id = All_Id.Id );` [download]	[reply] [d/l]