comment on

NOTE: I asked this question before signing up a bit ago and now I can't find it, heh. So here it is again from my brand spankin new perl monks account :) Sorry it's long.

Heya, monks :)

I'm running Perl 5.8.7 on Windows Server 2003 and connecting to SQL Server 2005 via DBI.

I have been rebuilding a perl report and came across an interesting discrepancy while approaching some calculations from two angles. To give some context, this is for a system that keeps track of agencies that scan products. I need to find the number of unique products scanned by each user for an agency within a certain time frame, by querying the scan events table.

At this point, I already know which users belong to which agencies, but I am trying to calculate them all together, rather than one agency at a time like the old report.

The old report used the following query over and over for each agency:

select
    username,
    count(distinct product)
from events
where username in $user_string
and gmtdatetime >= '$min_date'
and gmtdatetime < '$max_date'
group by username;
[download]

and I receive a result. However, running this query over and over for all users in all agencies takes a bad toll on sql server. So in order to get system wide calculations, I opted to pull the records without grouping or counting, and sort out the unique ones directly in perl, which seems to take much less time. Unfortunately, the numbers calculated in this way are much higher than querying for the counts directly, and I don't know why. I use the following query:

select
    username,
    product,
from events
where username in $user_string
and gmtdatetime >= '$min_date'
and gmtdatetime < '$max_date'
[download]

and then I create two hashes, one that holds the unique products per agency, and one that holds the unique products per user, using the product id as the hash key in both instances. I do a check to see if the product exists in the agency hash and if it does not, I add it to the agency hash and also to the user hash. At the end, I count the keys on each element in the user hash to get the unique products scanned for that user:

while(my @row = $res->fetchrow_array()) {

    my ($username, $product) = @row;
    
    # mark product as used
    if(!exists($agency_products{$user_agency{$username}}{$product})) {
        $agency_products{$user_agency{$username}}{$product} = 1;
        $user_products{$username}{$product} = 1;
    }
    
}
[download]

Then I use the following to get the count for each user:

foreach my $i_user (keys(%stats)) {
    $stats{$i_user}{'UNIQUE_PRODUCTS'} = keys(%{$user_products{$i_user
+}});
}
[download]

But this gives me a higher result than the sql only method. Does anyone have any inkling as to why this might be?

Thanks in advance for your help!

Ryan

In reply to Perl/SQL Server Count Discrepancy by muertetg

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.