Re: Slow response on DBI MySQL query
by Corion (Patriarch) on Mar 24, 2015 at 13:31 UTC
|
There are many things you can cache to make executing your subroutine faster.
For example, you can cache the statement handle $obtain_country to fetch a single country:
use vars '$obtain_country';
sub Obtain_GeoIP {
...
$obtain_country ||= $obtain_country = $dba->prepare("select Countr
+y from GeoIP where $lip between Start_int and End_int limit 1");
};
You could also restructure the query and instead of issuing a query for each IP address, insert all IP addresses into a table and then query the countries for them through SQL.
I'm not sure if a table is the best data structure for a range query and personally, I would just use the Geo::IP module. | [reply] [d/l] [select] |
|
|
Corion,
Thank you for the response! I am unable to do a single query as there is a lot more processing being done on the rest of the script. Also, I looked at Geo::IP, but the site where the database needs to be downloaded seems down. Are you using it? Any pointers into that module?
| [reply] |
|
|
| [reply] |
|
|
|
|
|
|
|
|
| [reply] |
|
|
|
|
unable to do a single query as there is a lot more processing being done
One does not preclude the other. Retrieving multiple responses in one batch allows the script to work on all of them as well. If the DB needs to do even more work, perhaps the RDBMS should share more of the load, for example, an SP or more intelligent query be written.
| [reply] |
Re: Slow response on DBI MySQL query
by chacham (Prior) on Mar 24, 2015 at 13:51 UTC
|
"select Country from GeoIP where ('$lip' > Start_int) and ('$lip' < End_int) limit 1");
Aside from adding insecurity with dynamic SQL, there's little benefit from possible RDBMS caching. That is, preparing a query that uses palceholders allows the RDBMS to create a PLAN that (it thinks) is best for varying arguments. Also, rerunning the query does not require a new prepare.
And, large loads can be done with a bulk collect.
'$lip' > Start_int) and ('$lip' < End_int)
BETWEEN is inclusive. So a BETWEEN b and c means a >= b AND a <= c. So, there's a difference between (no pun intended) the two statements.
| [reply] |
|
|
Great point in the between query!
It should be a >= b AND a <= c and not a>b and a<c
Sorry for the lack of knowledge, but when you say that it does not require a new prepare, how would that work?
How would I need to change the query so that $lip is updated?
Again, sorry for my lack of knowledge.
| [reply] |
|
|
sorry for my lack of knowledge.
No reason to be sorry about not knowing. Ignorance is not an excuse (to receive pardon) though it is an excellent reason to ask a question!
Change $lip to a question mark: WHERE ? BETWEEN ... Then, when execute()ing the query, pass the parameter. If in a loop to run multiple times, for example, the prepare would exist before the loop, the execute (and fetch) within it.
| [reply] |
|
|
|
|
Re: Slow response on DBI MySQL query
by fishmonger (Chaplain) on Mar 24, 2015 at 14:37 UTC
|
Besides the Gep::IP module, there is also the IP::Country module for "fast lookup of country codes from IP addresses".
I have not used either of them, but they both will do what you need.
I would also use the is_ipv4() and is_private_ipv4() functions from the Data::Validate::IP module to handle the validation of the IP.
Also, instead of manually converting the IP to an int, I'd use the built-in mysql function inet_aton() in the query.
| [reply] |
|
|
Fishmonger, lots of really good recommendations! It will take me a little bit to change the script with all of them, and then I will come back to post the speed improvement of the recommendations. Thank you very much!
| [reply] |
|
|
Monks, after doing all recommended changes I was able to process 3,000 records per second (for a total of 259,200,000 records a day).
All changes had some positive effect, but the one that had the most dramatic effect was moving to Geo::IP. As it was mentioned on the thread by a few of you guys, the relational database was certainly not the correct place to do this operation.
The finalized code looks like this now:
sub Obtain_GeoIP {
$GeoIP_resolved++;
my ($IP) = @_;
if($IP=~/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ &&(($1<
+=255 && $2<=255 && $3<=255 &&$4<=255 )))
{
my @sip = split(/\./,$IP);
if (($sip[0] == "10") || ($sip[0] == "192" && $sip[1] == "168"
+) || (($sip[0] == "172" && $sip[1] >= "16") && ($sip[0] == "172" && $
+sip[1] <= "31")))
{
return "Internal Network";
} else {
if ( exists $geoip_hash{IP} ) { my $country_hash = $ge
+oip_hash{IP}; return $country_hash; }
my $country = $gi->country_name_by_addr($IP);
$geoip_hash{$IP} = $country;
return $country;
}
} else {
print EFILE "IP address sent to Obtain_GeoIP routine w
+as incorrect: $IP - $counter\n";
}
}
Thank you very much for everybody's help! | [reply] [d/l] |