Continuing my quest to be familiar with what coding methods result in the most efficient code, I couldn't resist having a crack at speeding your GetMinimalDataType routine up a little. The results are:
start 18.840ms average per iteration end 2.222ms average per iteration 88% less time or 847% improvement?
Which way you view the improvement depends on whether your a scientist or a salesman I guess :^).
Note: I realise that the level of improvement is highly dependant upon the data used, and I'd love to see you profile data post any changes you choose to adopt.
Step by step log of changes and the improvements derived
As provided. NOTE: Numerous "Use of uninitialized value in..." lines +28, 32, 54, 57 25 trials of GMDT (471.000ms total), 18.840ms/trial Editing lines 27,28, 29 giving $szIntegerPart = ($2||''); $szDecimalPart = ($4||'') . ($6||''); $szFloatPart = $7||''; 25 trials of GMDT (250.000ms total), 10.000ms/trial Converted if (condition) { return 'constant'; } to return 'c +onstant' if condition; 25 trials of GMDT (200.000ms total), 8.000ms/trial Changed if (not defined $varValue) { return 'NULL' } to + return 'NULL' unless $varValue; 25 trials of GMDT (151ms total), 6.040ms/trial Chnaged $iLength = length $varValue; return( 'CHAR(' . $iLength . ')' ); to return 'CHAR(' . length($varValue) . ')'; and removed redundant variable; 25 trials of GMDT (100.000ms total), 4ms/trial Changed my $varValue = shift(@_)]; my ($iScale, $iPrecision my ($szIntegerPart, $szDecimalPart, $szFloatPart); to my $varValue = $_[0]; my ($iScale, $iPrecision, $szIntegerPart, $szDecimalPart, $szFloat +Part); 25 trials of GMDT (121.000ms total), 4.840ms/trial Changed constant numeric expressions to constant subs. use constant SHORT_MIN => -32768; use constant SHORT_MAX => 32767; use constant LONG_MIN => -2147483648; use constant LONG_MAX => 2147483647; use constant FLOAT_MIN => 1.175 * 10 ** -38; use constant FLOAT_MAX => 3.402 * 10 ** 38; use constant DOUBLE_MIN => 1.175 * 10 ** -308; use constant DOUBLE_MAX => 3.402 * 10 ** 308; 25 trials of GMDT (90ms total), 3.600ms/trial Changed # 1 2 3 4 5 6 7 if ($varValue =~ /^ [+-]? ( (\d+) (\.(\d*))? | (\.(\d+)) ) ([Ee][+ +-]?\d+)? $/ox ) { $szIntegerPart = ($2||''); $szDecimalPart = ($4||'') . ($6||''); $szFloatPart = ($7||''); to if ($varValue =~ /^ [+-]? (?: (\d+) (?:\.(\d*))? | (?:\.(\d+)) ) ( +[Ee][+-]?\d+)? $/ox ) { #Check for numeric types $szIntegerPart = ($1||''); $szDecimalPart = ($2||'') . ($3||''); $szFloatPart = ($4||''); No discernable difference over 25 trials - increased number of trials +from 25 to 500 500 trials of GMDT (1.455s total), 2.910ms/trial Moved return from string to top of sub, return 'CHAR(' . length($varValue) . ')' unless $varValue =~ /^ [+-]? (?: (\d+) (?:\.(\d*))? | (?:\.(\d ++)) ) ([Ee][+-]?\d+)? $/ox ; 500 trials of GMDT (1.201s total), 2.402ms/trial Made the regex a constant sub. Difference to small to be sure of any a +dvantage one way or the other. use constant RE_NUMERIC => qr/^ [+-]? (?: (\d+) (?:\.(\d*))? | (?:\ +.(\d+)) ) ([Ee][+-]?\d+)? $/x; ... unless $varValue =~ RE_NUMERIC; 500 trials of GMDT (1.111s total), 2.222ms/trial
The final cut of your test program with the modifications. Sorry, but some of your comments got 'lost' in the transitions.
#! perl -slw use Benchmark::Timer; use strict; use constant SHORT_MIN => -32768; use constant SHORT_MAX => 32767; use constant LONG_MIN => -2147483648; use constant LONG_MAX => 2147483647; use constant FLOAT_MIN => 1.175 * 10 ** -38; use constant FLOAT_MAX => 3.402 * 10 ** 38; use constant DOUBLE_MIN => 1.175 * 10 ** -308; use constant DOUBLE_MAX => 3.402 * 10 ** 308; use constant RE_NUMERIC => qr/^ [+-]? (?: (\d+) (?:\.(\d*))? | (?:\ +.(\d+)) ) ([Ee][+-]?\d+)? $/x; my $t = new Benchmark::Timer; while( <DATA> ) { $t->start('GMDT'); print GetMinimalDataType($_); $t->stop('GMDT'); }; $t->report; exit; sub GetMinimalDataType { my $varValue = $_[0]; my ($iScale, $iPrecision, $szIntegerPart, $szDecimalPart, $szFloat +Part); return 'NULL' unless $varValue; return 'CHAR(' . length($varValue) . ')' unless $varValue =~ RE_NUMERIC; #Check for numeric types $szIntegerPart = ($1||''); $szDecimalPart = ($2||'') . ($3||''); $szFloatPart = ($4||''); #Is this a simple integer? if ($szIntegerPart ne '' and $szDecimalPart eq '' and $szFloatPart + eq '' ) { #SMALLINT 16 bits ?32,768 to 32,767 Signed short (word) return 'SMALLINT' if $varValue >= SHORT_MIN and $varValue <= S +HORT_MAX; #INTEGER 32 bits ?2,147,483,648 to 2,147,483,647 Signed long ( +longword) return 'INTEGER' if $varValue >= LONG_MIN and $varValue <= LON +G_MAX; } #Is this a simple decimal? if ( not $szFloatPart) { $iScale = length $szDecimalPart; $iPrecision = (length $szIntegerPart) + $iScale; return "DECIMAL($iPrecision,$iScale)" if $iPrecision >= 1 and $iPrecision <= 18 #Prec +ision = 1 to 18 and $iScale >= 0 and $iScale <= 18 + #Scale = 0 to 18 and $iScale <= $iPrecision; #Sc +ale must be <= to precision } #Float maybe? #1.175 x 10 ?38 to 3.402 x 10 38 return 'FLOAT' if $varValue >= FLOAT_MIN and $varValue <= FLOAT_MAX; #2.225 x 10 ?308 to 1.797 x 10 308 return 'DOUBLE PRECISION' if $varValue >= DOUBLE_MIN and $varValue <= DOUBLE_MAX; #Dates here eventually #<TBD> #Check for strings return 'CHAR(' . length($varValue) . ')'; } __DATA__ 0e0 0 +0 -0 1. 0.14 .14 1.24e5 24e5 -24e-5 2.3. 2.3.4 1..2 .1.1 4. .23-2147483648 2147483647 +2147483647 -2147483649 2147483648 10e10 10e-10 Dom Dominic Dominic Bush .... For the 500 trials I simply replicated the original test data 20 +times.
Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.
In reply to Re: (88% faster) Discovering minimal relational data types on large volumes
by BrowserUk
in thread Discovering minimal relational data types on large volumes
by dbush
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |