http://qs1969.pair.com?node_id=134419

sifukurt has asked for the wisdom of the Perl Monks concerning the following question:

Good day, fellow monks. I've got a snippet of code that I'm hoping you can help me speed up. My code is to find the N-th root of a given number.
```use Math::BigFloat;

sub Root {
my \$num        = shift;
my \$root       = shift;
my \$iterations = shift || 5;
if ( \$num < 0 ) { return undef }
if ( \$root == 0 ) { return 1 }
my \$Num = Math::BigFloat->new( \$num );
my \$Root = Math::BigFloat->new( \$root );
my \$current = Math::BigFloat->new();
my \$guess   = Math::BigFloat->new( \$num ** ( 1 / \$root ) );
my \$t       = Math::BigFloat->new( \$guess ** ( \$root - 1 ) );
for ( 1 .. \$iterations ) {
\$current = \$guess - ( \$guess * \$t - \$Num ) / ( \$Root * \$t  );
if ( \$guess eq \$current ) { last }
\$t = \$current**(\$root-1);
\$guess = \$current;
}
return \$current;
}
This uses Newton's method for finding the roots. It produces very accurate results, provided you increase the number of iterations if you're dealing with large numbers and/or large roots. Therein lies the problem. If you want something relatively simply like the 5th root of 100:
```\$x = Root( 100, 5 );
the result is reasonably fast. However, with each iteration, it get progressively slower. So if you wanted something enormous, like:
```\$x = Root( 500000, 555 );
you could be waiting for ages. If we leave the number of iterations low, the result will likely be very inaccurate, but as we increase the number of iterations, each individual iteration gets slower and slower. The only thing I've been able to come up with so far is the comparison of \$guess and \$current inside the for loop. I was able to get a bit of a speed boost by doing a string comparison rather than a numeric comparison. Any suggestions on how to speed this up?

Thanks.

UPDATE: 28-Dec-2001 - I updated the above code to reflect the changes suggested by arhuman and Dominus. It is much more accurate and substantially faster now. Further optimization forthcoming....

___________________
Kurt

Replies are listed 'Best First'.
<strike>6 times faster?</strike> (Re: Help w/ Code Optimization)
by arhuman (Vicar) on Dec 26, 2001 at 22:27 UTC
You might also try this :
```sub Root2 {
my \$num        = shift;
my \$root       = shift;
my \$iterations = shift || 10;
if ( \$root == 0 ) { return 1 }
if ( \$num < 0 ) { return undef }
my \$current = Math::BigFloat->new();
my \$guess   = Math::BigFloat->new( \$num / \$root );
my \$t=Math::BigFloat->new(\$guess**(\$root-1)); 1st version : WRONG ! sh
+ould be in the loop.
for ( 1 .. \$iterations ) {
\$current = \$guess - ( \$guess * \$t - \$num ) / ( \$root * \$t  );
if ( \$guess eq \$current ) { last }
\$t=Math::BigFloat->new(\$current**(\$root-1));
\$guess = \$current;
}
return \$current;
}
The idea behind, is that '**' is MUCH slower than '*', so I exchange 2 '**' for one '**' and 2 '*'
It seems indeed to speed things alot :
(I'm sure your numerous tests will make it sure ;-)
```timethese(10,{
'Root'=> sub {Root(1000,60)},
'Root2'=> sub {Root2(1000,60)}});

gives as result :

Benchmark: timing 10 iterations of Root, Root2...
Root: 104 wallclock secs (104.38 usr + 0.01 sys = 104.39 CPU) @ 0.10/s (n=10)
Root2: 15 wallclock secs (14.75 usr + 0.00 sys = 14.75 CPU) @ 0.68/s (n=10)
Root2: 97 wallclock secs (92.67 usr + 0.16 sys = 92.83 CPU) @ 0.11/s (n=10)

UPDATE :
Corrected my code! I misplaced the \$t calculation outside the loop,
which is wrong beccause \$guess change inside the loop.
As you can see it's no more so fast...

When I try to run your code, I get an incorrect answer for the root. I think the problem is in the exponent. The exponents are \$root and (\$root - 1), respectively.

When I run this code:
```timethese( 100, {
'Root' => '\$x = Root( 100, 3 )',
'Root2' => '\$y = Root2( 100, 3 )',
});

print <<END;

Root: \$x
-----
Root2: \$y
END
I get this for output:
```Benchmark: timing 100 iterations of Root, Root2...
Root: 29 wallclock secs (28.39 usr +  0.00 sys = 28.39 CPU) @  3
+.52/s (n=1 00)
Root2: 20 wallclock secs (20.03 usr +  0.00 sys = 20.03 CPU) @  4
+.99/s (n=1 00)

Root: 4.6415888336127788924100763542328778501807372125002714
-----
Root2: 0.6664902595019951166742874561807256134735

___________________
Kurt
Re: Help w/ Code Optimization
by IlyaM (Parson) on Dec 26, 2001 at 21:37 UTC
Can't get your code to work. It dies with message:
```Operation `**': no method found,
left argument in overloaded package Math::BigFloat,
right argument has no overloaded magic at test.pl line 17.

Update:

BTW It seems that using Math::BigFloat methods directly is slighly faster then relying on overloaded operations:

```
use Benchmark;
use Math::BigFloat;

timethese(1000, {
Methods => sub { Math::BigFloat->new(100)->fmul(Math::BigFloat->ne
+w(100)) },
Operations => sub { Math::BigFloat->new(100) * Math::BigFloat->new
+(100) },
});

bash-2.05a\$ perl test.pl
Benchmark: timing 1000 iterations of Methods, Operations...
Methods:  2 wallclock secs ( 1.48 usr +  0.01 sys =  1.49 CPU) @ 67
+1.14/s (n=1000)
Operations:  1 wallclock secs ( 1.63 usr +  0.00 sys =  1.63 CPU) @ 61
+3.50/s (n=1000)

--
Ilya Martynov (http://martynov.org/)

Sorry, I guess I should have specified what version of Math::BigInt you need. You need to have at least v1.47. I believe that was the first version that had "**" overloaded. Thanks for the suggestion on using the methods directly. I'll try that.
___________________
Kurt
Moreover using subroutine calls should be even more faster. That is use Math::BigFloat::OP(\$num) instead of \$num->OP.

--
Ilya Martynov (http://martynov.org/)

Re: Help w/ Code Optimization
by maverick (Curate) on Dec 26, 2001 at 21:41 UTC
Without delving into the internals of Math::BigFloat, I don't see any way to speed this up. Perhaps you could try a different approach? A different algorythm maybe?

Or take the big leap and write this routine in C and then link it in. Inline::CPP may help.

/\/\averick
perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

I tried calculating the roots with logarithms, but the problem with that is that the built-in log() function is only accurate to a fixed number of places. Other than the logarithm method and Newton's method, I don't know of any other ways to calculate roots. Does anyone know how to calculate a logarithm manually? That is, calculate a logarithm without using the built-in log() function?

My C and C++ skills are virtually non-existent, so sadly, that option isn't really available to me. This might be a good time to improve those skill, eh? ;-)
___________________
Kurt
There's an algorithm here to compute natural logarithms to any precision. It works for me, you might have to combine it with Math::BigFloat to get the precision you want (Oh, you're already using that...)
:-)

However, I don't know if going this route is any better than your current algorithm for getting the nth root of a number...

Update: By request, here's what I did to make the algorithm in the link above work (in the Read More link at the bottom of this post). This is not a perfect implementation, I'm pretty sure I'm losing precision somewhere, and you could probably make better use of Math::BigFloat, but it seems to be on the right track. And I don't know how this compares with the other ways of finding the root of a number...

Update: Works alot better now, not losing so much precision anymore (in fact, it looks like results are accurate to the precision you specify now) :-)

Updated code to reflect new version of Math::BigFloat
Also, did some rough benchmarks computing the 60th root of 1000. "Rough," because sifukurt's algorithm specifies 'iterations' to get precision, and I specify decimal places. For 50 decimal places on mine, and for 1 of his iterations (which is lt 40 decimal places), his wins, for 2 iterations (which is slightly gt 50 decimal places), mine wins slightly, and for 3 iterations mine wins by alot (but then his is accurate to much gt 50 places). And I'm still using operator overloading, which is a speed hit (though I'm not sure how much, and so far we're both using it).
For 90 decimal places, his takes 3 iterations, and the time mine takes is right in the middle of the time his takes to do 2 or 3 of his iterations.

My first thought is that logs are computed using the same concept, so it doesn't make it any faster or more accurate. However, it's always log, with other things expressed in terms of log, so that function can be optomized.

Then I realized that real libraries use a Taylor/Macloren (spelling?) series to compute log and other functions, not Newton's method. This is code tuned to the specific function, and you can't do as well with an arbitrary input function unless you input the first derivitive as well.

Re: Help w/ Code Optimization
by Dominus (Parson) on Dec 27, 2001 at 10:30 UTC
I was able to make your program a lot faster with a simple change. Newton's method tends to double the number of correct digits with each iteration. If you get a good initial guess, the convergence is extremely fast. But if you botch the initial guess, there are no correct digits. Your initial guess was very bad:

```  my \$guess   = Math::BigFloat->new( \$num / \$root );
Newton's method works by drawing a straight line, tangent to the polynomial curve, at the point of the guess. For your examples, your guess was much too big, so the straight line was almost vertical. This means that the new guess wasn't very far from the old one.

I replaced this one line with the following:

```     my \$guess   = Math::BigFloat->new( \$num**(1/\$root));
This uses regular hardware floating-point arithmetic to make the initial guess. Hardware floating point is accurate to 17 decimal places and is almost instantaneous. Suddenly, the initial guess is vastly more accurate, and the cost is almost zero.

Suddenly, I was getting very accurate answers very quickly. After 10 iterations, your version of the program still thinks that Root(1000,60) is about 14.088. The correct answer is about 1.12; with my change the program has the answer correct to 130 places after only 4 iterations.

I'd also recommend simplifying the guess computation as suggested by arhuman above. I guess he made a simple mistake in the algebra. But you have g- (gr-n)/rgr-1, and you can cancel the gr out of the numerator by rewriting this expression as g-g/r+ n/rgr-1. This trades a ** for a /, which should be a big win.

Hope this helps.

Re: Help w/ Code Optimization
by vladb (Vicar) on Dec 26, 2001 at 21:41 UTC
Hi ;-).

Let me throw around my math skills... Recalling some binary math I figured that 10^2 (10 to the power of 2) may be simplified into this:
10^2 = 10x10 = 10x(2x2x2+2).

Notice all the 2's there? Here's where the left shift operator '<<' comes in handy (and it's pretty fast by the way).

So, every multiplication by 2 could be replaced by a left shift by one (in binary it's equivalent to multiplying by 2 ;) like this:
```10^2 = 10<<3 + 10<<1; (by the way, this is may not be
written as 10<<4! :)
So, I've replaced 10x10 by a few left shift operators. The key here is to determine how many left shifts will have to be performed for given power. Say, if you were to raise 10 to the power of 3, you'd look down at this:

10x10x10.

From this you'll also note that 10x10 is really (10<<3 + 10<<1) therefore, each 'x10' could be replaced accordingly.

Hopefully I've given you some food for thought ;-). I"ll try to look for a few ways to guess the left shift number for you. Evidently, it depends on the number being multiplied. My thinking is that left shift should work faster than multiplication, althought, folks who've developed Perl might have taken that into consideration... ;-))

 "There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith
Once upon a time, that was a good idea for PC's. On a 80486, where shift took 1 clock and multiply took up to 40, it was still faster sometimes. Note I said "up to" because it stops early, and essentially does this same thing internally--one addition for each '1' bit, using a barrel shifter that skips the zeros.

But with Pentium and later, the multiply is just as fast as anything else, and breaking it up into multiple steps makes it slower, by far.

—John

I like this idea a lot. My only problem, though, is that the exponent won't always be an integer. As a very simple example, sticking with left shift operator idea, would there be a relatively easy way to do 10**(1/3)? I still get a little confused with the whole << thing.
___________________
Kurt
Re: Help w/ Code Optimization
by Zaxo (Archbishop) on Dec 27, 2001 at 01:11 UTC

You can use the C library's pow() function, so long as your numbers don't need Math::BigFloat treatment:.

```\$ perl -MPOSIX=pow -e 'print pow(500000,1/5),\$/'
13.7972966146122