in reply to Re^2: Avoid keeping larger lists in Memory
in thread Avoid keeping larger lists in Memory

The very efficient optimization in your code, derived from Cristoforo's, is this line:
$x /= $y;
which replaces the target number with the quotient obtained by dividing it by each prime factor found. So, when you start examining 600851475143, the code finds 71 as the first factor; the problem is then reduced to looking for prime factors of 600851475143/71, i.e. 8462696833.

A bit later, the algorithm finds another prime factor, 839, so that we are now left with 8462696833/839 = 10086647. And so on with the next factors, 1471 and 6857. This is very fast because the code quickly finds prime factors, which considerably reduces the size of the problem.
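The reduction described above can be sketched as a plain trial-division loop (a generic illustration in the spirit of the code under discussion, not the actual code from the parent thread; the variable names match the `$x /= $y;` line quoted above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain trial division: whenever a divisor is found, divide it out of
# $x immediately, so the remaining search space shrinks at each step.
my $x = 600851475143;
my @factors;
my $y = 2;
while ( $y <= $x ) {
    if ( $x % $y == 0 ) {
        push @factors, $y;
        $x /= $y;    # the key step: reduce the target number
    }
    else {
        $y++;
    }
}
print "@factors\n";    # 71 839 1471 6857
```

Because 71 is found early, $y never has to climb past 6857 for this input, which is why the run feels instantaneous.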

But that huge optimization (BTW, this is, I believe, a variation on Euclid's algorithm for finding the GCD of two numbers) works only if you find prime factors relatively quickly. Even with a relatively modest prime such as 2**31-1 (i.e. 2147483647), this program takes a very long time to run (more than 5 minutes, instead of a split second):

2 2 5 5 7 13 29 71 839 1471 6857 2147483647

real    5m44.346s
user    5m43.718s
sys     0m0.030s
(Note that I changed your code to display all the prime factors, rather than just the largest one.)

Now, if I change this code with three additional optimizations:

1. using only 2 and odd numbers as potential divisors,

2. looking for divisors only up to the square root of the target number (make sure you compute the square root only once per run of the for loop);

3. starting the for loop with the largest prime factor already found;

then I get another huge improvement over your code (with the same input data). Instead of running in more than 5 minutes, it runs in 0.064 seconds:

2 2 5 5 7 13 29 71 839 1471 6857 2147483647

real    0m0.064s
user    0m0.015s
sys     0m0.031s
Since this seems to be a homework assignment, I will not disclose the code I used for this test, and I leave it to you to implement the optimizations I described above.

Re^4: Avoid keeping larger lists in Memory
by pr33 (Scribe) on Jul 04, 2017 at 17:21 UTC

    Thank you Laurel for your suggestion on further improving the code. I have tried the steps you suggested partially (I am still working on a solution using the square root of the target number).

    Here is my code; the output yields the right results, but execution time is higher compared to your solution. I will think about adding the square root portion and update this thread.

    #!/usr/bin/perl
    use warnings;
    use strict;

    ########
    sub prime_factors {
        my $num               = shift;
        my $large_prime_found = 2;
        my @primes            = ();
        my $odd               = 1;
        my $i                 = 0;
        for ( my $y = $large_prime_found; $y <= $num; ) {
            $i++;
            #print "Iteration : $i, y => $y, Num => $num LP => $large_prime_found, Arr Primes: @primes, Odd => $odd\n";
            if ( $num % 2 == 0 ) {
                $num /= $large_prime_found;
                $large_prime_found = 2;
                push @primes, $large_prime_found;
                next;
            }
            else {
                $odd += 2;
                #if ($odd <= sqrt($num)) {
                if ( $num % $odd == 0 ) {
                    $large_prime_found = $odd;
                    $num /= $large_prime_found;
                    push @primes, $large_prime_found;
                    next;
                }
                #}
            }
        }
        return @primes;
    }

    $ time ./primes.pl
    Number: 20 , Prime_Factors: => 2 2 5
    Number: 48 , Prime_Factors: => 2 2 2 2 3
    Number: 96 , Prime_Factors: => 2 2 2 2 2 3
    Number: 7 , Prime_Factors: => 7
    Number: 69 , Prime_Factors: => 3 23
    Number: 33 , Prime_Factors: => 3 11
    Number: 13195 , Prime_Factors: => 5 7 13 29
    Number: 600851475143 , Prime_Factors: => 71 839 1471 6857

    real    0m0.011s
    user    0m0.005s
    sys     0m0.004s
      Hi pr33,

      the main point of my post was that if you add to your input list a relatively large prime, or the product of a relatively large prime with some other factors, your performance will fall dramatically. As an example, I took 2**31 - 1 (i.e. 2147483647), a number 280 times smaller than the large number (600851475143) that you use, but much less favorable because it is prime; with it, your code takes not a split second but several minutes to run (about five minutes on my laptop). Try it for yourself to see the problem.

      In such a case, the optimization consisting of running the for loop only up to the square root of the target will provide you with a huge improvement. This optimization, together with the other two that I described, made my code about 5,000 times faster for the same input data.

      As a starting point you could just change this line:

      for ( my $y = $large_prime_found; $y <= $num;) {
      to this:
      for ( my $y = $large_prime_found; $y <= $num ** .5;) {
      This should already improve the timings considerably. But it calculates the square root of $num many, many times.

      It would be better to compute the square root of $num into, say, $max outside of the for loop header: once at the beginning of your subroutine, and again at the place where you modify $num within the for loop. Then use $max in the for loop header:

      for ( my $y = $large_prime_found; $y <= $max;) {
      Check that everything still works with your new version of the subroutine; I haven't tested these improvements against it, and there may be one or two small things to adjust elsewhere in the code.

        Shouldn't you really recalculate the sqrt every time you change $x? Because the maximum remaining factor(s) will be less than or equal to the root of the new $x, and depending on the size of $y, it could reduce the maximum significantly enough to overcome the cost of taking the root. As long as the sqrt is taken after the redo, not in the loop condition, the cost shouldn't be too bad for the payoff (easily benchmarked).
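The idea in this reply could be sketched like this (my own illustrative code, not the solution withheld above; the name $max and the exact structure are choices made for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trial division with a square-root bound that is recomputed only
# when $num actually shrinks, never on every loop iteration.
sub prime_factors {
    my $num = shift;
    my @primes;

    # strip factors of 2 first, so only odd divisors remain
    while ( $num % 2 == 0 ) {
        push @primes, 2;
        $num /= 2;
    }

    my $odd = 3;
    my $max = sqrt $num;    # computed once here ...
    while ( $odd <= $max ) {
        if ( $num % $odd == 0 ) {
            push @primes, $odd;
            $num /= $odd;
            $max = sqrt $num;    # ... and again only when $num changes
        }
        else {
            $odd += 2;
        }
    }
    push @primes, $num if $num > 1;    # whatever remains is prime
    return @primes;
}

print join( " ", prime_factors(2147483647) ), "\n";    # 2147483647
```

For the prime 2147483647 this loop stops near sqrt(2147483647) (about 46341) instead of counting all the way up to the number itself, which is where the several-orders-of-magnitude speedup comes from.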