in reply to Re: Avoid keeping larger lists in Memory
in thread Avoid keeping larger lists in Memory

Thanks Laurent and Thanks to all the Monks for the replies . Haukex's and Cristoforo's solution was very fast .

I have tried with division by 2 and only odd numbers in is_prime sub routine and that helped a lot in improving the execution time in finding whether a given number is prime or not

Cristofor's solution will generate a list of all prime factor's and then use that list to get the largest prime factor

#!/usr/bin/perl use strict; use warnings; ## my $max = 0; sub prime_factors { my $x = shift; my @factors; for ( my $y = 2; $y <= $x; $y++ ) { next if $x % $y; $x /= $y; push @factors, $y; redo; } return @factors; } foreach my $num (20,13195,600851475143) { my @factors = prime_factors( $num ); foreach my $p (@factors) { if ($p > $max) { $max = $p; } } print "Number: $num Largest Prime Factor: $max\n"; } -bash-4.1$ time ./large_prime.pl Number: 20 Largest Prime Factor: 5 Number: 13195 Largest Prime Factor: 29 Number: 600851475143 Largest Prime Factor: 6857 real 0m0.004s user 0m0.002s sys 0m0.002s

Replies are listed 'Best First'.
Re^3: Avoid keeping larger lists in Memory
by Laurent_R (Canon) on Jul 02, 2017 at 09:28 UTC
    The very efficient optimization in your code derived from Cristoforo's is this line:
    $x /= $y;
    which replaces the target number by a number obtained by dividing the original number by any prime factor found. So, when you start to examine 600851475143, the code finds 71 as the first factor; so the problem is now reduced to looking for prime factors in the number 600851475143/71, i.e. 8462696833.

    A bit later, the algorithm finds another prime factor 839, so that we are now left with 8462696833/839 = 10086647. And so on with the next factors, 1471 and 6857. This is very fast because the code finds very quickly prime factors which make it possible to considerably reduce the size of the problem.

    But that huge optimization (BTW, this is, I believe, a variation on Euclid's algorithm for finding the GCD between two numbers) works only if you find prime factors relatively quickly. Even with a relatively modest prime number such as 2**31-1 (i.e. 2147483647) this program takes a very long time to run (well, more than 5 minutes instead of a split second):

    2 2 5 5 7 13 29 71 839 1471 6857 2147483647 real 5m44.346s user 5m43.718s sys 0m0.030s
    (Note that I changed your code to display all the prime factors, rather than just the largest one.)

    Now, if I am changing this code with three additional optimizations:

    1. using only 2 and odd numbers as potential divisors,

    2. looking for divisors only up to the square root of the target number (make sure you compute the square root only once for each for loop being executed);

    3. starting the for loop with the largest prime factor already found;

    then I get another huge improvement of your code (with the same input data). Instead of running in more than 5 minutes, this runs in 0.064 seconds:

    2 2 5 5 7 13 29 71 839 1471 6857 2147483647 real 0m0.064s user 0m0.015s sys 0m0.031s
    Since this seems to be homework assignment, I will not disclose the code I have used for this test and leave it to you to try to implement the optimizations I described above.

      Thank you Laurel for your suggestion on further Improving the code . I have tried the steps you have suggested partially (Still working on a sloution using the Square root of the target Number)

      Here is my code and the O/p yields right results and execution time is higher compared to ur solution . I will think about adding the Square root portion and update this thread

      #!/usr/bin/perl use warnings; use strict; ######## sub prime_factors { my $num = shift; my $large_prime_found = 2; my @primes = (); my $odd = 1; my $i = 0; for ( my $y = $large_prime_found; $y <= $num;) { $i++; #print "Iteration : $i, y => $y, Num => $num LP => $large_prime_f +ound, Arr Primes: @primes, Odd => $odd\n"; if ($num % 2 == 0) { $num /= $large_prime_found; $large_prime_found = 2; push @primes, $large_prime_found; next; } else { $odd += 2; #if ($odd <= sqrt($num)) { if ($num % $odd == 0 ) { $large_prime_found = $odd; $num /= $large_prime_found; push @primes, $large_prime_found; next; } #} } } return @primes; } $ time ./primes.pl Number: 20 , Prime_Factors: => 2 2 5 Number: 48 , Prime_Factors: => 2 2 2 2 3 Number: 96 , Prime_Factors: => 2 2 2 2 2 3 Number: 7 , Prime_Factors: => 7 Number: 69 , Prime_Factors: => 3 23 Number: 33 , Prime_Factors: => 3 11 Number: 13195 , Prime_Factors: => 5 7 13 29 Number: 600851475143 , Prime_Factors: => 71 839 1471 6857 real 0m0.011s user 0m0.005s sys 0m0.004s
        Hi pr33,

        the main point in my post was that if you add to your input list a relatively large prime, or the product of a relatively large with some other factors, your performance will fall dramatically. As an example, I took 2**31 - 1 (i.e. 2147483647), a number 280 times smaller than the large number (600851475143) that you use, but much less favorable because it is prime, your code will take not a split second, but several minutes to run (about five minutes on my laptop). Try it for your self to see the problem.

        In such a case, the optimization consisting in running the for loop until the square root of the target will provide you with a huge improvement. This optimization, together with the other two that I described, made my code about 5,000 times faster for the same input data.

        As a starting point you could just change this line:

        for ( my $y = $large_prime_found; $y <= $num;) {
        to this:
        for ( my $y = $large_prime_found; $y <= $num ** .5;) {
        This should already improve considerably the timings. But this calculates the square root of $num many many times.

        It would be better to compute the square root of $num (into, say, $max) outside of the for loop header, i.e. at the beginning of your subroutine and at the place where you modify the value of $num within the for loop, and to use $max within the header of the for loop:

        for ( my $y = $large_prime_found; $y <= $max;) {
        Check that everything still works OK with your new version of the subroutine, I haven't tested these improvements with it, and there may be one or two small things to adjust somewhere else in the code.