pr33 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a program to find the largest prime factor of a given number. If the given number is small, the execution time is fast. For a large number, though, my program just hangs. My machine has a total of 16GB physical RAM. I wanted to find out whether there is an alternative way to keep larger lists in memory, or to avoid keeping them at all by reading each value and throwing it away after returning the result. Below is my code:

#!/usr/bin/perl
use warnings;
use strict;

########
sub is_prime {
    my $num = shift;
    my $x   = 2;
    my $on  = 0;
    while ( $num > $x ) {
        if ( $num % $x == 0 ) {
            $on = 1;
            last;
        }
        $x++;
    }
    return 'False' if $num <= 2;
    if ( $num > 2 ) {
        if ( !$on ) {
            return 'True';
        }
        elsif ($on) {
            return 'False';
        }
    }
}

sub largest_prime_factor {
    my $i             = 2;
    my $num           = shift;
    my $large_prime   = 0;
    my $prime_factors = [];
    my $factors       = [];
    while ( $i <= $num - 1 ) {
        if ( $num % $i == 0 ) {
            push @$factors, $i;
        }
        $i++;
    }
    $i = 0;
    while ( $i <= scalar(@$factors) - 1 ) {
        my $a = is_prime( $factors->[$i] );
        if ( $a eq 'True' ) {
            push @$prime_factors, $factors->[$i];
        }
        $i++;
    }
    $i = 0;
    while ( $i <= scalar(@$prime_factors) - 1 ) {
        if ( $prime_factors->[$i] > $large_prime ) {
            $large_prime = $prime_factors->[$i];
        }
        $i++;
    }
    return $large_prime;
}

print largest_prime_factor(20), "\n";
print largest_prime_factor(13195), "\n";
print largest_prime_factor(600851475143), "\n";
Results:

$ ./large_prime_factor.pl
5
29

(the third call never returns)

Replies are listed 'Best First'.
Re: Avoid keeping larger lists in Memory
by haukex (Archbishop) on Jul 01, 2017 at 19:23 UTC

    I'm running your script and watching the memory usage (top); it's minimal and not growing, so I don't think that memory is your problem. Instead, I think the code you've implemented is simply slow, so you probably want to look into optimizing it. A few recommendations; you could, for example:

    • research faster algorithms for determining whether a number is prime, as well as faster algorithms for determining the factors of an integer,
    • investigate where your code is spending the most time by profiling it with something like Devel::NYTProf,
    • set up caching for a speed/memory tradeoff for your sub is_prime by using Memoize to cache its return values (or even cache its results to disk using that module or one of the methods I mention below), and
    • think about how you might conceptually improve your algorithm. For example, you're looking for the largest prime factor, so why don't you iterate through the (potential) factors from largest to smallest, stopping when you've found the first prime one?
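
    As a sketch of the caching idea from the third point, the core Memoize module can wrap a primality check so that repeated calls with the same argument are answered from a cache. The trial-division body below is only a stand-in for the OP's own sub, not code from this thread:

```perl
use strict;
use warnings;
use Memoize;

# Plain trial division; a stand-in for the original is_prime.
sub is_prime {
    my $num = shift;
    return 0 if $num < 2;
    for ( my $x = 2; $x * $x <= $num; $x++ ) {
        return 0 if $num % $x == 0;
    }
    return 1;
}

# After this call, repeated invocations with the same argument
# return the cached result instead of redoing the division loop.
memoize('is_prime');

print is_prime(6857) ? "prime\n" : "not prime\n";
```

    The tradeoff is exactly the one mentioned above: the cache trades memory for speed, which only pays off when the same arguments recur.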

    Alternatively, for the task you're doing I can recommend Math::Prime::Util, it has lots of really helpful mathematical routines and it can deal with huge numbers (you should install Math::Prime::Util::GMP for speed). However, if this is homework, that might be considered cheating ;-)

    use Math::Prime::Util qw/factor/;
    use List::Util qw/max/;
    print max(factor(20)), "\n";
    print max(factor(13195)), "\n";
    print max(factor("600851475143")), "\n";
    __END__
    5
    29
    6857

    As for your original question, note that once you put things on disk, you'll have disk I/O as a potential bottleneck. But if you still want to look into that, there are a few Perl modules that make storing things on disk fairly transparent, such as AnyDBM_File for tying a hash to a DBM file, or Tie::File for tying an array to a flat text file. Also, this thread might be an interesting read: Is there any cache mechanism for running perl script, where the replies show how to cache data structures using Storable, JSON, or Path::Class (the latter only for simple arrays).
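
    To illustrate the disk-backed option, here is a minimal sketch with the core Tie::File module (the filename factors.txt is arbitrary): each array element maps to one line of the file, so the list lives on disk rather than in RAM:

```perl
use strict;
use warnings;
use Tie::File;

# Tie an array to a text file: one array element per line,
# read from and written to disk on demand.
tie my @factors, 'Tie::File', 'factors.txt'
    or die "Cannot tie file: $!";

push @factors, 2, 5, 71, 839;    # each push writes a line to factors.txt
my $largest = $factors[-1];      # reads the last line back from disk
print "largest so far: $largest\n";

untie @factors;                  # flush and release the file
unlink 'factors.txt';            # clean up the demo file
```

    As noted above, every element access here is file I/O, so this only makes sense when the list genuinely does not fit in memory.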

Re: Avoid keeping larger lists in Memory
by Cristoforo (Curate) on Jul 01, 2017 at 19:30 UTC
    I don't know who to attribute this solution to (I started to include the name of the author in later snippets), but it runs very fast.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @factors = prime_factors( shift );
    print $_, $/ for @factors;

    sub prime_factors {
        my $x = shift;
        my @factors;
        for ( my $y = 2; $y <= $x; $y++ ) {
            next if $x % $y;
            $x /= $y;
            push @factors, $y;
            redo;
        }
        return @factors;
    }
Re: Avoid keeping larger lists in Memory
by Laurent_R (Canon) on Jul 01, 2017 at 23:47 UTC
    Hi pr33,

    I completely agree with haukex that the size of your arrays is most probably completely irrelevant to your performance problem.

    I think that your program is slow because its algorithm is slow, very slow when numbers get big.

    A very simple improvement is to try division only by 2 and by odd numbers. This alone will make your algorithm about twice as fast. Going further in the same direction, you can skip multiples of 2, 3, 5, etc., or, better yet, maintain a list of prime numbers and try division only by primes.
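
    A sketch of that first improvement, with the rest of the loop kept deliberately naive so that only the divisor sequence changes:

```perl
use strict;
use warnings;

# Trial division that handles 2 once, then tries only odd divisors,
# roughly halving the number of division tests.
sub is_prime {
    my $num = shift;
    return 0 if $num < 2;
    return 1 if $num == 2;
    return 0 if $num % 2 == 0;    # even numbers > 2 are composite
    for ( my $x = 3; $x <= $num - 1; $x += 2 ) {
        return 0 if $num % $x == 0;
    }
    return 1;
}

print is_prime($_) ? "$_ is prime\n" : "$_ is not prime\n" for 2, 9, 6857;
```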

    Another possible option is to improve your primality test. Take a look at the Miller-Rabin algorithm (https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test).
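
    A pure-Perl sketch of a Miller-Rabin test: with the fixed witness set (2, 3, 5, 7) the test is known to be deterministic for all n below 3,215,031,751, but because the modular products below can overflow Perl's native 64-bit integers for larger moduli, treat this as a sketch for n up to about 2**31:

```perl
use strict;
use warnings;

# Modular exponentiation: ($base ** $exp) % $mod, without huge intermediates.
sub powmod {
    my ( $base, $exp, $mod ) = @_;
    my $result = 1;
    $base %= $mod;
    while ( $exp > 0 ) {
        $result = ( $result * $base ) % $mod if $exp % 2;
        $base = ( $base * $base ) % $mod;
        $exp  = int( $exp / 2 );
    }
    return $result;
}

# Miller-Rabin with fixed witnesses 2, 3, 5, 7.
sub is_prime_mr {
    my $n = shift;
    return 0 if $n < 2;
    for my $p ( 2, 3, 5, 7 ) {
        return 1 if $n == $p;
        return 0 if $n % $p == 0;
    }
    # Write n - 1 as d * 2**s with d odd.
    my ( $d, $s ) = ( $n - 1, 0 );
    while ( $d % 2 == 0 ) { $d /= 2; $s++ }
  WITNESS: for my $a ( 2, 3, 5, 7 ) {
        my $x = powmod( $a, $d, $n );
        next WITNESS if $x == 1 || $x == $n - 1;
        for ( 1 .. $s - 1 ) {
            $x = ( $x * $x ) % $n;
            next WITNESS if $x == $n - 1;
        }
        return 0;    # this witness proves n composite
    }
    return 1;
}

print is_prime_mr(2147483647) ? "prime\n" : "composite\n";
```

    Unlike trial division, the cost here grows with the number of bits of n rather than with sqrt(n), which is why it stays fast for numbers like 2**31-1.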

    Although I am not really keen on micro-optimizations, this:

    while ($i <= $num - 1) {
    might be better written as:
    while ($i < $num) {

      Thanks Laurent, and thanks to all the Monks for the replies. Haukex's and Cristoforo's solutions were very fast.

      I tried division by 2 and only odd numbers in the is_prime subroutine, and that helped a lot in improving the execution time for finding whether a given number is prime or not.

      Cristoforo's solution generates a list of all prime factors and then uses that list to get the largest prime factor:

      #!/usr/bin/perl
      use strict;
      use warnings;

      ##
      my $max = 0;

      sub prime_factors {
          my $x = shift;
          my @factors;
          for ( my $y = 2; $y <= $x; $y++ ) {
              next if $x % $y;
              $x /= $y;
              push @factors, $y;
              redo;
          }
          return @factors;
      }

      foreach my $num ( 20, 13195, 600851475143 ) {
          my @factors = prime_factors($num);
          foreach my $p (@factors) {
              if ( $p > $max ) {
                  $max = $p;
              }
          }
          print "Number: $num Largest Prime Factor: $max\n";
      }

      -bash-4.1$ time ./large_prime.pl
      Number: 20 Largest Prime Factor: 5
      Number: 13195 Largest Prime Factor: 29
      Number: 600851475143 Largest Prime Factor: 6857

      real    0m0.004s
      user    0m0.002s
      sys     0m0.002s
        The very efficient optimization in your code derived from Cristoforo's is this line:
        $x /= $y;
        which replaces the target number with the number obtained by dividing it by each prime factor found. So, when you start to examine 600851475143, the code finds 71 as the first factor; the problem is then reduced to looking for prime factors of 600851475143/71, i.e. 8462696833.

        A bit later, the algorithm finds another prime factor, 839, so we are now left with 8462696833/839 = 10086647. And so on with the next factors, 1471 and 6857. This is very fast because the code quickly finds prime factors that considerably reduce the size of the problem.

        But that huge optimization (BTW, this is, I believe, a variation on Euclid's algorithm for finding the GCD of two numbers) works only if prime factors are found relatively quickly. Even with a relatively modest prime number such as 2**31-1 (i.e. 2147483647), this program takes a very long time to run (more than 5 minutes instead of a split second):

        2 2 5 5 7 13 29 71 839 1471 6857 2147483647

        real    5m44.346s
        user    5m43.718s
        sys     0m0.030s
        (Note that I changed your code to display all the prime factors, rather than just the largest one.)

        Now, if I change this code with three additional optimizations:

        1. using only 2 and odd numbers as potential divisors,

        2. looking for divisors only up to the square root of the target number (making sure to compute the square root only once per execution of the for loop);

        3. starting the for loop with the largest prime factor already found;

        then I get another huge improvement to your code (with the same input data): instead of running in more than 5 minutes, it runs in 0.064 seconds:

        2 2 5 5 7 13 29 71 839 1471 6857 2147483647

        real    0m0.064s
        user    0m0.015s
        sys     0m0.031s
        Since this seems to be a homework assignment, I will not disclose the code I used for this test and will leave it to you to implement the optimizations described above.
Re: Avoid keeping larger lists in Memory
by huck (Prior) on Jul 02, 2017 at 00:53 UTC

    One truth is that the largest prime factor of a number is always less than or equal to the square root of that number

    There are two places you can change to reflect that truth, and doing so will speed up your program (otherwise as is) a lot.

    That wasn't quite right. In testing for a prime number you only have to test for factors below the square root of the number. If you find a factor below that point, dividing the original number by that factor yields another factor. That second factor may be the largest prime factor.
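
    That corrected idea can be sketched like this: the primality test needs divisors only up to sqrt(n), and the square root is computed once before the loop:

```perl
use strict;
use warnings;

sub is_prime {
    my $num = shift;
    return 0 if $num < 2;
    return 1 if $num == 2;
    return 0 if $num % 2 == 0;
    my $limit = int( sqrt($num) );      # computed once, outside the loop
    for ( my $x = 3; $x <= $limit; $x += 2 ) {
        return 0 if $num % $x == 0;     # divisor found below sqrt: composite
    }
    return 1;                           # no divisor up to sqrt(n): prime
}

# 2**31 - 1 is prime; the loop only runs up to 46340.
print is_prime(2147483647) ? "prime\n" : "composite\n";
```

    This is the change that turns the five-minute run in the thread above into a fraction of a second, since sqrt(2147483647) is only about 46341.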

Re: Avoid keeping larger lists in Memory
by Discipulus (Canon) on Jul 02, 2017 at 23:55 UTC
    You have already got good answers.

    I would just add the possibility of taking already optimized code from CPAN: Math::Prime::Util (or Math::Prime::Util::GMP directly for big numbers.. but see also..)

    From examples:

    # Project Euler, problem 3 (Largest prime factor)
    use feature 'say';
    use Math::Prime::Util qw/factor/;
    use bigint;    # Only necessary for 32-bit machines.
    say 0+(factor(600851475143))[-1];

    PS: wow, it returns in no time!

    perl -Mbigint -E "use Math::Prime::Util::GMP qw(factor); say 0+(factor(600851475143600851475143600851475143))[-1]"
    999999000001

    # ~1 second to get the largest factor of:
    # 6008514751436008514751436008514751430085147514300851475143008514751430085147514300851475143
    834789126661809633627036527926883609177018602883402075695455196267

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.