in reply to Re^8: OT: Finding Factor Closest To Square Root
in thread OT: Finding Factor Closest To Square Root

Hmm. I get the same results both ways:

P:\test>findstr # LR-nearFactor.pl #!/usr/bin/perl # my $sqrt = sqrt( $N ); my $end = $#factor; return $f ; #< $sqrt ? $f : undef; splice(@subset, $#subset - 1, 1); splice(@subset, $#subset - 1, 2, $pos); P:\test>LR-nearFactor.pl 24777695232 11 1 P:\test>findstr # LR-nearFactor.pl #!/usr/bin/perl my $end = $#factor; splice(@subset, $#subset - 1, 1); splice(@subset, $#subset - 1, 2, $pos); P:\test>LR-nearFactor.pl 24777695232 11 1

Which version of Parei do you have? I have

==================== Name: Math-Pari Version: 2.010603 Author: Ilya Zakharevich (cpan@ilyaz.org) Title: Math-Pari Abstract: Perl interface to PARI. InstDate: 21:18:09 2005 Location: http://ppm.ActiveState.com/cgibin/PPM/ppmserver-5.8-windows. +pl?urn:/PPMServer Available Platforms: 1. MSWin32-x86-multi-thread-5.8 ====================

It's kind of hard to see how using the built in sqrt function would influence factorint()> but have you tried moving the call to sqrt after the call to factorint()?

Anyway, trying to get beyond that, I picked a few ideas from tall_man's code (which is definitely the benchmark in this thread) and modified your code to read as:

#!/usr/bin/perl use strict; use warnings; use Math::Pari qw/:int factorint sqrtint PARImat/; my $N = '1' . '0' x $ARGV[0] . '1'; print "Result for $N is ", nearest_sqrt( $N ); sub prime_factors { my $N = shift; my %p_fact = PARImat( factorint( $N ) ) =~ /(\d+)/g; return map { ($_) x $p_fact{$_} } sort { $a <=> $b } keys %p_fact; } sub nearest_sqrt { my $N = shift; my $sqrt = sqrtint( $N ); my @factor = prime_factors( $N ); ....

As I understand it, ':int', makes all integers in the program Pari compatible?. And using sqrtint() ought to bypass any imcompatibilities.

The fudge with $ARGV[ 0 ] at the top allows me to test the code on some really big numbers without having to type out all the digits--I applied the same hack to tall_man's code, and then ran a few tests.

The following is the results from a few carefully chosen (particularly hard) examples timing both programs.

Going by the time your program takes in comparison to tall_man's, you appear to be doing a similar amount of work, but your code is returning -1, which I do not understand?

P:\test>timethis TM-nearFactor.pl 55 & timethis LR-nearFactor.pl 55 TimeThis : Command Line : TM-nearFactor.pl 55 TimeThis : Start Time : Sat Feb 26 14:38:33 2005 Result for 100000000000000000000000000000000000000000000000000000001 i +s 7376575663406069736503138401 TimeThis : Command Line : TM-nearFactor.pl 55 TimeThis : Start Time : Sat Feb 26 14:38:33 2005 TimeThis : End Time : Sat Feb 26 14:38:36 2005 TimeThis : Elapsed Time : 00:00:03.281 TimeThis : Command Line : LR-nearFactor.pl 55 TimeThis : Start Time : Sat Feb 26 14:38:37 2005 Result for 100000000000000000000000000000000000000000000000000000001 i +s -1 TimeThis : Command Line : LR-nearFactor.pl 55 TimeThis : Start Time : Sat Feb 26 14:38:37 2005 TimeThis : End Time : Sat Feb 26 14:38:40 2005 TimeThis : Elapsed Time : 00:00:03.265 P:\test>timethis TM-nearFactor.pl 58 & timethis LR-nearFactor.pl 58 TimeThis : Command Line : TM-nearFactor.pl 58 TimeThis : Start Time : Sat Feb 26 14:38:50 2005 Result for 10000000000000000000000000000000000000000000000000000000000 +1 is 22665854592333022426775023 TimeThis : Command Line : TM-nearFactor.pl 58 TimeThis : Start Time : Sat Feb 26 14:38:50 2005 TimeThis : End Time : Sat Feb 26 14:39:07 2005 TimeThis : Elapsed Time : 00:00:17.578 TimeThis : Command Line : LR-nearFactor.pl 58 TimeThis : Start Time : Sat Feb 26 14:39:07 2005 Result for 10000000000000000000000000000000000000000000000000000000000 +1 is -1 TimeThis : Command Line : LR-nearFactor.pl 58 TimeThis : Start Time : Sat Feb 26 14:39:07 2005 TimeThis : End Time : Sat Feb 26 14:39:25 2005 TimeThis : Elapsed Time : 00:00:17.578

The timings of these two particularly difficult runs, going by how long tall_man'code takes relative to most other runs like:

P:\test>timethis TM-nearFactor.pl 57 & timethis LR-nearFactor.pl 57 TimeThis : Command Line : TM-nearFactor.pl 57 TimeThis : Start Time : Sat Feb 26 14:38:49 2005 Result for 10000000000000000000000000000000000000000000000000000000001 + is 846610559160061 TimeThis : Command Line : TM-nearFactor.pl 57 TimeThis : Start Time : Sat Feb 26 14:38:49 2005 TimeThis : End Time : Sat Feb 26 14:38:50 2005 TimeThis : Elapsed Time : 00:00:00.140 TimeThis : Command Line : LR-nearFactor.pl 57 TimeThis : Start Time : Sat Feb 26 14:38:50 2005 Result for 10000000000000000000000000000000000000000000000000000000001 + is -1 TimeThis : Command Line : LR-nearFactor.pl 57 TimeThis : Start Time : Sat Feb 26 14:38:50 2005 TimeThis : End Time : Sat Feb 26 14:38:50 2005 TimeThis : Elapsed Time : 00:00:00.109

which despite being nearly as big a number, is solved in under a 1/5th of a second rather than 17+ seconds above. The very close similarity in timing between your code and tall_man's suggests that you are probably doing a very similar amount of work and possibly arriving at the same result, but for some reason, failing to display it?.

Maybe a rounding error somewhere. I tried to follow this through--but as you know your code better than I, you might find the solution more quickly.

Either way, your determination of the appropriate sequence of tests and short-ciruiting is very impressive. I congratulate you++ :).


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re^10: OT: Finding Factor Closest To Square Root
by Limbic~Region (Chancellor) on Feb 27, 2005 at 14:29 UTC
    BrowserUk,
    Hmm. I get the same results both ways:

    While I might expect a different environment not to expose the bug, I am suprised that neither run gave correct results. On my machine, for 24777695232, I get:

    Which version of Parei do you have?
    $VERSION = '2.010603';

    ...have you tried moving the call to sqrt after the call to factorint()?
    Yes - and it removes the bug - thanks! I was dealing with 3 major versions of my code and minor versions of each (with and without optimizations) and that was 1 of the few things I hadn't tried.

    Anyway, trying to get beyond that, I picked a few ideas from tall_man's code (which is definitely the benchmark in this thread) and modified your code to read as:
    tall_man's code is impressive, but QM did state that all you start with is the prime factorization. Obviously if some module can give you the factors already you use that solution, so my quest became an academic exercise.

    Going by the time your program takes in comparison to tall_man's, you appear to be doing a similar amount of work, but your code is returning -1, which I do not understand?
    I am pretty sure this is an overflow issue, but believe it or not these large numbers aren't particularlly difficult. Given QM's original problem - numbers that result in a lot of prime factors would be difficult such as 24777695232. Your first number only has 5 factors which makes the number of combinations to test small:

    17 113 5882353 73765755896403138401 119968369144846370226083377

    ...which despite being nearly as big a number, is solved in under a 1/5th of a second rather than 17+ seconds above. The very close similarity in timing between your code and tall_man's suggests that you are probably doing a very similar amount of work and possibly arriving at the same result, but for some reason, failing to display it?.
    Yeah, doing a binary search on 5 numbers is bound to be faster than generating all the combinations and determining the product, but I still don't get anywhere near the 17+ seconds you see, albeit with an incorrect answer of 7.37657566340607e+27. If you want to benchmark numbers that I know my code can handle, use the following:

    And then just add a test at the top for $ARGV[0] and loop on the file if not present.

    Either way, your determination of the appropriate sequence of tests and short-ciruiting is very impressive. I congratulate you++
    Thanks. As I said, this became more of an academic exercise. See this for more details. Now that I can make the bug go away, I can test which version is the fastest with all optimizations. Thanks!

    Cheers - L~R

      Yeah, doing a binary search on 5 numbers is bound to be faster than generating all the combinations and determining the product, but I still don't get anywhere near the 17+ seconds you see

      You didn't read the output carefully did you :)

      My point was that for the number '100000000000000000000000000000000000000000000000000000000001', both your code and tall_man's ran for 17.578 seconds.

      Similarly, for '100000000000000000000000000000000000000000000000000000001' both ran for 3.265 seconds.

      And for '10000000000000000000000000000000000000000000000000000000001' both ran for under 1/5th of a second.

      The reason for moving to such big numbers was to try and find values that made the algorithm work very hard, so as to benchmark the algorithm rather than the implementation.

      The conclusion that I was attempting to draw was that whilst your code wasn't displaying the correct final result, that both programs were taking nearly identical times to run--longer when the task was harder, shorter when simpler; and always within a few fractions of a second--and that your code was probably doing all the right work and just failing display the final result correctly.

      Which is all the more impressive as your program is doing the work in Perl rather than C as with tall_man's code.

      In other words, I was trying to pay you a complement.

      Update: As it turns out, all of the time taken is to calculate the prime factors, with almost nothing to determine the nearest factor. Oh well!


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.