jpl has asked for the wisdom of the Perl Monks concerning the following question:

I want to generate a bunch of uniformly distributed non-negative integers in a specified range. The guts of my program look like this:

    sub main {
        if (@ARGV < 3) {
          USAGE:
            print STDERR ("Usage: $0 low high samples [seed]\n");
            exit 1;
        }
        my ($lo, $hi, $samples, $seed) = @ARGV;
        goto USAGE if ($lo > $hi || $samples <= 0);
        $seed = srand() unless (defined($seed));
        srand($seed);
        my $range = $hi - $lo + 1;
        while ($samples-- > 0) {
            printf("%d\n", $lo + int(rand($range)));
        }
    }
    main();

And the code works just fine for run-of-the-mill arguments. But I am a bit of a nut about avoiding unwanted surprises. If somebody supplies a high that is too large to fit in an integer, the program may not do what they intend. Here are some snippets from a debugging session on my PC.

      DB<151> say ~0
    18446744073709551615
      DB<152> say "of course" if (~0 > 18446744073709551614)
    of course
      DB<153> say "of course" if (18446744073709551616 > ~0)
      DB<154> say 18446744073709551616
    1.84467440737096e+19

I'd like to catch any case where the arguments are too big to fit in an integer. I can, and probably will, do this using Math::BigInt, but this seems like using a pile driver to crack a peanut. I could, in fact, use Math::BigInt throughout, but far downstream, I want to pack the integers into compact BigEndian numbers, so it's better for me to enforce acceptable values as early as possible.

Is there a simpler way to ensure arguments fit in integers?

Replies are listed 'Best First'.
Re: Number too big to fit in integer
by hv (Prior) on Dec 24, 2022 at 22:04 UTC

    That's tricky: for a number that doesn't fit in UV (integer), Perl automatically stores it in NV (double) which can result in an immediate loss of precision. For my system build (5.26.1 for Ubuntu) the first integer that gets stored as a double distinguishable from ~0 is 18446744073709554600 (which has actually been stored in NV as 2^64 + 2^12):

    % perl -E 'say "ok" if 18446744073709554600 > ~0'
    ok
    % perl -E 'say "ok" if 18446744073709554599 > ~0'
    %

    If that loss of precision is unacceptable for your use case, then you must avoid letting Perl do the string-to-number conversion for you. If you are writing pure Perl, Math::BigInt seems like a pretty reasonable way to sort that out. If you're writing XS code, you can call the C library's strtoul() or the perl API functions grok_number() or grok_number_flags().

      That's tricky: for a number that doesn't fit in UV (integer), Perl automatically stores it in NV (double) which can result in an immediate loss of precision.

      See also $Config{nv_overflows_integers_at}:

      $ uname -p
      x86_64
      $ perl -MConfig -le 'print eval $Config{nv_overflows_integers_at}'
      9007199254740992

        I kinda like that, but eval frankly creeps me out a bit.
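One possible way to sidestep the eval is to compute the limit from a bit count instead of evaluating the Config string. This is only a sketch: it assumes that $Config{nv_preserves_uv_bits} is populated on your build and reports how many integer bits an NV holds exactly (53 for a plain 8-byte double).

```perl
use strict;
use warnings;
use Config;

# Assumption: nv_preserves_uv_bits is the number of integer bits an NV
# can hold exactly on this build (53 for a plain IEEE double).
my $bits  = $Config{nv_preserves_uv_bits};
my $limit = 2 ** $bits;

# %.0f prints the value in full, without switching to exponent notation.
printf "NVs hold integers exactly up to 2**%d = %.0f\n", $bits, $limit;
```

On a build like the x86_64 one shown above, this would be expected to agree with the 9007199254740992 reported by nv_overflows_integers_at, but that is worth confirming on your own perl.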

      That's tricky: for a number that doesn't fit in UV (integer), Perl automatically stores it in NV (double) which can result in an immediate loss of precision. For my system build (5.26.1 for Ubuntu) the first integer that gets stored as a double distinguishable from ~0 is 18446744073709554600 (which has actually been stored in NV as 2^64 + 2^12):
      % perl -E 'say "ok" if 18446744073709554600 > ~0'
      ok
      % perl -E 'say "ok" if 18446744073709554599 > ~0'
      %
      This is a bit baffling.
      With perl-5.36.0, ivsize == 8, and nvtype eq 'double', it's looking to me that the "first integer that gets stored as a double distinguishable from ~0 is" 18446744073709553665:
      >perl -E "say 'ok' if 18446744073709553665 > ~0"
      ok
      >perl -E "say 'ok' if 18446744073709553664 > ~0"
      >
      Am I overlooking a bug in perl-5.26.1 ? Or is there something else going on that I've missed ?
      For me, 18446744073709554599 is correctly evaluated as being greater than ~0 :
      >perl -E "say 'ok' if 18446744073709554599 > ~0"
      ok
      >
      Cheers,
      Rob

        I get the same with 5.36.0 as I did with 5.26.1. Here's what I originally used to home in on the tipping point:

        % /opt/v5.36.0/bin/perl -MMath::BigInt -wle '$z0 = Math::BigInt->new(2)**64; for my $i (0..12) { $zi = $z0 + 2**$i; $ni = "$zi"; $d = ($ni == ~0) ? "same" : "differ"; print "$i: $d" }'
        0: same
        1: same
        2: same
        3: same
        4: same
        5: same
        6: same
        7: same
        8: same
        9: same
        10: same
        11: same
        12: differ
        %

        Using eval "$zi == ~0" to make it more like direct use gives me the same result.

      Thanks. POSIX::strtol and POSIX::strtoul crossed my mind, too, but there are rather too many caveats for comfort.

      It isn't my use case I'm most worried about. It's the expectations of some unknown user. If high and low differ by 1, the user may be expecting to see both of them in a large sample. But if they are huge, floating point precision may collapse them into a single integer value. I'd prefer to issue a fatal message than disappoint.

      Looks like Math::BigInt may be the clunky best solution.
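For reference, the Math::BigInt check discussed here can stay fairly small. The following is a minimal sketch, not anyone's actual code from this thread: fits_in_uv is a hypothetical helper name, and it assumes arguments should be plain decimal digit strings (no sign, no exponent).

```perl
use strict;
use warnings;
use Math::BigInt;

# Largest value a native unsigned integer (UV) can hold on this perl.
# ~0 is an exact UV, so it stringifies without loss.
my $UV_MAX = Math::BigInt->new(~0);

# Hypothetical helper: true only if $arg is a plain decimal string whose
# value lies in 0 .. ~0. The argument is never numified by perl itself,
# so no precision can be lost before the comparison.
sub fits_in_uv {
    my ($arg) = @_;
    return 0 unless defined $arg && $arg =~ /\A[0-9]+\z/;
    return Math::BigInt->new($arg)->bcmp($UV_MAX) <= 0 ? 1 : 0;
}

for my $arg (qw(0 42 1e3 -5), "$UV_MAX", $UV_MAX->copy->binc->bstr) {
    printf "%-22s %s\n", $arg, fits_in_uv($arg) ? "ok" : "rejected";
}
```

Because the comparison happens between big integers, two huge arguments that differ by 1 stay distinct right up to the point where the program decides to accept or reject them.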

        Why are you thinking about unsigned integers anyway? rand() is limited to the precision of a double (53 bits) and beyond that there will be integers it never generates. rand() is really only suitable for cheap throwaway random numbers anyway. Check perldoc -f rand for alternatives.
Re: Number too big to fit in integer
by haukex (Archbishop) on Dec 25, 2022 at 07:40 UTC
    Is there a simpler way to ensure arguments fit in integers?

    You get the arguments as strings from @ARGV, and as long as you don't treat them as numbers, Perl will keep them as strings, allowing you to use string comparison operations:

    use warnings;
    use strict;

    my $x = shift;
    length($x) && $x=~/\A[0-9]+\z/
        or die "Need a non-negative integer";
    my $y = "1000000000000000000000000000000000000000";  # just for demo
    die "Integer too big" if length($x)>length($y)
        or sprintf("%0*s", length($y), $x) gt $y;
    print "$x is ok\n";  # *now* you can treat $x like a number

      Might want to trim leading 0s off the argument, being careful not to trim a simple 0 to the empty string. And Perl will do the "right thing" with a leading +

        DB<175> $av0="+10"
        DB<176> say sprintf("%u", $av0)
      10

      I don't want to anticipate all the stuff Perl might do to an argument on the way to treating it as a number. I think Math::BigInt may be the (conceptually) simplest thing.
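If the string-comparison route is taken anyway, the leading "+" and leading zeros mentioned above can be normalized before comparing. This is a small sketch under stated assumptions: canonical_uint is a hypothetical helper name, and accepting a leading "+" is a choice made for illustration, not something the thread settled on.

```perl
use strict;
use warnings;

# Hypothetical helper: strip an optional leading "+" and redundant
# leading zeros from a decimal string, keeping a lone "0" intact.
# Returns undef if the argument is not a non-negative decimal integer.
sub canonical_uint {
    my ($arg) = @_;
    return undef unless defined $arg && $arg =~ /\A\+?([0-9]+)\z/;
    my $digits = $1;
    # Remove leading zeros only while at least one digit remains after them.
    $digits =~ s/\A0+(?=[0-9])//;
    return $digits;
}

for my $arg (qw(07090 +10 000 0 1e3)) {
    my $canon = canonical_uint($arg);
    printf "%-6s => %s\n", $arg, defined $canon ? $canon : "(rejected)";
}
```

The lookahead in the substitution is what keeps "0" and "000" from collapsing to the empty string.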

Re: Number too big to fit in integer
by jwkrahn (Abbot) on Dec 24, 2022 at 22:26 UTC

    Perhaps this will help?

    $ perl -le'print "oops!" if $ARGV[ 0 ] ne sprintf "%u", $ARGV[ 0 ]' 18446744073709551615
    $ perl -le'print "oops!" if $ARGV[ 0 ] ne sprintf "%u", $ARGV[ 0 ]' 18446744073709551616
    oops!

      That crossed my mind, but perl can be too clever about tweaking arguments. Some additional debugging stuff:

        DB<169> $av0="07090"
        DB<170> say sprintf("%u", $av0)
      7090
        DB<171> say "too big?" if ($av0 ne sprintf("%u", $av0))
      too big?

        If you are expecting "non-negative integers" the leading zero is superfluous anyway.

        And depending on processing could be interpreted as an octal number.

Re: Number too big to fit in integer
by BillKSmith (Monsignor) on Dec 26, 2022 at 15:53 UTC
    The maximum possible value for $hi is set by the way that perl stores integers. The actual value depends on your system. (Ref to hv).

    The maximum possible value of $range is limited by your random number generator (Ref NERDVANA).

    You should validate both.

    Bill
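A rough sketch of validating both limits in one place might look like the following. Several things here are assumptions rather than anything stated in the thread: check_bounds is a hypothetical helper name, the arguments are taken to be plain digit strings, and $Config{randbits} (with a fallback of 48) is used as an estimate of how many random bits rand() supplies on a given build.

```perl
use strict;
use warnings;
use Config;
use Math::BigInt;

# Hypothetical helper: returns an error message if low/high (decimal
# strings) are unusable, or undef if both checks pass.
sub check_bounds {
    my ($lo, $hi) = @_;
    for ($lo, $hi) {
        return "not a non-negative integer: $_" unless /\A[0-9]+\z/;
    }
    my ($blo, $bhi) = map { Math::BigInt->new($_) } $lo, $hi;

    # Check 1: high must fit in a native unsigned integer.
    my $uv_max = Math::BigInt->new(~0);
    return "high exceeds native integer maximum $uv_max" if $bhi > $uv_max;
    return "low is greater than high" if $blo > $bhi;

    # Check 2: the span must be within the generator's reach.
    # Assumption: rand() supplies about $Config{randbits} random bits.
    my $randbits  = $Config{randbits} || 48;
    my $max_range = Math::BigInt->new(2)->bpow($randbits);
    my $range     = $bhi - $blo + 1;
    return "range $range is wider than 2**$randbits" if $range > $max_range;

    return;   # no problem found
}

for my $pair ([1, 10], [10, 1], [0, "18446744073709551616"]) {
    my $err = check_bounds(@$pair);
    print "@$pair: ", defined $err ? $err : "ok", "\n";
}
```

Doing both comparisons with big integers means the checks themselves cannot be fooled by the precision loss they are meant to detect.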
Re: Number too big to fit in integer
by syphilis (Archbishop) on Dec 25, 2022 at 22:50 UTC
    Is there a simpler way to ensure arguments fit in integers?

    I believe so.
    If people would just break away from the notion that it's ok to use a perl whose NV precision is less than its IV precision, then you could just do something like the obvious:
    die "value is too large" if $value > ~0;
    AFAIK, that works fine on any perl whose nvsize is greater than 8 bytes (ie nvtype is either 'long double' or '__float128').
    Unfortunately, the general go-to perl configuration is one where NV precision is less than IV precision.
    Anyone (frequently including me) who uses such a perl either doesn't care about arithmetic (not me), enjoys performing mental contortions (not me), is plain stupid, or is on some perverse and antiquated system that doesn't provide for an nvsize > 8 (not me).
    Are there any other explanations ?

    Cheers,
    Rob

      It is probably not wise to try to categorize the reasons why someone might disagree with a point of view, particularly if "just plain stupid" is one of the categories.

      I generally don't need to care what my NV precision is. If I am dealing with integers that can get anywhere near the safe limit (which is a lot of the time), I use a bigint library.

        If I am dealing with integers that can get anywhere near the safe limit (which is a lot of the time), I use a bigint library.

        Fair enough, but all the OP wanted was to test whether a positive integer value was greater than ~0.
        If you think it's fair enough that checking whether $non_negative_integer_value > ~0 should not necessarily be sufficient, then that's fine by me - you've got what you want, and you're welcome to stick with your 64-bit precision IV & 53-bit precision NV perl configuration.

        To me, the behaviour (re integer-float conversions) on this and only this IV/NV configuration is interesting and challenging, up to a point. But it's poorly thought out (if it was ever actually thought out at all), and having to deal with it is counter-productive. Perl is supposed to DWIM and to make things easier, and this particular configuration falls down in those 2 regards.
        That, nearly a quarter of the way through the 21st century this configuration is still arguably the most prevalent by far, makes me wonder about the mental acuity of perl programmers - at least those perl programmers that are interested in perl's math operations.
        OTOH, for all other possible IV/NV permutations perl's math behaviour (re integer-float conversions) is sane, helpful, DWIMs, and makes things easier.

        Cheers,
        Rob
Re: Number too big to fit in integer
by harangzsolt33 (Deacon) on Dec 25, 2022 at 00:59 UTC
    You could check the length of a string first to see if it has too many digits. But some very large numbers might be stored in scientific notation, which then reduces the length of the string, so checking the string length won't give you an accurate estimate.

    I remember writing a sub a while ago that converts a number of any size to scientific notation. This works on extremely large numbers as well as extremely small numbers. It successfully decodes scientific notation, of course, and it treats numbers as strings, so you don't lose any precision. This sub can EASILY be modified to return only the exponent part of a number, not the entire number. Then you should check whether the exponent is positive, and how big it is. If it's bigger than, let's say, 15, then you know that the number would lose precision if it were treated as a number.

    ##################################################
    # v2022.06.30
    # Converts a decimal number to scientific notation.
    #
    # This function converts a number to standard scientific notation.
    # This function expects to receive a decimal number,
    # and it returns a number in scientific notation.
    # Both the input and output numbers are treated as strings.
    # If the input string had any spaces, commas or any other illegal
    # characters, those are removed from the number.
    #
    # Example:
    #
    #   Input: ""               Output: "+0E+0"
    #   Input: "-20.35"         Output: "-2.035E+1"
    #   Input: "0.000333E-12"   Output: "+3.33E-16"
    #   Input: "000.00001000"   Output: "+1E-5"
    #   Input: "$ 75,800.99 "   Output: "+7.580099E+4"
    #   Input: " (12.49) abc"   Output: "-1.249E+1"
    #   Input: " 2008300"       Output: "+2.0083E+6"
    #   Input: ".225E+76"       Output: "+2.25E+75"
    #   Input: "10,000,000,000,000,000,000,000,000,000,000,000,000,000"
    #   Output: "+1E+40"
    #
    # Usage: STRING = SCI(STRING N)
    #
    sub SCI
    {
      defined $_[0] or return '+0E+0';
      my $NUM = $_[0];
      my $M = '';      # Mantissa will be stored here
      my $E = '';      # Exponent will be stored here
      my $SIGN = 43;   # Is this a negative number? (43=pos 45=neg)
      my $DEC = -1;    # Remember the position of the decimal point
      my $EX = 0;      # Exponent (0=no_exp 1=exp_found 43=pos_exp 45=neg_exp)
      my $Z = -1;      # Start position of the last zero
      my $N = -1;      # Position of the first non-zero digit

      for (my $i = 0; $i < length($NUM); $i++)
      {
        my $c = vec($NUM, $i, 8);
        if ($EX) # PROCESS EXPONENT:
        {
          if (($c == 43 || $c == 45) && $EX == 1) { $EX = $c; }
          elsif ($c > 47 && $c < 58) { $E .= chr($c); }
          elsif (length($E)) { last; } # What comes after the exponent? Nothing!
        }
        else # PROCESS MANTISSA
        {
          if ($c > 47 && $c < 58) # Digits 0-9
          {
            if ($c == 48)
            {
              if (length($M)) { $M .= '0'; }    # Keep '0' digit only if there are other digits in front of it
              if ($Z < 0) { $Z = length($M); }  # Remember last insignificant zero
            }
            else
            {
              $M .= chr($c);                    # Save digits other than zero
              if ($N < 0) { $N = length($M); }
              $Z = -1;
            }
          }
          elsif ($c == 68 || $c == 69 || $c == 100 || $c == 101) # D/d/E/e
          {
            if (length($M)) { $EX = 1; }        # Exponent marker found!
          }
          elsif ($c == 46)
          {
            if ($DEC < 0)
            {
              if (length($M) == 0) { $M = '0'; }
              $DEC = length($M);                # Decimal point found!
            }
            else { last; }                      # A second decimal point???
          }
          elsif (($c == 45 || $c == 40) && length($M) == 0)
          {
            $SIGN = 45;                         # It's a negative number
          }
        }
      }

      if (length($M) == 0) { return '+0E+0'; }

      # Convert Exponent to a number for now
      if ($EX) { $EX = ($EX == 45) ? -$E : $E; }

      # Adjust exponent
      if (length($M) > 1)
      {
        if ($DEC == -1) { $EX = $EX + length($M) - 1; }  # No decimal point
        if ($DEC >= 2) { $EX = $EX + $DEC - 1; }         # Yes decimal point
      }

      # Remove trailing zeros
      if ($Z > 0) { $M = substr($M, 0, $Z - 1); }

      # Add decimal point if possible.
      if ($N > 1)
      {
        if (length($M) > $N)
        {
          $M = substr($M, $N - 1, 1) . '.' . substr($M, $N);
          $EX = $EX - $N + 1;
        }
      }
      else
      {
        if (length($M) > 1) { $M = substr($M, 0, 1) . '.' . substr($M, 1); }
      }

      return chr($SIGN) . $M . 'E' . ($EX < 0 ? '' : '+') . $EX;
    }

    print SCI('00002999999999999900000006.5544433377777777888888880000001');
    exit;

    # Outputs:
    #
    # +2.9999999999999000000065544433377777777888888880000001E+21
    #


    Merry Christmas!
      "…Converts a decimal number to scientific notation…"

      What follows seems to me to be quite a lot of lines. See here for some alternatives: Decimal to Scientific notation. Speaking of qbasic: this is a kind of Retro computing, isn't it?

      «The Crux of the Biscuit is the Apostrophe»

        "See here for some alternatives"

        I see, but that alternative treats the input as a number, and as a result, you may lose precision sometimes.

        "Speaking of qbasic: this is a kind of Retro computing, isn't it?"

        Yes, I am a member of several old computer groups and BASIC groups on Facebook. I also like Windows XP and TinyPerl 5.8 which are both considered ancient stuff. But I like to push the limits to see how much modern stuff we can do with old hardware and software.

      defined $_[0] or return '+0E+0';

      Shouldn't that be:

      defined $_[0] or return 'NaN';
        It can be. That's up to you. At the time I wrote this sub, I wanted something that *ALWAYS* returns a number in scientific notation. Not just any number but a number in a specific format. You see, the return value ALWAYS starts with a plus or minus sign, then it's followed by ONE DIGIT, ( then a decimal point, then one or more digits ), then a letter 'E' which is then followed by a plus or minus sign, then a number. Stop.

        (I wrote this function in QBASIC first, and then I ported it to Perl. So, you can tell that the code has a QBASIC-ish style and structure. But it's okay. The main thing is that it works.)

      While the style is not very "perlish", I find the comments really commendable.