bronto has asked for the wisdom of the Perl Monks concerning the following question:

Hello

The developer of a Perl application that we run here was asked about some problems we are having with it. In a recent e-mail he explained why he treats a variable in a special way:

See how I've wrapped $user_id in quotes like "$user_id" to force string context. When the user ID is represented as a number, the internal representation of the number causes rounding because of lost precision.

Now, I am no expert on Perl internals. But apart from the well-known cases where zero is involved, like:

bronto@brabham:~$ perl -e 'print "0.00"? "True\n": "False\n"'
True
bronto@brabham:~$ perl -e 'print 0.00? "True\n": "False\n"'
False

...Perl has always been smart enough to make sense of how I was using numbers in scalar variables.

Does that assertion make sense? Even if $user_id has no decimal digits?

Thanks for any advice

Ciao!
--bronto


The very nature of Perl to be like natural language--inconsistant and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz

Replies are listed 'Best First'.
Re: "force string context"?
by kvale (Monsignor) on Apr 16, 2004 at 08:46 UTC
    There are a couple of cases where one can lose precision. If the user id is greater than 2**31 (the limit of a signed integer on a 32-bit system), perl converts it to floating point and precision may be lost.

    The other possibility is that the user id is of the form a.b with b a long string of digits. This will be interpreted as a floating point number and depending on the precision, those extra digits will be rounded to fit into the floating point format.

    If there are no decimal digits, I don't see how you can get into trouble with rounding operations in the string. To say more, we'd need to see examples of the user id.

    -Mark

      I got an example, and it seems that you got the point here. $user_id values are microtime timestamps, like 1082021975087815. And it is far bigger than 2**31:

      bronto@brabham:~$ perl -e 'print log(1082021975087815)/log(2),"\n"'
      49.942651222871

      We are on the order of 2**50, so we really are in the field where rounding errors can occur.

      Thanks a lot for your help!

      Ciao!
      --bronto
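A minimal sketch of where the digits can go, assuming IEEE 754 doubles. The helper `as_double` is hypothetical: it round-trips a value through a native double with `pack`/`unpack`, because a perl built with 64-bit integers would otherwise keep these values as integers. A double holds every integer below 2**53 exactly, so an ID near 2**50 survives the conversion itself, but a floating-point value printed with 15 significant digits (`%.15g`) cannot show all 16 digits of the ID.

```perl
use strict;
use warnings;

# Hypothetical helper: round-trip a value through a native double, so the
# result is a floating-point value even on a perl built with 64-bit integers.
sub as_double {
    my ($digits) = @_;
    return unpack 'd', pack 'd', $digits;
}

my $id  = '1082021975087815';    # the microtime ID from the thread
my $big = '9007199254740993';    # 2**53 + 1

# Below 2**53 a double still holds every integer exactly ...
printf "%.0f\n", as_double($id);
# ... but 15 significant digits are not enough to print a 16-digit ID.
printf "%.15g\n", as_double($id);
# Above 2**53 the stored value itself is rounded.
printf "%.0f\n", as_double($big);
```

Kept as a string and compared with `eq`, the ID never goes through any of these conversions.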


Re: "force string context"?
by Abigail-II (Bishop) on Apr 16, 2004 at 09:36 UTC
    See how I've wrapped $user_id in quotes like "$user_id" to force string context. When the user ID is represented as a number, the internal representation of the number causes rounding because of lost precision.
    Well, yes, but no. It's true that in rare cases you want to force string context. But if you are going to use $user_id as a number, putting quotes around it isn't going to help you. Sure, you prevent $user_id from becoming a number, but Perl will happily convert "$user_id" to a number. If you never use $user_id as a number, Perl will not calculate its numeric value, and no rounding will happen. (But be aware of YAML!).

    The usual cases where you want to force string context are those operators that do different things depending on whether their operands are strings or numbers, and boolean context.
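A small sketch of two such cases, assuming a perl where the `bitwise` feature is not enabled (with that feature, `|` always treats its operands as numbers). The variable names are only for illustration.

```perl
use strict;
use warnings;

# Bitwise operators look at how the operands are stored:
# two numbers are OR-ed as integers, two strings character by character.
my $as_numbers = 12 | 3;
my $as_strings = "12" | "3";
print "$as_numbers $as_strings\n";

# Boolean context: the string "0.0" is true, the number 0.0 is false.
my $string_truth = "0.0" ? "true" : "false";
my $number_truth = 0.0   ? "true" : "false";
print "$string_truth $number_truth\n";
```

In both cases quoting the operand changes the result, which is the situation where forcing string context actually matters.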

Re: "force string context"?
by ysth (Canon) on Apr 16, 2004 at 08:43 UTC
    Converting a number to a string and then using it again as a number may introduce a very minor amount of rounding. If you have problems due to the imprecise nature of floating point storage, it would be better to pick a precision and round explicitly, since you are very likely to hit a case where the stringizing doesn't end up being enough.

    But perhaps I don't understand what you are dealing with well enough to comment; an actual example would be good.
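Explicit rounding could look like the sketch below. The helper name `same_to_places` and its default of two decimal places are assumptions made for illustration, not something from the application under discussion.

```perl
use strict;
use warnings;

# Compare two floating-point values after rounding both to a chosen
# number of decimal places (hypothetical helper, default: two places).
sub same_to_places {
    my ($x, $y, $places) = @_;
    $places = 2 unless defined $places;
    return sprintf("%.${places}f", $x) eq sprintf("%.${places}f", $y);
}

my $sum_matches = same_to_places(0.1 + 0.2, 0.3);
my $too_far     = same_to_places(1.004, 1.006);

print $sum_matches ? "equal\n" : "different\n";
print $too_far     ? "equal\n" : "different\n";
```

The precision is chosen once, in one place, instead of relying on whatever the default number-to-string conversion happens to keep.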

Re: "force string context"?
by gmpassos (Priest) on Apr 16, 2004 at 10:36 UTC
    If you need to handle integers bigger than 2**31 (> 2147483647) you should use Math::BigInt. It is just not recommended to work with numbers outside that range; it will just not work!

    Graciliano M. P.
    "Creativity is the expression of the liberty".

      If you need to handle integers bigger than 2**31 (> 2147483647) you should use Math::BigInt
      If your integers are less than about 2**51, there will be no loss of significant digits (because Perl will automatically start using doubles). If your integers will be less than 2**63, I'd prefer a perl built with 64-bit integers over Math::BigInt. (In fact, I always compile my perls to turn on 64 bit integer support).

      Abigail

        But do I need a 64-bit CPU to use 64-bit integers?

        And what is the option to enable that? I have never compiled perl this way and I'd like to run some tests.

        Graciliano M. P.
        "Creativity is the expression of the liberty".
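One way to check how a given perl was built is to look at the core `Config` module, as in this short sketch. As far as I know, 64-bit integers are requested at build time by passing `-Duse64bitint` to Configure, and this does not require a 64-bit CPU as long as the C compiler provides a 64-bit integer type; treat that as something to confirm against the INSTALL file of your perl source.

```perl
use strict;
use warnings;
use Config;

# ivsize is the size in bytes of perl's integer type, nvsize the size of
# its floating-point type; use64bitint records the build option, if any.
my $use64 = defined $Config{use64bitint} ? $Config{use64bitint} : 'undef';

printf "ivsize=%d nvsize=%d use64bitint=%s\n",
    $Config{ivsize}, $Config{nvsize}, $use64;
```

An `ivsize` of 8 means integers up to 2**63 - 1 are stored exactly without going through floating point.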

Re: "force string context"?
by CountZero (Bishop) on Apr 16, 2004 at 15:44 UTC
    I'm much surprised by the programmer's comments.

    I must confess that I am not much into Perl internals, but a quick test program did not convince me. Try this:

    $number=1082021975087815;
    $string="1082021975087815";
    $converted="$number";
    print "$number\n$string\n$converted\n";
    Output:
    1.08202197508782e+015
    1082021975087815
    1.08202197508782e+015
    Wrapping the variable in "" does not help if the damage was already done beforehand. It is better to make sure that the number is stored as a string from the start (by putting it in " " or ' ') and then to do only string-like operations on that variable.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      I suspect that the guilty programmer was practicing cargo-cult techniques.

      Putting "$variables" in quotes might make people feel better about the way that Perl can "randomly change scalars from data to garbage, without warning or provocation," but it actually serves no real purpose. People coming from a language with strong typing generally feel that there is something strange about Perl, because they are giving up direct control over the representation of data.

      The reality of the situation is that strings stay strings forever and ever, right up to the time that they are used in a numeric operation, and numbers stay numbers forever and ever, until they are used in string operations. So anybody who thinks that Perl might mangle their data should relax, and if they realize that "eq" and == are different, everything will be just fine.
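A short sketch of the difference between the two comparison operators, using made-up values apart from the ID quoted earlier in the thread:

```perl
use strict;
use warnings;

my ($x, $y) = ("1.0", "1");

print "numerically equal\n" if $x == $y;    # compares the numbers 1 and 1
print "different strings\n" if $x ne $y;    # compares the text "1.0" and "1"

# IDs read in as text are compared character by character with eq/ne,
# so no numeric conversion is involved.
my ($id_a, $id_b) = ("1082021975087815", "1082021975087816");
print "different IDs\n" if $id_a ne $id_b;
```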

Re: "force string context"?
by halley (Prior) on Apr 16, 2004 at 13:24 UTC
    Personally, I would format element identifiers in some way so they're never seen as a decimal number. That's inviting trouble. Imagine a library where some C maintenance coder is going to use sscanf("%f") to read Dewey Decimal book catalogues. Or the problems when you compare which Perl version is newer: 5.005, 5.6.1, 5.8 or 5.10.

    It doesn't have to be complicated. Make your user_id begin with the letter U or something. Just make it clear to anyone reading the code or reading the output that a user_id should never be seen as a plain decimal number.

    --
    [ e d @ h a l l e y . c c ]
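A minimal sketch of the prefix idea. The helper names `make_user_id` and `is_user_id` and the exact format (`U` followed by digits) are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helpers: tag a timestamp so the ID can no longer be
# mistaken for a number, and validate IDs on the way in.
sub make_user_id {
    my ($timestamp) = @_;
    return "U$timestamp";
}

sub is_user_id {
    my ($candidate) = @_;
    return $candidate =~ /\AU[0-9]+\z/ ? 1 : 0;
}

my $user_id = make_user_id("1082021975087815");
my $valid   = is_user_id($user_id);

print "$user_id\n";
print $valid ? "valid\n" : "invalid\n";
```

An ID that has already been mangled into exponent notation fails the check instead of silently passing through.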

Re: "force string context"?
by graff (Chancellor) on Apr 16, 2004 at 18:21 UTC
    kvale provides a point well taken. Still, whether quotes around a variable name are needed depends on the context -- we'd need to see the code where $user_id is actually assigned and used to tell whether the quotes are really necessary. As a rule, I think Perl normally does the right thing with strings consisting of many digits. Consider:
    $s1 = "01234567891233456789";
    $s2 = "012345.6789123456789";
    $n1 = $s1+1;
    $n2 = $s2+1;
    $c1 = $s1;
    $c2 = $s2;
    print "$n1 == $s1 + 1\n";
    print "$n2 == $s2 + 1\n";
    print "$c1 eq $s1\n";
    print "$c2 eq $s2\n";
    __OUTPUT__
    1.23456789123346e+18 == 01234567891233456789 + 1
    12346.6789123457 == 012345.6789123456789 + 1
    01234567891233456789 eq 01234567891233456789
    012345.6789123456789 eq 012345.6789123456789
    The assignments to $n1 and $n2 happen in numeric context, which causes a loss of precision as well as the removal of leading zeros. But using $s1 and $s2 in a numeric context does not change the string value that was originally assigned to these variables: if a variable is originally assigned a value in a string context, it will always return that value when used in a string context.

    The other thing to note is that a plain assignment copies the value as it is: the copy holds a string unless the original value on the right-hand side was obtained or created in a numeric context.
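The same point as a small self-checking sketch. The value of `$n` depends on how perl was built (integer or floating point), so only the string side is checked here.

```perl
use strict;
use warnings;

my $s = "01234567891233456789";
my $n = $s + 1;    # numeric use: leading zero dropped, digits may be rounded
my $c = $s;        # plain copy: no conversion takes place

print "string unchanged\n" if $s eq "01234567891233456789";
print "copy identical\n"   if $c eq $s;
print "sum: $n\n";
```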