Avoiding perl's Atof when assigning floating point values

Hi,

Most perls assign floating point values using perl's internal Atof function - and that includes perls that define "Perl_strtod".
But Perl's Atof function is notoriously incorrect, and a far better alternative IMO is to have floats assigned using Perl_strtod, which is just a wrapper around C's strtod() or strtold() or strtoflt128() - whichever is appropriate for the particular perl's nvtype.
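A quick way to see the two routes from Perl itself is to numify the same string twice. The first pass uses perl's own conversion, which goes through Atof on an unpatched perl-5.28.0. The second uses POSIX::strtod(), which, as noted further down the thread, reaches Perl_strtod. This is a minimal sketch that assumes nvtype is "double". compare_assignment() is a hypothetical helper name, and the sample strings are arbitrary.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: numify $str via perl's own string-to-number
# conversion (Atof on an unpatched perl-5.28.0) and via POSIX::strtod.
sub compare_assignment {
    my ($str) = @_;
    my $via_perl = $str + 0;
    # In list context POSIX::strtod returns (value, number of unparsed chars).
    my ($via_strtod, $unparsed) = POSIX::strtod($str);
    return ($via_perl, $via_strtod, $unparsed);
}

# Pass strings on the command line, or fall back to a few samples.
for my $s (@ARGV ? @ARGV : qw(0.1 1.5e300 8.9e-300)) {
    my ($p, $c) = compare_assignment($s);
    printf "%-12s perl: %a  strtod: %a  %s\n",
        $s, $p, $c, ($p == $c ? 'same' : 'DIFFERENT');
}
```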

First up, I should point out that -Dusequadmath builds (i.e. builds for which $Config{nvtype} reports "__float128") already use Perl_strtod(), with the result that __float128 values are assigned correctly, at least in my experience on Ubuntu-16.04. (By "correctly", I mean rounded to nearest, ties to even.)
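To make "rounded to nearest, ties to even" concrete: near 2**53 the spacing between adjacent doubles is 2. This means 2**53+1 and 2**53+3 each sit exactly halfway between two doubles, and the tie has to go to the neighbour whose last significand bit is 0. The sketch below assumes nvtype is "double" and that the C library's strtod (reached here via POSIX::strtod) rounds correctly.

```perl
use strict;
use warnings;
use POSIX ();

# Decimal strings lying exactly halfway between adjacent doubles.
# Round to nearest, ties to even, must pick the "even" neighbour.
# Assumes nvtype is "double".
my %tie = (
    '9.007199254740993e15' => 2**53,       # 2**53 + 1 -> 2**53     (even)
    '9.007199254740995e15' => 2**53 + 4,   # 2**53 + 3 -> 2**53 + 4 (even)
);

for my $str (sort keys %tie) {
    my $got = POSIX::strtod($str);    # scalar context: just the value
    printf "%s -> %.0f (%s)\n", $str, $got,
        $got == $tie{$str} ? 'ties to even' : 'NOT ties to even';
}
```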

But when perl's nvtype is "double" or "long double", then values are being assigned using perl's Atof function and there's a fair chance that values are being assigned incorrectly.
The magnitude of Atof's inaccuracies is not particularly large: mostly it's only 1 unit of least precision (ULP). But it can be as large as 7 ULP when nvtype is "double", and as large as 54 ULP when nvtype is the extended precision "long double".
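One way to count the ULPs between two doubles of the same sign is to compare their 64-bit patterns, because adjacent finite doubles of the same sign have consecutive bit patterns. This is a sketch that assumes nvtype "double" and 64-bit integers. ulp_distance() is a hypothetical helper. It is not taken from the script further down, which compares only the low 32 bits of the hex dumps.

```perl
use strict;
use warnings;

# Hypothetical helper: distance in ULPs between two finite doubles of the
# same sign, computed as the difference of their 64-bit bit patterns.
# Assumes nvtype "double" and 64-bit IVs (needed for the 'q' template).
sub ulp_distance {
    my ($x, $y) = @_;
    my $bx = unpack 'q<', pack 'd<', $x;
    my $by = unpack 'q<', pack 'd<', $y;
    return abs($bx - $by);
}

my $one = 1.0;
printf "%d\n", ulp_distance($one, $one + 7 * 2**-52);
```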
(The figures of "7" and "54" are the largest I've found, having tested millions of random values - and those 2 numbers turn up often enough.)
The actual likelihood of striking inaccuracies with Atof depends upon the exponent range that you're working in. If the exponent is in the range (say) -10 to 10, the likelihood of an incorrect assignment is about 10%.
But when I randomly select values across the full exponent range, I'm finding that the chances of an incorrect assignment rise to around 97% for "doubles" and 82% for "long doubles".
When I hack the perl source to use Perl_strtod, the chances of an incorrect assignment become 0. (Ok ... I haven't checked every value ... but I've not yet found a value that has been incorrectly assigned by Perl_strtod on Ubuntu.)

It turns out that using Perl_strtod instead of perl's Atof is very easy to implement. We just need to open up numeric.c in the top level perl source folder, replace (the one occurrence of) "strtoflt128" with "Perl_strtod", replace every occurrence of "USE_QUADMATH" with "Perl_strtod", and rebuild perl.
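For anyone who would rather script the edit than make it by hand, here is a minimal sketch of the two substitutions just described. It is an illustration only. It is not the downloadable patch, and the portable patch also deals with the mingw-w64 strtold issue. patch_numeric_c(), the script name patch_numeric.pl and the numeric.c.orig backup are hypothetical names. The returned counts let you confirm that there really was exactly one strtoflt128.

```perl
use strict;
use warnings;

# Hypothetical sketch of the hand edit to numeric.c described above.
# NOT the downloadable patch; it only performs two substitutions:
#   strtoflt128  -> Perl_strtod   (expected to occur once)
#   USE_QUADMATH -> Perl_strtod   (every occurrence)
# Returns the edited text plus the number of replacements of each kind.
sub patch_numeric_c {
    my ($src) = @_;
    my $n_strtoflt = ($src =~ s/\bstrtoflt128\b/Perl_strtod/g) || 0;
    my $n_quadmath = ($src =~ s/\bUSE_QUADMATH\b/Perl_strtod/g) || 0;
    return ($src, $n_strtoflt, $n_quadmath);
}

# Usage (hypothetical script name): perl patch_numeric.pl numeric.c
# The original file is kept as numeric.c.orig.
if (@ARGV == 1) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Can't read $file: $!";
    my $text = do { local $/; <$in> };
    close $in;

    my ($patched, $n1, $n2) = patch_numeric_c($text);
    die "Expected exactly one strtoflt128 in $file, found $n1\n" unless $n1 == 1;

    rename $file, "$file.orig" or die "Can't back up $file: $!";
    open my $out, '>', $file or die "Can't write $file: $!";
    print {$out} $patched;
    close $out or die "Can't close $file: $!";
    print "Replaced $n1 strtoflt128 and $n2 USE_QUADMATH occurrence(s)\n";
}
```

Rebuilding perl afterwards is unchanged from the usual Configure/make/make test sequence.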
The actual patch (for perl-5.28.0 source) can be downloaded from my scratchpad.

UPDATE: Better to grab this patch because:
a) it's a portable patch for both mingw-w64-built Windows perl and Linux perl;
b) at some time I'll probably clear my scratchpad.

That's about it. If your perl's nvtype is "__float128" or your build of perl doesn't define "Perl_strtod", then applying the patch will not change anything.
Otherwise, however, if you build perl using the patched numeric.c then perl will assign floating point values using Perl_strtod instead of perl's Atof.
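To find out which of these cases applies to a given perl, %Config is a reasonable place to start. The sketch below uses nvtype, d_strtod and d_strtold to guess whether the patch would change anything. The mapping from nvtype to the underlying C function is an assumption based on the description of Perl_strtod above, not a reading of perl.h. patch_effect() is a hypothetical name.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: guess, from %Config alone, whether patching
# numeric.c would change how this perl assigns floating point values.
# The nvtype -> C function mapping is an assumption, not taken from perl.h.
sub patch_effect {
    my $nvtype = $Config{nvtype};
    return ($nvtype, 'none: __float128 perls already use Perl_strtod')
        if $nvtype eq '__float128';
    # "double" perls would need strtod; "long double" perls strtold.
    my $have = $nvtype eq 'double' ? $Config{d_strtod} : $Config{d_strtold};
    return ($nvtype, 'none: Perl_strtod is probably not defined')
        unless $have;
    return ($nvtype, 'assignment switches from Atof to Perl_strtod');
}

my ($nvtype, $effect) = patch_effect();
print "nvtype: $nvtype\neffect: $effect\n";
```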

It's very much the same story on MS Windows for mingw-w64 builds of perl whose nvtype is "double", where exactly the same patch makes equally dramatic improvements to the assignment of floating point values.
Sadly, however, for "long doubles" on Windows, there's https://sourceforge.net/p/mingw-w64/bugs/711 and https://sourceforge.net/p/mingw-w64/bugs/725 that complicate matters.
And there's also an issue with strtold's assignment of some subnormal long double values, for which I've yet to submit a bug report.
(More about Windows at a later date.)

Here's the script I use to check $ARGV[1] randomly selected values within a specified exponent range (-$ARGV[0] to +$ARGV[0]).


# atonv.pl

# Test a range of values for
# correctness of assignment

use strict;
use warnings;
use Math::MPFR qw(:mpfr);

die "Upgrade to Math-MPFR-4.03"
  unless $Math::MPFR::VERSION >= 4.03;


die "Usage: perl atonv.pl maximum_exponent how_many_values"
  unless @ARGV == 2;

$|++;

my $display = '';

# Keep asking until the answer starts with "y" or "n".
while($display !~ /^[yn]/i) {
  print("Do you want mismatched values to be displayed? [y|n]: ");
  $display = <STDIN>;
  die "No answer received on STDIN\n" unless defined $display;
}

$display = 0 if $display =~ /^n/i;

my($mant, $exp, $perl_unpacked, $mpfr_unpacked, $str_value);
my($count, $diff, $max_diff, $min_diff) = (0, 0, 0, 0);

my $max_exp = $ARGV[0];
$max_exp++;

# $workspace is the Math::MPFR object to which
# the value being tested is assigned.
# Here we set the precision of $workspace to the
# same number of bits as perl's NV.
my $workspace = Rmpfr_init2($Math::MPFR::BITS);

my $failed = 0;
my($perl_nv, $mpfr_nv);

for(;;) {
  $count++;
  $mant = int(rand(10))
           . '.'
           . int(rand(10))
           . int(rand(10))
           . int(rand(10))
           . int(rand(10))
           . int(rand(10))
           . int(rand(10))
           . int(rand(10))
           . int(rand(10))
  ;
  $exp = int(rand($max_exp));
  $exp = "-$exp" if $count % 2;
  $str_value = $mant . "e$exp";

  # Assign $str_value to $mpfr_nv using mpfr
  $mpfr_nv = atonv($workspace, $str_value);

  # Assign $str_value to $perl_nv using perl
  $perl_nv = "$str_value" + 0;

  # $mpfr_nv and $perl_nv should be exactly equivalent.
  # Else at least one of mpfr and perl has assigned incorrectly.
  # IME, mpfr does not assign incorrectly.
  unless($perl_nv == $mpfr_nv) {
    $failed++;
    $perl_unpacked = scalar reverse unpack "h*", pack "F<", $perl_nv;
    $mpfr_unpacked = scalar reverse unpack "h*", pack "F<", $mpfr_nv;

    print "$str_value: $mpfr_nv:\n $perl_unpacked vs $mpfr_unpacked\n\n" if $display;

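    # Difference of the low 32 bits of the two bit patterns
    # (a count of ULPs when the higher bits agree).
    $diff = hex(substr($perl_unpacked, -8, 8)) - hex(substr($mpfr_unpacked, -8, 8));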

    if($diff > $max_diff) {
      $max_diff = $diff;
    } 
    elsif($diff < $min_diff) {
      $min_diff = $diff;
    }
  
  }  

  last if $count == $ARGV[1];
}

print "Count: $count\n";
print "Failed: $failed\n";
print "Largest differences were $max_diff ULPs and $min_diff ULPs\n";


It requires Math-MPFR-4.03. If you want to test values in the subnormal range, you should build Math::MPFR against mpfr-4.0.x as earlier versions of mpfr were buggy in their calculation of subnormals.
As a starter, run perl atonv.pl 300 100, opting to display mismatches, and see how that fares.
Whenever I run that command against a patched perl-5.28.0, 0 mismatches are detected, irrespective of perl's nvtype.
Whenever I run that command against an unpatched perl-5.28.0, about 80 failures are detected, unless, of course, nvtype is "__float128", in which case there are still no failures.
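When mismatches are displayed, atonv.pl dumps each NV with scalar reverse unpack "h*", pack "F<", $nv. pack "F<" lays out the NV's bytes least significant first. "h*" turns each byte into two hex digits, low nybble first. Reversing the whole string then gives the familiar most-significant-first form. A small sketch, where nv_to_hex() is a hypothetical wrapper and the example values assume nvtype "double". A long double or __float128 perl produces a longer string.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression used in atonv.pl:
# returns the NV's bit pattern as hex, most significant digit first.
sub nv_to_hex {
    my ($nv) = @_;
    return scalar reverse unpack 'h*', pack 'F<', $nv;
}

printf "%-6s %s\n", $_, nv_to_hex($_) for 1.0, 0.1, -2.5;
```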

There are probably not many who would bother, but I certainly intend to continue building perl with this hack in place.

UPDATE: For the record, gcc version on my Ubuntu box is 5.4.0, and libc version is 2.23

Cheers,
Rob

Re: Avoiding perl's Atof when assigning floating point values by BrowserUk (Patriarch) on Jun 29, 2018 at 09:07 UTC
Seems pretty simple and I can't see any downsides; is there any reason why P5P couldn't adopt this?
Re^2: Avoiding perl's Atof when assigning floating point values by syphilis (Archbishop) on Jun 30, 2018 at 02:25 UTC
"is there any reason why P5P couldn't adopt this?" I think the main concern would be the gcc bugs (such as the Windows ones I mentioned earlier) that might thereby be exposed. They'll also likely point out that one can always assign using Perl_strtod via the POSIX module, with POSIX::strtod() or POSIX::strtold(). Couple that with their lack of interest in the matter (as clearly demonstrated by the lack of replies to my attempted conversation starter) and I think I'd have to present a much stronger case before anything was done. Cheers, Rob
Re: Avoiding perl's Atof when assigning floating point values (MS Windows) by syphilis (Archbishop) on Jul 02, 2018 at 14:29 UTC
"More about Windows at a later date." As I mentioned in my initial post, the patch to numeric.c needs some adjustment for Windows because of https://sourceforge.net/p/mingw-w64/bugs/711 and https://sourceforge.net/p/mingw-w64/bugs/725. Eventually, I discovered a fairly simple and effective workaround to this problem: replace one of the Perl_strtod calls with a call to __mingw_strtold. This adjusted numeric.c patch, which portably caters for both Windows and Linux, can be found at http://www.sisyphusion.tk/scratch/numeric.c.diff.txt. With the patch in place, mingw-w64 built perl-5.28.0 exhibits the same dramatic improvement in its assignment of floating point values as was seen on my Ubuntu box. (See my original post for details.) I'm not sure what effect (if any) this patch will have on a perl-5.28.0 that was built by a Microsoft compiler, but that's something I'll try to determine over the coming week. Cheers, Rob
Re: Avoiding perl's Atof when assigning floating point values by syphilis (Archbishop) on Aug 10, 2018 at 10:18 UTC
Hi, At last !!! These changes (after some needed modifications) were pushed into blead a few hours ago. And bleadperl is now assigning values using Perl_strtod whenever Perl_strtod is defined. Thanks go to BUK for the prodding he provided; to aitap for assistance in identifying the nature of the lib/locale.t failures and the bugs (which have since been fixed) in t/run/locale.t and ext/POSIX/t/posix.t. Thanks also to Jim Keenan for initiating the smoking of my patches. And thanks to Karl Williamson for fixing the problem with the lib/locale.t failures, and for pushing the finalised patches to blead. It's party time over here tonight ... and you're all invited !! Cheers, Rob
Re: Avoiding perl's Atof when assigning floating point values (powerpc64) by syphilis (Archbishop) on Jul 05, 2018 at 14:09 UTC
My powerpc64 box is getting a bit long in the tooth. It's a "G5" box running Debian Wheezy, with gcc-4.6.3 and libc version 2.13. If nvtype is "double", then I see pretty much the same as what I see on the Windows and Ubuntu boxes: perl's Atof frequently fails to assign floating point values correctly (wrong by up to 7 ULP), whereas Perl_strtod works flawlessly. However, on this architecture the "long double" nvtype is the Double-Double, and neither perl's Atof nor Perl_strtod produces acceptable results when assigning values. About 60% of floating point assignments are incorrect, by up to thousands of ULP. I'd hope that with gcc-7.x or later, and with a current version of libc, Perl_strtod would work significantly better on the Double-Double perl. But it probably doesn't matter all that much anyway, because no-one runs Double-Double builds. Cheers, Rob
Re: Avoiding perl's Atof when assigning floating point values by Anonymous Monk on Jul 04, 2018 at 02:57 UTC
I still fail to see "why P5P could not simply adopt this . . . and be done."
Re^2: Avoiding perl's Atof when assigning floating point values by syphilis (Archbishop) on Jul 04, 2018 at 13:01 UTC
I still fail to see "why P5P could not simply adopt this . . . and be done." Yeah ... maybe I was concentrating more on the reasons that (I believe) they would not, rather than reasons that they could not. And the bug with Windows strtold() turned out to be far less significant than I thought it would be. It's looking like __mingw_strtold should be an alias for strtold (just as __mingw_strtod is an alias for strtod), but for some reason (probably oversight) that's not happening. In any case, the patch that handles this bug does so as efficiently as fixing the bug in the mingw-w64 port of gcc would. I just can't see p5p adopting this patch simply on the basis of the testing that I've done, though I'll probably raise it with them again once I've managed to properly prepare myself for the possibility of being ignored once more. (Not a particularly pleasurable experience.) Cheers, Rob
Re^3: Avoiding perl's Atof when assigning floating point values by syphilis (Archbishop) on Jul 28, 2018 at 12:27 UTC
"I just can't see p5p adopting this patch simply on the basis of the testing that I've done"
The good news is that some more testing has been done. Karl Williamson and Jim Keenan created smoke-me branches to specifically test my changes (at http://perl.develop-help.com/?b=smoke-me%2Fkhw-sisyphus and http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Fsisyphus%2F41202-2nd-text-float respectively). Many thanks to Karl and Jim for doing that.
The bad news is that the results aren't great. The one smoker (Carlos Guevara's) that has tested those branches has frequently encountered lib/locale.t test failures. It has also encountered other problems, such as ticket #133356 and ticket #133377, but they are separate issues not related to my changes, and I've no need to concern myself with them for now. OTOH, the lib/locale.t failures seem to have arisen from the changes I've made, and that's the issue I need to address.
Unfortunately, I can't reproduce those failures locally, and I'm hoping that someone here might be able to help out. The source for Karl Williamson's branch is here. I would love to enter into a dialog with anyone who can download that file, unpack it, and build from it (on a Unix-type system) a perl that fails any lib/locale.t tests. (I have questions regarding the %Config and locale settings for such a perl.)
Having unpacked the source, `cd` to the top level folder and run:
`sh Configure -des -Dusedevel -Duse64bitall && make && make test`
Examining the individual reports for Karl's branch, I can see that the lib/locale.t failures that I'm interested in are occurring on all of Carlos's FreeBSD and NetBSD systems (but not his OpenBSD system), on his Solaris system, on his Slackware system, on his Fedora systems, and on his Ubuntu-14.04 (but not 16.04 or 18.04) system.
I have 2 things to say to anyone who can help me out with this: 1) thank you ever so much; 2) get a life ;-) Cheers, Rob
Re^4: Avoiding perl's Atof when assigning floating point values by aitap (Curate) on Jul 28, 2018 at 14:01 UTC
Re^5: Avoiding perl's Atof when assigning floating point values by syphilis (Archbishop) on Jul 29, 2018 at 00:59 UTC