http://qs1969.pair.com?node_id=11122392


in reply to Re^4: Influencing the Gconvert macro
in thread Influencing the Gconvert macro

I'd love (someone) to get to the bottom of it, at least to understand if it's a real problem (that needs to be fixed by increasing a buffer size).

I can verify that the "%.*g" formatting still works ok when the buffer size needs to be bigger than 127.
For example, with this patched perl, perl -le 'printf "%.751g\n", 2 ** - 1074;' prints out all 751 mantissa digits correctly and displays the exact decimal rendition of the value 2 ** -1074.
$ ./perl -I./lib -le 'printf "%.751g\n", 2 ** - 1074;'
4.94065... another 740 digits ...65625e-324

Actually, I can add a little to that.
I hacked the source to print out the size of the buffer, and I've just discovered that, so long as I ask for no more than 91 digits, the buffer size is 127 - which should be sufficient.
But as soon as I ask for more than 91 digits, the buffer size is not displayed - indicating that the processing has switched to a different block.
Incredibly, when I ask for more than 91 digits, I've also just now realized that the "%.*g" formatting works fine on Ubuntu-18.04 perls. That is, the bug exists only when I request 18 to 91 (inclusive) digits.
If I request a number outside of that range, it works fine on a standard (unpatched) perl-5.32.0 on Ubuntu-18.04:
$ perl -le 'printf "%.91g\n", 2 ** -1074;'
4.9406564584124654e-324
$ perl -le 'printf "%.92g\n", 2 ** -1074;'
4.9406564584124654417656879286822137236505980261432476442558568250067550727020875186529983636e-324
So it looks to me that the concern surrounding the 127-byte buffer is unfounded, because the processing switches to a different block as soon as we ask for more than 91 digits.
But I'm not prepared to claim that I've actually proved anything ;-)

Cheers,
Rob

Re^6: Influencing the Gconvert macro
by hv (Prior) on Oct 01, 2020 at 13:24 UTC

    Ok, the main calculation in the perl source is I think this statement from sv.c:

    /* Determine the buffer size needed for the various
     * floating-point formats.
     *
     * The basic possibilities are:
     *
     *               <---P--->
     *    %f 1111111.123456789
     *    %e       1.111111123e+06
     *    %a     0x1.0f4471f9bp+20
     *    %g        1111111.12
     *    %g        1.11111112e+15
     *
     * where P is the value of the precision in the format, or 6
     * if not specified. Note the two possible output formats of
     * %g; in both cases the number of significant digits is <=
     * precision.
     *
     * For most of the format types the maximum buffer size needed
     * is precision, plus: any leading 1 or 0x1, the radix
     * point, and an exponent. The difficult one is %f: for a
     * large positive exponent it can have many leading digits,
     * which needs to be calculated specially. Also %a is slightly
     * different in that in the absence of a specified precision,
     * it uses as many digits as necessary to distinguish
     * different values.
     *
     * First, here are the constant bits. For ease of calculation
     * we over-estimate the needed buffer size, for example by
     * assuming all formats have an exponent and a leading 0x1.
     *
     * Also for production use, add a little extra overhead for
     * safety's sake. Under debugging don't, as it means we're
     * more likely to quickly spot issues during development.
     */

    float_need =     1  /* possible unary minus */
                  +  4  /* "0x1" plus very unlikely carry */
                  +  1  /* default radix point '.' */
                  +  2  /* "e-", "p+" etc */
                  +  6  /* exponent: up to 16383 (quad fp) */
    #ifndef DEBUGGING
                  + 20  /* safety net */
    #endif
                  +  1; /* \0 */

    .. after which if we are subject to locale it goes and checks the actual length of the utf8 representation of the radix point and adjusts that "+ 1" for the default. The above adds up to 35, which is pretty close to the difference between 91 and 127.

    The origin of the gcc warning looks like it might be gimple-ssa-sprintf.c or a close relative, in which case the "#define target_mb_len_max() 6" may well explain the difference between 127 and 133.

    So this looks pretty safe to me - and you'd certainly need a debugging perl to get close to exercising the limits.

    That just leaves the question of whether we can give the compiler enough hints for it to come to the same conclusion, or whether we'd only be able to shut it up with a sledgehammer preprocessor directive.

    Hugo

      That just leaves the question of whether we can give the compiler enough hints for it to come to the same conclusion, or whether we'd only be able to shut it up with a sledgehammer preprocessor directive.

      I'm puzzled as to how/why this check is even being run.

      I've just built perl-5.33.2 with the usual configure args, making no attempt to influence the setting of Gconvert.
      But I've applied this patch to sv.c:
      --- sv.c        2020-09-29 22:29:16.781395700 +1000
      +++ sv.c_mod    2020-10-02 11:35:20.728840400 +1000
      @@ -13115,7 +13115,7 @@
                        && intsize != 'q'
                   ) {
                       WITH_LC_NUMERIC_SET_TO_NEEDED_IN(in_lc_numeric,
      -                    SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis)
      +                    PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                       );
                       elen = strlen(ebuf);
                       eptr = ebuf;
      That works fine but I'm not happy about the double-rounding that takes place when nvtype is 'double' 'long double'.
      We really want fv to be an NV, not a long double.
      And then we would need the sprintf() formatting to accommodate the nvtype - "g" versus "Lg".

      UPDATE: Duh ... there is no double-rounding ... but I think I still need to attend to the issue of "g" or "Lg" formatting.

      And it still produces that awful noise (see below my sig).
      The command that produces that noise is:
      cc -c -DPERL_CORE -fwrapv -fno-strict-aliasing -pipe \
         -fstack-protector-strong -I/usr/local/include \
         -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -std=c89 -O2 -Wall \
         -Werror=pointer-arith -Wextra -Wc++-compat -Wwrite-strings \
         -Werror=declaration-after-statement sv.c
      So I've tried (unsuccessfully) to reproduce those warnings by compiling the following C program:
      #include <stdio.h>

      int main(void) {
        char ebuf[127];
        long double fv = 0.3L;
        int precis = 54;

        sprintf(ebuf, "%.*g", precis, (double) fv);
        printf("%s\n", ebuf);
        return 0;
      }
      I compiled it by running the same command (minus the perl-specific "-D..." switches) and it compiles noiselessly.
      So I guess that the noise must be introduced by something in those perl-specific switches.

      Do you know how to reproduce the warnings when compiling that C script ?

      Incidentally, AFAICS, that patch effectively removes Gconvert from the perl source entirely - except for Win32API-File, where the Gconvert call in cpan\Win32API-File\const2perl.h could be replaced with sprintf(), anyway.
      For Windows, Gconvert is already hard coded to sprintf().

      Cheers,
      Rob
      In file included from sv.c:32:0:
      sv.c: In function ‘Perl_sv_vcatpvfn_flags’:
      sv.c:13118:54: warning: ‘%.*g’ directive writing between 1 and 133 bytes into a region of size 127 [-Wformat-overflow=]
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                            ^
      perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                   block;                                                          \
                   ^~~~~
      sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                           ^
      sv.c:13118:54: note: assuming directive output of 132 bytes
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                            ^
      perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                   block;                                                          \
                   ^~~~~
      sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                           ^
      In file included from /usr/include/stdio.h:862:0,
                       from perlio.h:41,
                       from iperlsys.h:50,
                       from perl.h:3934,
                       from sv.c:32:
      /usr/include/x86_64-linux-gnu/bits/stdio2.h:33:10: note: ‘__builtin___sprintf_chk’ output between 2 and 134 bytes into a destination of size 127
         return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             __bos (__s), __fmt, __va_arg_pack ());
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from sv.c:32:0:
      sv.c:13118:54: warning: ‘%.*g’ directive writing between 1 and 133 bytes into a region of size 127 [-Wformat-overflow=]
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                            ^
      perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                   block;                                                          \
                   ^~~~~
      sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                           ^
      sv.c:13118:54: note: assuming directive output of 132 bytes
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                                                            ^
      perl.h:6791:13: note: in definition of macro ‘WITH_LC_NUMERIC_SET_TO_NEEDED_IN’
                   block;                                                          \
                   ^~~~~
      sv.c:13118:21: note: in expansion of macro ‘PERL_UNUSED_RESULT’
                           PERL_UNUSED_RESULT(sprintf(ebuf, "%.*g", (int)precis, (NV) fv))
                           ^
      In file included from /usr/include/stdio.h:862:0,
                       from perlio.h:41,
                       from iperlsys.h:50,
                       from perl.h:3934,
                       from sv.c:32:
      /usr/include/x86_64-linux-gnu/bits/stdio2.h:33:10: note: ‘__builtin___sprintf_chk’ output between 2 and 134 bytes into a destination of size 127
         return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             __bos (__s), __fmt, __va_arg_pack ());
        Do you know how to reproduce the warnings when compiling that C script ?

        I didn't want to die wondering (even if it killed me), so I eventually came up with this C program that reproduces the warnings:
        /* try.c */
        #include <stdio.h>
        #include <stdlib.h>

        void foo(double);

        int main(void) {
         /* The value assigned to 'd' has no *
          * effect on the warning message.   */
         double d = 0.;
         foo(d);
        }

        void foo(double d) {
         char buf[127];

         sprintf (buf, "%.*g", 126, d);
         printf("%s\n", buf);
        }
        Build with: gcc -o try.exe try.c -Wformat-overflow
        That compilation produces the following noise:
        try.c: In function ‘foo’:
        try.c:17:17: warning: ‘%.*g’ directive writing between 1 and 133 bytes into a region of size 127 [-Wformat-overflow=]
          sprintf (buf, "%.*g", 126, d);
                         ^~~~
        try.c:17:16: note: assuming directive output of 132 bytes
          sprintf (buf, "%.*g", 126, d);
                        ^~~~~~
        try.c:17:2: note: ‘sprintf’ output between 2 and 134 bytes into a destination of size 127
          sprintf (buf, "%.*g", 126, d);
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        which is essentially the same as the warnings I received when compiling perl.

        Apparently, the perl compilation process (with -O2 optimization) determines that the number of digits being requested is 126.
        It is then correctly calculated that the number of bytes written will be between 1 and 133 - which allows for the 126 digits, the decimal point, a possible leading '-', an 'e', and an exponent of at most 4 characters (a sign plus three digits).
        In those warnings, you'll see that the "1 and 133" changes to "2 and 134" when the terminating NUL byte is included in the count.

        I haven't investigated just how the perl source compilation process makes the determination that the digit count has to be 126. It might just be a bug in the -O2 optimization - certainly, no warnings are emitted if the optimization level is reduced to less than -O2.

        As I mentioned previously, if the number of digits specified in the "%g" formatting is higher than 91, then the processing switches to a different block of code, so the buffer size of 127 is certainly large enough.
        I haven't looked into how or why that change occurs when digits > 91. It's not often that people will request more digits than 91 - so I'm not presently inclined to wade through the whys and wherefores of that processing path. It seems to be working correctly, and IMO that's good enough for now, at least.

        Update: Now that I understand how the warning is being created, I think it should be fairly simple to amend the perl source so that this warning is eliminated.

        Cheers,
        Rob