in reply to [OT] LLP64 .v. LP64 portability

The Perl approach is to create a type that's large enough to store an int or a pointer: IV. Apparently, it's not used properly everywhere, or it's misconfigured in your environment.

I know that there were a number of places where I32 was used where it shouldn't, and IV values were assigned to STRLEN variables, even though the latter can be smaller. I fixed those problems in substr for 5.12.0, but some remain elsewhere.

Replies are listed 'Best First'.
Re^2: [OT] LLP64 .v. LP64 portability
by BrowserUk (Patriarch) on Apr 21, 2010 at 06:34 UTC
    I fixed those problems in substr for 5.12.0

    Well done. But--this isn't critism, just a need for understanding--those fixes look very complex in places.

    For example:

    - else { - pos += curlen; - if (num_args < 3) - rem = curlen; - else if (len >= 0) { - rem = pos+len; - if (rem > (I32)curlen) - rem = curlen;

    becomes

    + if (pos1_is_uv || pos1_iv > 0) { + if ((UV)pos1_iv > curlen) + goto BOUND_FAIL; + } + + if (num_args > 2) { + if (!len_is_uv && len_iv < 0) { + pos2_iv = curlen + len_iv; + if (curlen) + pos2_is_uv = curlen-1 > ~(UV)len_iv; + else + pos2_is_uv = 0; + } else { /* len_iv >= 0 */ + if (!pos1_is_uv && pos1_iv < 0) { + pos2_iv = pos1_iv + len_iv; + pos2_is_uv = (UV)len_iv > (UV)IV_MAX; + } else { + if ((UV)len_iv > curlen-(UV)pos1_iv) + pos2_iv = curlen; + else + pos2_iv = pos1_iv+len_iv; + pos2_is_uv = 1; + }

    Is all of that manual tracking whether a integer is signed or unsigned necessary?

    If so, life just got a whole lot tougher for source-divers. That's really nasty!

    but some remain elsewhere.

    I'm currently looking at 1988 (1302 uniq), "possible loss of data" warnings spread across 73 source files in the 5.12.0 release.

    Don't you just wish that C compilers would list the identifier of offending variables!


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      In some C files that I've written, I've suppressed certain warnings with:
      #ifdef _MSC_VER #pragma warning(disable:4700 4715 4716) #endif
      I forget which 3 warnings they equate to - and haven't checked. If it's just a matter of shutting 'em up, without being too particular about the means employed, I think there's a chance that approach could be useful.

      With any luck you'd only need to do it in a few select header files (assuming that many of those 73 source files #include a common header).

      Cheers,
      Rob

        Shutting up the warnings would be useful (for Perl) if I was convinced they were all false alarms, but I'm semi-convinced that one or more of them is the reason behind some traps I experience when I get close to allocating 4GB of virtual memory.

        For Parrot, which effectively equates sizeof( INTVAL ) == sizeof( void* ) all over the show, I'm convinced that the configuration utility (that actually warns of the problem), is doing the wrong thing with respect to how it defines the fundemental typedefs.

        Personally, I think that on windows the following typedefs for STRLEN, IV and UV should be used. And most (if not all) uses of int, I32, and U32 should dropped in favour of one if the 3 above.

        #include <stdio.h> #include <stddef.h> typedef uintptr_t UV; typedef ptrdiff_t STRLEN; typedef intptr_t IV; #ifdef _WIN64 #define IS_WIN64 "" #else #define IS_WIN64 "not" #endif void main( void ){ printf( "_WIN64 is %s defined\n", IS_WIN64 ); printf( "MAX: %d\n", _INTEGRAL_MAX_BITS ); printf( "size_t is %d bytes\n", sizeof( size_t ) ); printf( "int is %d bytes\n", sizeof( int ) ); printf( "long is %d bytes\n", sizeof( long ) ); printf( "long long is %d bytes\n", sizeof( long long ) ); printf( "UV is %d bytes\n", sizeof( UV ) ); printf( "IV is %d bytes\n", sizeof( IV ) ); printf( "STRLEN is %d bytes\n", sizeof( STRLEN ) ); }

        Compiled for 32 & 64-bit targets this produces:

        C:\test>size_t.exe _WIN64 is not defined MAX: 64 size_t is 4 bytes int is 4 bytes long is 4 bytes long long is 8 bytes UV is 4 bytes IV is 4 bytes STRLEN is 4 bytes C:\test>size_t.exe _WIN64 is defined MAX: 64 size_t is 8 bytes int is 4 bytes long is 4 bytes long long is 8 bytes UV is 8 bytes IV is 8 bytes STRLEN is 8 bytes

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      Is all of that manual tracking whether a integer is signed or unsigned necessary?

      The range of numbers substr accepts for its position and length offsets can be greater than the range of signed or unsigned integers.

      Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32 (32 bit signed or 32 bit unsigned) is accepted.

      I'm all for simplifying it, but Perl doesn't currently have a type that's twice the size of IV (LONG_IV?) as far as I know. Keep in mind I wrote that under pressure since I took on the task when it was one of the last two or three 5.12 blockers. I took the easiest approach for me ("When all you have is a hammer, every problem looks like a nail.") and the one least likely to cause immediate problems.

      Note that $[ is being removed shortly, so some substr will shrink (but not the bit you posted).

        Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32 (32 bit signed or 32 bit unsigned) is accepted

        Theoretically, but given that you'll never be able to allocate a single string >2**31 on a 32-bit machine, it seems to be overkill.

        Looking at the evolution of substr over the past few years, it is little wonder that is has such tardy performance. Like many parts of the core, it looks ripe for refactoring, but guesss given the accumulated complexity, its now virtually untouchable.

        But my main concern on seeing it, was the signed/unsigned handling was a general necessity, but I see that is not the case.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.