in reply to Re: [OT] LLP64 .v. LP64 portability
in thread [OT] LLP64 .v. LP64 portability

I fixed those problems in substr for 5.12.0

Well done. But--this isn't critism, just a need for understanding--those fixes look very complex in places.

For example:

- else { - pos += curlen; - if (num_args < 3) - rem = curlen; - else if (len >= 0) { - rem = pos+len; - if (rem > (I32)curlen) - rem = curlen;

becomes

+ if (pos1_is_uv || pos1_iv > 0) { + if ((UV)pos1_iv > curlen) + goto BOUND_FAIL; + } + + if (num_args > 2) { + if (!len_is_uv && len_iv < 0) { + pos2_iv = curlen + len_iv; + if (curlen) + pos2_is_uv = curlen-1 > ~(UV)len_iv; + else + pos2_is_uv = 0; + } else { /* len_iv >= 0 */ + if (!pos1_is_uv && pos1_iv < 0) { + pos2_iv = pos1_iv + len_iv; + pos2_is_uv = (UV)len_iv > (UV)IV_MAX; + } else { + if ((UV)len_iv > curlen-(UV)pos1_iv) + pos2_iv = curlen; + else + pos2_iv = pos1_iv+len_iv; + pos2_is_uv = 1; + }

Is all of that manual tracking whether a integer is signed or unsigned necessary?

If so, life just got a whole lot tougher for source-divers. That's really nasty!

but some remain elsewhere.

I'm currently looking at 1988 (1302 uniq), "possible loss of data" warnings spread across 73 source files in the 5.12.0 release.

Don't you just wish that C compilers would list the identifier of offending variables!


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Replies are listed 'Best First'.
Re^3: [OT] LLP64 .v. LP64 portability
by syphilis (Archbishop) on Apr 21, 2010 at 09:10 UTC
    In some C files that I've written, I've suppressed certain warnings with:
    #ifdef _MSC_VER #pragma warning(disable:4700 4715 4716) #endif
    I forget which 3 warnings they equate to - and haven't checked. If it's just a matter of shutting 'em up, without being too particular about the means employed, I think there's a chance that approach could be useful.

    With any luck you'd only need to do it in a few select header files (assuming that many of those 73 source files #include a common header).

    Cheers,
    Rob

      Shutting up the warnings would be useful (for Perl) if I was convinced they were all false alarms, but I'm semi-convinced that one or more of them is the reason behind some traps I experience when I get close to allocating 4GB of virtual memory.

      For Parrot, which effectively equates sizeof( INTVAL ) == sizeof( void* ) all over the show, I'm convinced that the configuration utility (that actually warns of the problem), is doing the wrong thing with respect to how it defines the fundemental typedefs.

      Personally, I think that on windows the following typedefs for STRLEN, IV and UV should be used. And most (if not all) uses of int, I32, and U32 should dropped in favour of one if the 3 above.

      #include <stdio.h> #include <stddef.h> typedef uintptr_t UV; typedef ptrdiff_t STRLEN; typedef intptr_t IV; #ifdef _WIN64 #define IS_WIN64 "" #else #define IS_WIN64 "not" #endif void main( void ){ printf( "_WIN64 is %s defined\n", IS_WIN64 ); printf( "MAX: %d\n", _INTEGRAL_MAX_BITS ); printf( "size_t is %d bytes\n", sizeof( size_t ) ); printf( "int is %d bytes\n", sizeof( int ) ); printf( "long is %d bytes\n", sizeof( long ) ); printf( "long long is %d bytes\n", sizeof( long long ) ); printf( "UV is %d bytes\n", sizeof( UV ) ); printf( "IV is %d bytes\n", sizeof( IV ) ); printf( "STRLEN is %d bytes\n", sizeof( STRLEN ) ); }

      Compiled for 32 & 64-bit targets this produces:

      C:\test>size_t.exe _WIN64 is not defined MAX: 64 size_t is 4 bytes int is 4 bytes long is 4 bytes long long is 8 bytes UV is 4 bytes IV is 4 bytes STRLEN is 4 bytes C:\test>size_t.exe _WIN64 is defined MAX: 64 size_t is 8 bytes int is 4 bytes long is 4 bytes long long is 8 bytes UV is 8 bytes IV is 8 bytes STRLEN is 8 bytes

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I'm not familiar with those types. On paper, uintptr_t and intptr_t look good for UV and IV. However, STRLEN should remain whatever strlen returns. That is usually size_t, and it's accessed via Size_t.
Re^3: [OT] LLP64 .v. LP64 portability
by ikegami (Patriarch) on Apr 21, 2010 at 14:38 UTC

    Is all of that manual tracking whether a integer is signed or unsigned necessary?

    The range of numbers substr accepts for its position and length offsets can be greater than the range of signed or unsigned integers.

    Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32 (32 bit signed or 32 bit unsigned) is accepted.

    I'm all for simplifying it, but Perl doesn't currently have a type that's twice the size of IV (LONG_IV?) as far as I know. Keep in mind I wrote that under pressure since I took on the task when it was one of the last two or three 5.12 blockers. I took the easiest approach for me ("When all you have is a hammer, every problem looks like a nail.") and the one least likely to cause immediate problems.

    Note that $[ is being removed shortly, so some substr will shrink (but not the bit you posted).

      Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32 (32 bit signed or 32 bit unsigned) is accepted

      Theoretically, but given that you'll never be able to allocate a single string >2**31 on a 32-bit machine, it seems to be overkill.

      Looking at the evolution of substr over the past few years, it is little wonder that is has such tardy performance. Like many parts of the core, it looks ripe for refactoring, but guesss given the accumulated complexity, its now virtually untouchable.

      But my main concern on seeing it, was the signed/unsigned handling was a general necessity, but I see that is not the case.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Theoretically, but given that you'll never be able to allocate a single string >2**31 on a 32-bit machine, it seems to be overkill.

        I was not aware of that. What about other machines? Does it generalize to "the high/sign bit of IV is never used for valid string lengths"? If so, the code can definitely be simplified.

        I've been searching. Do you have any documentation on that? STRLEN is Size_t, which is size_t in my build, and size_t is (usually? always? in my case) an unsigned type.