in reply to Re^2: [OT] LLP64 .v. LP64 portability
in thread [OT] LLP64 .v. LP64 portability

Is all of that manual tracking of whether an integer is signed or unsigned necessary?

The range of numbers substr accepts for its position and length offsets can be greater than the range of signed or unsigned integers.

Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32-1 (32-bit signed or 32-bit unsigned) is accepted.
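To make the arithmetic concrete, here is a minimal sketch of those ranges. It assumes the scenario above (32-bit signed IV, 32-bit unsigned STRLEN); every name in it is illustrative and none of it is taken from the Perl source.

```python
# Assumption: IV is a 32-bit signed integer and STRLEN a 32-bit unsigned one,
# as in the scenario above. All names are illustrative, not Perl internals.
BITS = 32

IV_MIN = -(2 ** (BITS - 1))      # smallest signed 32-bit value
IV_MAX = 2 ** (BITS - 1) - 1     # largest signed 32-bit value
UV_MAX = 2 ** BITS - 1           # largest unsigned 32-bit value

# Range a position or length argument would ideally cover.
ideal = (-(2 ** BITS), 2 ** BITS - 1)

# Range covered by "fits in an IV or fits in a UV".
accepted = (IV_MIN, UV_MAX)


def bits_needed(lo, hi):
    """Smallest two's-complement width that holds every value in lo..hi."""
    width = 1
    while not (-(2 ** (width - 1)) <= lo and hi <= 2 ** (width - 1) - 1):
        width += 1
    return width


print(bits_needed(*ideal))
print(bits_needed(*accepted))
```

Both calls report 33: even the accepted range does not fit in any single 32-bit type, which is why the code has to carry a separate "is this signed or unsigned" flag alongside each value.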

I'm all for simplifying it, but Perl doesn't currently have a type that's twice the size of IV (LONG_IV?) as far as I know. Keep in mind I wrote that under pressure since I took on the task when it was one of the last two or three 5.12 blockers. I took the easiest approach for me ("When all you have is a hammer, every problem looks like a nail.") and the one least likely to cause immediate problems.

Note that $[ is being removed shortly, so some of the substr code will shrink (but not the bit you posted).

Re^4: [OT] LLP64 .v. LP64 portability
by BrowserUk (Patriarch) on Apr 21, 2010 at 19:39 UTC
    Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32-1 (32-bit signed or 32-bit unsigned) is accepted.

    Theoretically, but given that you'll never be able to allocate a single string >2**31 on a 32-bit machine, it seems to be overkill.

    Looking at the evolution of substr over the past few years, it is little wonder that it has such tardy performance. Like many parts of the core, it looks ripe for refactoring, but I guess that, given the accumulated complexity, it's now virtually untouchable.

    My main concern on seeing it was whether the signed/unsigned handling was a general necessity, but I see that is not the case.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Theoretically, but given that you'll never be able to allocate a single string >2**31 on a 32-bit machine, it seems to be overkill.

      I was not aware of that. What about other machines? Does it generalize to "the high/sign bit of IV is never used for valid string lengths"? If so, the code can definitely be simplified.

      I've been searching. Do you have any documentation on that? STRLEN is Size_t, which is size_t in my build, and size_t is (usually? always? in my case) an unsigned type.

        Do you have any documentation on that?

        Across all OSs and hardware, no. I cannot give that guarantee.

        However, even if there are 32-bit hardware/OS combinations that allow the allocation of a single contiguous entity of greater than 2GB, I don't believe that perl memory allocation routines would allow it because of the math that is done in a macro (something like MEMORY_WRAP_CHECK(*) or similar). That's from memory [sic], subject to my having interpreted the code correctly; and could have changed subsequent to my last looking at it.
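        As a rough illustration of what a wrap check of this general kind does, here is a minimal sketch. It assumes a 32-bit size_t; the function names are hypothetical, and this is not the definition of perl's macro, only the usual "would count * size overflow?" test.

```python
# Assumption: a 32-bit size_t. Function names are hypothetical; this shows the
# general idea of a multiplication wrap check, not perl's actual macro.
SIZE_MAX = 2 ** 32 - 1


def would_wrap(count, elem_size):
    """True if count * elem_size cannot be represented in a 32-bit size_t."""
    # Dividing instead of multiplying keeps the test itself free of overflow.
    return elem_size != 0 and count > SIZE_MAX // elem_size


def checked_alloc_size(count, elem_size):
    """Return the byte count for an allocation, or raise if it would wrap."""
    if would_wrap(count, elem_size):
        raise OverflowError("allocation size wraps around size_t")
    return count * elem_size
```

        Note that a check like this only rejects requests that overflow size_t. Whether perl's own checks also rule out a single allocation above 2GB depends on the actual macro definitions, which are not reproduced here.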

        What I can say is that normally,

        • Win32 only allows user space processes access to 2GB of ram;
        • Linux (without kernel patches; circa 2.4.23) only allows 1 GB. With patches this can be extended to 2GB.

        There are two methods (for either OS on x86) for extending this reach.

        1. /LARGEADDRESSAWARE & /3GB (called ZONE_HIGHMEM on linux (I believe!)).

          In my experiments with this on my old machine, whilst I could allocate up to 3GB per process, I could not allocate any single entity greater than 2GB.

          The way LAA works is that it maps chunks of the memory above the 2GB limit through a "window" in the process' normal address space. Hence, no single allocation greater than 2GB (or even close to it, once you take the process' normal code, data and stack requirements into consideration) is possible.

        2. Physical Address Extension (PAE).

          This works by mapping multiple physical addresses, in the 36-bit physical address space, into a single window within the process' 32-bit address space. Again, this has to be mapped within the process' 2GB (1GB linux) user space limits, so no single entity of greater than 2GB is possible.

        I admit that this doesn't cover all the exotica (hardware and software) where Perl can run, but I would very much opt for the pragmatic solution: avoid slowing down the common case just to cater for potential, unknown exotica.

        That is, I would code for the assumption that no single string can be greater than 2GB and allow those porting to such exotica to handle the case of >2GB if, as, and when the need arises.
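        A minimal sketch of what that assumption buys, with hypothetical names and only loosely following substr's conventions (it is not the core implementation): if no string exceeds 2**31 - 1 bytes, every intermediate value fits in one signed 32-bit integer and no signed/unsigned bookkeeping is needed.

```python
# Assumption: no string is longer than 2**31 - 1 bytes, so lengths, positions
# and their negatives all fit in one signed 32-bit integer.
# Names are hypothetical; this is not the perl core's substr logic.
MAX_LEN = 2 ** 31 - 1


def clamp_substr(cur_len, pos, length=None):
    """Return (start, end) byte offsets, or None if pos is outside the string.

    A negative pos counts from the end; a negative length leaves that many
    bytes off the end of the string.
    """
    assert 0 <= cur_len <= MAX_LEN
    start = pos + cur_len if pos < 0 else pos
    if start < 0 or start > cur_len:
        return None
    if length is None:
        end = cur_len
    elif length < 0:
        end = max(start, cur_len + length)
    elif length >= cur_len - start:
        # Compare against the remaining bytes rather than computing
        # start + length, which could exceed the signed 32-bit range.
        end = cur_len
    else:
        end = start + length
    return start, end


print(clamp_substr(10, 2, 3))
print(clamp_substr(10, -3))
```

        Each comparison and addition here stays within -2**31 .. 2**31 - 1 as long as the length assumption holds, which is the simplification being argued for.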

        References: win32 & Linux

        (*)Update: MEM_WRAP_CHECK(), MEM_WRAP_CHECK_1(), MEM_WRAP_CHECK_2() etc.

