harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

I was experimenting with various older Perl interpreters, and I have noticed that a single scalar variable cannot be longer than 4095 bytes in DOS using Perl 4. You can have multiple scalar variables, each 4095 bytes, but a single one cannot be longer. If you try to make it longer, it says, "Memory allocation error" and you find yourself staring at the command prompt. (4095 = 0xfff)

In Perl 5 under DOS, the limitation seems to have been raised to 268,435,455 bytes (0xfffffff bytes). If you try to make a string that is even one byte longer, it says, "Perl is out of memory." In TinyPerl 5.8 running on Windows XP, the Perl interpreter simply crashes and Windows says, "Perl Command Line Interpreter has encountered a problem and needs to close. We are sorry for the inconvenience."

It seems that the string length was stored in a word (16-bit variable) in Perl 4, and that this was later upgraded to a 32-bit long, but in both cases the upper 4 bits are "reserved." Why are the upper 4 bits reserved? Does this pattern still hold for more modern versions of Perl? I would expect later versions of perl to store the string length in 8 bytes, perhaps again with the upper 4 bits reserved for some reason. That would make the theoretical maximum length of a scalar 0xfff ffff ffff ffff which, if I'm correct, is 1 exabyte minus one (strictly, 1 exbibyte, 2^60 bytes, minus one). Am I right?
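The arithmetic behind this hypothesis can be checked with a short sketch. Python is used here only as a calculator, and the "top 4 bits reserved" rule is the questioner's assumption (the replies below dispute it), not documented Perl behaviour. `low_bits_max` is a hypothetical helper. Note that 2^60 bytes is one exbibyte (EiB), about 1.15 x 10^18 bytes, slightly more than a decimal exabyte (10^18 bytes).

```python
def low_bits_max(total_bits: int, reserved_top_bits: int = 4) -> int:
    """Largest value a field can hold if its top bits are (hypothetically) reserved."""
    return (1 << (total_bits - reserved_top_bits)) - 1


perl4 = low_bits_max(16)    # 0xfff, as observed with Perl 4 under DOS
perl5 = low_bits_max(32)    # 0xfffffff, as observed with Perl 5 under DOS
guess64 = low_bits_max(64)  # 0xfff_ffff_ffff_ffff, the extrapolated guess

for name, value in (("16-bit", perl4), ("32-bit", perl5), ("64-bit", guess64)):
    print(f"{name} field: {value:#x} = {value:,} bytes")

EIB = 2 ** 60  # one exbibyte
print("64-bit guess equals 1 EiB - 1:", guess64 == EIB - 1)
```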

(In QBASIC 1.1 under DOS, the memory appears to be shared between the program and the variables, and I think the limit depends on how much free space you have in the system: I was able to create a string of 29700 bytes, but when I tried to create a larger one, it said, "Out of memory." If I then created a second string variable and placed a single byte into it, lines of my program started disappearing, which is kind of weird. Anyway, it seems that QBASIC stores the string length in a 16-bit word. And of course, we all know that C is designed to use ASCIZ strings, so unlike other languages it doesn't store the string length at all.)
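To make the two string layouts concrete, here is a minimal Python sketch contrasting a length-prefixed string with a 16-bit length word and a C-style NUL-terminated (ASCIZ) string. The layout and the helper names (`pack_prefixed`, `unpack_prefixed`, `pack_asciz`, `asciz_len`) are hypothetical illustrations, not QBASIC's actual string descriptor format. A 16-bit unsigned length word caps a string at 65,535 bytes; an ASCIZ string needs no length field, but its end must be found by scanning, and it cannot contain a NUL byte.

```python
import struct


# Hypothetical length-prefixed layout: a 16-bit unsigned little-endian
# length word followed by the bytes themselves.
def pack_prefixed(data: bytes) -> bytes:
    if len(data) > 0xFFFF:  # 65,535 is the largest value a 16-bit word holds
        raise ValueError("length does not fit in a 16-bit word")
    return struct.pack("<H", len(data)) + data


def unpack_prefixed(buf: bytes) -> bytes:
    (length,) = struct.unpack_from("<H", buf, 0)  # length is read directly
    return buf[2:2 + length]


# C-style ASCIZ layout: no stored length, the string ends at the first NUL.
def pack_asciz(data: bytes) -> bytes:
    if b"\x00" in data:
        raise ValueError("ASCIZ strings cannot contain NUL bytes")
    return data + b"\x00"


def asciz_len(buf: bytes) -> int:
    return buf.index(b"\x00")  # linear scan, like C's strlen


msg = b"hello"
print(pack_prefixed(msg), unpack_prefixed(pack_prefixed(msg)))
print(pack_asciz(msg), asciz_len(pack_asciz(msg)))
```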

Re: maximum length of scalar in theory
by ikegami (Patriarch) on Sep 29, 2024 at 14:26 UTC

    System limits:

    32-bit x86 operating systems typically provide a flat 32-bit address space, putting an upper bound of 2^32 bytes.

    64-bit x86 operating systems typically provide a flat 64-bit address space, putting an upper bound of 2^64 bytes.

    Some of the address space is reserved by the OS, either to communicate with it or to protect against common errors (such as dereferencing a NULL or low-value pointer). Some of the memory is also used by the program itself. Both lower the maximum size you can allocate.
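    For scale, the sketch below (Python, as a stand-in) converts these bounds to binary units; `human` is a hypothetical helper. The 2^48 row is included because most current x86-64 processors implement 48-bit (some 57-bit) virtual addresses, so the practical bound on such systems is well below 2^64. The pointer width it reports is that of the Python interpreter running it; for a given perl build, `perl -V:ptrsize` should report the corresponding pointer size in bytes.

```python
import struct


def human(n: int) -> str:
    """Format an exact power-of-1024 byte count using binary (IEC) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for bits in (32, 48, 64):
    print(f"2**{bits} bytes = {human(2 ** bits)}")  # 4 GiB, 256 TiB, 16 EiB

# Pointer width of the running interpreter, in bits.
native_bits = struct.calcsize("P") * 8
print(f"{native_bits}-bit pointers: at most {human(2 ** native_bits)} addressable")
```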

    Perl limits:

    The size of a string must fit in a STRLEN. This ultimately resolves to size_t, which can represent the size of any object the system can address, so this really isn't a limit at all.
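    A minimal sketch of that argument, written in Python via `ctypes` rather than Perl: it compares the largest value a `size_t` can hold on the running platform with the size of a flat address space of native pointer width. It assumes a conventional flat-memory platform where `size_t` and pointers have the same width, which holds on mainstream 32- and 64-bit systems. For a particular perl build, `perl -V:sizesize` should report the width of its `Size_t` in bytes.

```python
import ctypes
import struct

# Width of C's size_t on this platform (what STRLEN ultimately resolves to).
size_t_bits = ctypes.sizeof(ctypes.c_size_t) * 8
largest_length = (1 << size_t_bits) - 1

# Width of a native pointer, i.e. of a flat address space.
pointer_bits = struct.calcsize("P") * 8
address_space = 1 << pointer_bits

# No object can exceed the address space, and size_t can count up to
# address_space - 1, so the length type is never the bottleneck.
print(f"size_t: {size_t_bits} bits, max {largest_length:#x}")
print("length type covers address space:", largest_length >= address_space - 1)
```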

Re: maximum length of scalar in theory
by sectokia (Friar) on Oct 01, 2024 at 05:45 UTC

    You can browse the perl source code at https://perl5.git.perl.org/, going back to whatever version you like, including 4 and earlier.

    I don't think there is any reservation of the upper 4 bits. STRLEN for an SV is defined as MEM_SIZE, which is defined as size_t, and the C standard only requires size_t to be at least 16 bits wide.

    The point at which it all falls over depends on the implementation of the memory allocator. You would need to dig into the malloc configuration for the particular compiled build. I notice the very old DOS malloc has something called 'MALLOC_ITEM_MAXSZ' which is 12 bits (i.e. 4096).
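    If an allocator records each request's size in a 12-bit field, the largest request it can represent is 2^12 - 1 = 4095 bytes, which would line up with the 4095-byte ceiling in the original question while still allowing many separate 4095-byte blocks. The Python sketch below models that idea only; `TinyAllocator`, `SIZE_FIELD_BITS` and `MAX_ITEM` are hypothetical names, and this is not the actual code of the DOS perl malloc.

```python
SIZE_FIELD_BITS = 12                   # hypothetical width of the size field
MAX_ITEM = (1 << SIZE_FIELD_BITS) - 1  # 4095: the largest value 12 bits can hold


class TinyAllocator:
    """Toy allocator that refuses any request its 12-bit size field cannot record."""

    def __init__(self) -> None:
        self.blocks = []  # every block handed out so far

    def malloc(self, size: int) -> bytearray:
        if size > MAX_ITEM:
            # Stands in for the "Memory allocation error" reported under Perl 4/DOS.
            raise MemoryError(f"request of {size} bytes exceeds {MAX_ITEM}")
        block = bytearray(size)
        self.blocks.append(block)
        return block


alloc = TinyAllocator()
for _ in range(3):
    alloc.malloc(MAX_ITEM)  # several separate 4095-byte blocks are fine

try:
    alloc.malloc(MAX_ITEM + 1)  # one byte more than the field can record
except MemoryError as err:
    print("failed:", err)
```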

Re: maximum length of scalar in theory
by hippo (Archbishop) on Sep 27, 2024 at 19:11 UTC

    See Re: Maximum string length from 18 years ago.

    I notice that all your tests are on MSWin32. Have you tried the same on any of the multitude of non-MS operating systems?


    🦛

      Well, I have run a test on my Apple MacBook and SparkyLinux using Perl 5.40. But both of those systems are 64-bit with a modern Perl, and in each case, Perl seems to have limits but not sure why. I don't have unlimited memory in my computers, but I tried to create a string that is 4GB long, and Perl immediately said "Out of memory" and the program ended. So, it didn't even try to create the string. Then I tried to create a string that is 1GB, and after some time, it displayed a message that says "Killed", and then I was staring at the # prompt in Linux. So, then I tried to create an 800 MB string, and it worked. And this is how much memory I've got:
        # free
                      total        used        free      shared  buff/cache   available
        Mem:        3290320     1522528     1677380      152628      397604     1767792
        Swap:             0           0           0

      So, I can't really tell if my theory is right. I mean I would need a computer with 2 Exabytes of memory to see if it would work. But it seems I have more than 800 MB of free memory, yet it won't allow me to create a 1 GB string. I don't understand why it kills the process. Well, anyway, I just wanted to know what's a safe size for a string. So, it seems that if I create a program that works with 1 GB strings, it will run on most modern machines. But if I write a program that relies on creating and working with 2GB or 4GB strings, then that might fail on some systems. I guess, the bigger the string, the fewer computers will be able to handle it.

      Btw, I also checked the Perl installation used by the online site https://www.onlinegdb.com/online_perl_compiler, and I was able to create a 1 GB string with no problem! Yay! But when I tried to create a 1.5 GB string, my program just stopped working without any error message. So I guess they limit how much memory they allocate to anonymous users, and if a process tries to use more, it gets killed silently.

        Perl seems to have limits but not sure why. I don't have unlimited memory in my computers

        The second statement is true for everyone. And that rather shows the error in the first. You have assumed that Perl has limits when in fact all your demonstration shows is that your computer (and/or operating system) has limits. You can't put 10 litres of anything in a 1 litre bottle.

        But it seems I have more than 800 MB of free memory, yet it won't allow me to create a 1 GB string.

        You have 1767792 kB of available memory. Even a cursory inspection shows that this will allow the allocation of 800MB x 2 but not 1GB x 2. I therefore posit that your unshown code actually uses twice the amount of RAM that you think it should. Try adding 400MB of swap space and be amazed that your program will now run with a 1GB string instead.
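        This arithmetic can be spelled out in a short Python sketch. It assumes, as posited above, that the (unshown) program briefly holds two copies of the string, and that `free` reports kibibytes, which is its default unit; `fits_twice` is a hypothetical helper. The bare "Killed" message is what a shell typically prints when a process receives SIGKILL, which is how Linux's out-of-memory killer terminates processes.

```python
# "available" figure from the free output above, in KiB.
available_kib = 1767792

MIB = 1024        # KiB per MiB
GIB = 1024 * MIB  # KiB per GiB


def fits_twice(size_kib: int) -> bool:
    """True if two copies of a string of this size fit in available memory."""
    return 2 * size_kib <= available_kib


print(fits_twice(800 * MIB))  # needs 1,638,400 KiB -> True
print(fits_twice(1 * GIB))    # needs 2,097,152 KiB -> False

# How much extra memory (e.g. swap) two 1 GiB copies would need.
shortfall_kib = 2 * GIB - available_kib
print(shortfall_kib // MIB, "MiB short")  # 321 MiB short, so 400MB of swap covers it
```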

        Well, anyway, I just wanted to know what's a safe size for a string. So, it seems that if I create a program that works with 1 GB strings, it will run on most modern machines. But if I write a program that relies on creating and working with 2GB or 4GB strings, then that might fail on some systems. I guess, the bigger the string, the fewer computers will be able to handle it.

        That isn't how publicly distributed programs are written. You cannot assume anything about the platform on which your code will be run, so don't even try.


        🦛