Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks, I would like to use shared memory to load a set of ~4 billion packed integers into a large (4 GB) shared memory segment. I want to do this because, later on, several parallel workers should be able to read the nth packed integer and use it, so that from each worker I should be able to access, say, the 1000000th packed entry just by:
my $packed;
my $n      = 1000000 - 1;
my $offset = $n * 4;
my $success = shmread($id, $packed, $offset, 4);
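and, from there, decode it along these lines (just a sketch; the pack("N", ...) format and the "missing item" convention are described below):

my ($value) = unpack("N", $packed);   # 4-byte big-endian unsigned integer
# a value of 0 marks a missing item (the field holds 4 nul bytes)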
The rationale here is sharing a very large index in which each packed integer refers to a specific item. Thanks to a simple bijective hashing function, the ~3000000000 items can be mapped to integers in the range (0..4000000000), so each worker easily knows at which offset to look in the shared memory to retrieve the packed piece of information for its item. The ~1000000000 empty fields (positions whose corresponding item is missing) contain just 4 nul bytes (i.e. pack("N", 0)). However, when, in the parent process, I just try to create and fill the shared memory segment by iteratively reading 1 million bytes from the index file and copying them into shared memory, like this:
use warnings;
use strict;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);

open(my $idx, "<", "$ARGV[0].idx") || die "cannot open data file\n $!";
my $idx_size = (split(' ', `wc -c $ARGV[0].idx`))[0];
my $idx_id = shmget(IPC_PRIVATE, $idx_size, S_IRUSR | S_IWUSR) || die "shmget: $!";
my $offset = 0;
foreach my $i (0..$idx_size/1000000) {
    my $n = "";
    read($idx, $n, 1000000);
    shmwrite($idx_id, $n, $offset, 1000000) || die "shmwrite: $!";
    $offset += 1000000;
}
shmctl($idx_id, IPC_RMID, 0) || die "shmctl: $!";
close $idx;
exit;
I always get the error "shmwrite: Bad address". It always happens when writing the 2^31st byte, so it looks as if the shared memory segment Perl can handle is limited to 2 GB. However, running
$ ipcs -m
shows that Perl actually reserved a shared memory segment much larger than that (in fact, the expected 4 GB):
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes        nattch     status
0x00000000 8454205    valerio    600        4294967296   0
I am running Perl v5.34 on Ubuntu 22.04 with 32 GB of RAM. Perl here should be a 64-bit process:
$ perl -V:archname
archname='x86_64-linux-gnu-thread-multi';

Is this a built-in limit of Perl's shared memory support, or is there something I am missing?

Thanks for your wisdom,

Valerio

Re: Does perl have a builtin limit to the size of shared memory segments I can write to?
by NERDVANA (Priest) on Jan 07, 2025 at 20:56 UTC
    Yes, this looks like a bug in Perl.
    Perl_do_shmio(pTHX_ I32 optype, SV **mark, SV **sp)
    {
    #ifdef HAS_SHM
        char *shm;
        struct shmid_ds shmds;
        const I32 id = SvIVx(*++mark);
        SV * const mstr = *++mark;
        const I32 mpos = SvIVx(*++mark);
        const I32 msize = SvIVx(*++mark);

    The parameters are stored in variables declared as I32 rather than SSize_t.

    The mem-read and mem-write are performed by Perl itself rather than by a system call, so there is no reason to truncate the offset and length to 32 bits.

    if (optype == OP_SHMREAD) {
        char *mbuf;
        /* suppress warning when reading into undef var (tchrist 3/Mar/00) */
        SvGETMAGIC(mstr);
        SvUPGRADE(mstr, SVt_PV);
        if (! SvOK(mstr))
            SvPVCLEAR(mstr);
        SvPOK_only(mstr);
        mbuf = SvGROW(mstr, (STRLEN)msize+1);
        Copy(shm + mpos, mbuf, msize, char);
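    To see why this bites at the 2 GB mark: once the offset no longer fits in a signed 32-bit integer it wraps negative, and the bounds check inside do_shmio then rejects it with EFAULT, which strerror reports as "Bad address". You can reproduce the same truncation from the command line (just an illustration of the wraparound, not the core code itself):

        $ perl -le 'print unpack("l", pack("l", 2**31))'
        -2147483648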

    I'm afraid you won't be able to get a fix for this unless you recompile your own perl, but please consider filing a bug report so that perl 5.42 could have it fixed.

    Meanwhile, there is File::Map which is basically the same thing, except you have to create your own files (in tmpfs if you want them to be pure RAM). You also have to be careful not to accidentally copy that mapped scalar, which would be expensive.
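    For reference, a minimal File::Map sketch under those assumptions (a pre-built index file of 4-byte big-endian entries; the file name and $n here are placeholders):

        use strict;
        use warnings;
        use File::Map 'map_file';

        # map the index read-only; the kernel page cache backs the mapping,
        # so parallel workers mapping the same file share the physical memory
        map_file my $map, 'data.idx', '<';

        # fetch the nth 4-byte entry; substr copies only those 4 bytes,
        # not the whole mapped scalar
        my $n     = 1_000_000 - 1;
        my $value = unpack 'N', substr($map, $n * 4, 4);
        print "entry $n = $value\n";   # 0 means the item is missing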

      Thanks for the prompt reply.

      I opened an issue about that.

      Recompiling my own perl is not an option, as I aim to share the code and keep some degree of portability (on Linux; no plans to extend this to other OSes). I will look carefully into the File::Map option, or perhaps, for now, I can find a simpler workaround by splitting my large index over a few shared memory segments, roughly as sketched below.
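      Just a sketch of what I have in mind; the 1 GB segment size is arbitrary, and @seg_ids would hold the ids returned by shmget for each chunk:

        use strict;
        use warnings;

        # keep every segment well below 2**31 bytes so shmread/shmwrite
        # offsets always fit in the (currently) 32-bit signed arguments
        my $SEG_BYTES = 1_073_741_824;   # 1 GB per segment, a multiple of 4
        my @seg_ids;                     # shmget ids of the chunks, filled by the parent

        sub read_entry {
            my ($n) = @_;
            my $byte_off = $n * 4;
            my $seg = int($byte_off / $SEG_BYTES);   # which segment holds entry n
            my $off = $byte_off % $SEG_BYTES;        # offset inside that segment
            my $packed = '';
            shmread($seg_ids[$seg], $packed, $off, 4) or die "shmread: $!";
            return unpack 'N', $packed;
        }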

      Best,

      Valerio

        Thank you very much for opening it!

        Just for reference, the issue is GH #22895.