fishnuts has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to use sysopen, sysseek, sysread, and syswrite to directly access a shared SCSI block device. The sysopen call uses the O_DIRECT flag to get unbuffered IO, but the caveat is that all reads/writes via the filehandle must be aligned to a block boundary. It's very important that all IO to and from the device be direct and uncooked, because multiple physical machines read and write different parts of the same device simultaneously: it's a shared-state database for a high-availability cluster, and it's imperative that I get the freshest data from the device every time I read from it.
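On Linux, the O_DIRECT alignment rule covers the file offset and the transfer length as well as the memory buffer. The following is a minimal sketch of handling an arbitrary byte range, assuming a 512-byte block size. The helper name aligned_window is hypothetical and does not come from the thread. The idea is to widen the range to whole blocks, do the aligned transfer, and then cut the wanted bytes out with substr.

```perl
use strict;
use warnings;

# Hypothetical helper: widen the byte range [$pos, $pos + $len) to the
# smallest range whose start and length are both multiples of $block.
sub aligned_window {
    my ($pos, $len, $block) = @_;
    my $start = $pos - ($pos % $block);                  # round start down
    my $end   = $pos + $len;
    $end += $block - ($end % $block) if $end % $block;   # round end up
    return ($start, $end - $start);                      # (aligned offset, aligned length)
}

my ($off, $len) = aligned_window(1000, 100, 512);
print "read $len bytes at offset $off, then take substr(\$data, ", 1000 - $off, ", 100)\n";
# prints: read 1024 bytes at offset 512, then take substr($data, 488, 100)
```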

I've used the sample code from node 621630 and the technique given there, which uses Sys::Mmap to get a block-aligned buffer. This block-aligned buffer works with syswrite, but sysread using the exact same buffer still fails with an 'invalid argument' error. The referenced node's code didn't include a working example involving sysread.

Here's my test code, using a local temporary file instead of the block device:
    #!/usr/bin/perl
    use strict;
    use warnings;
    $|++;
    use Fcntl qw(:DEFAULT O_ASYNC O_DIRECT);
    use Sys::Mmap;

    my $FH;
    my $msg;
    my $BUFFER = "";
    my $ret;
    my $seekpos = 2;
    my $BUFSIZE = 4096;
    my $soffset;

    sysopen($FH, "/tmp/test.dat", O_RDWR | O_ASYNC | O_DIRECT | O_CREAT, 0666)
        or die "$!";
    substr($BUFFER, 0, $BUFSIZE) = "\0" x $BUFSIZE;
    $soffset = $BUFSIZE * $seekpos;

    print "seeking to $soffset\n";
    $ret = sysseek($FH, $soffset, 0) or print "first sysseek FAILED: $!\n";
    print "first sysseek succeeded. returned $ret\n" if defined $ret;
    $msg = "data at $soffset";
    substr($BUFFER, 0, length($msg)) = $msg;
    $ret = syswrite($FH, $BUFFER, $BUFSIZE) or print "first syswrite FAILED: $!\n";
    print "first syswrite succeeded. returned $ret\n" if defined $ret;
    print "\n";

    print "mmap coming up...\n";
    $ret = mmap($BUFFER, $BUFSIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, STDOUT)
        or die "Couldn't mmap: $!\n";
    printf "mmap succeeded. returned address %x\n", $ret if defined $ret;
    if (($ret % 4096) == 0) {
        print "coolness... $ret is aligned to 4096 byte boundary\n";
    }
    print "\n";

    print "seeking to $soffset\n";
    $ret = sysseek($FH, $soffset, 0) or print "second sysseek FAILED: $!\n";
    print "second sysseek succeeded. returned $ret\n" if defined $ret;
    $msg = "data at $soffset";
    substr($BUFFER, 0, length($msg)) = $msg;
    $ret = syswrite($FH, $BUFFER, $BUFSIZE) or print "second syswrite FAILED: $!\n";
    print "second syswrite succeeded. returned $ret\n" if defined $ret;
    print "\n";

    print "seeking to $soffset\n";
    $ret = sysseek($FH, $soffset, 0) or print "third sysseek FAILED: $!\n";
    print "third sysseek succeeded. returned $ret\n" if defined $ret;
    $ret = sysread($FH, $BUFFER, $BUFSIZE) or print "sysread FAILED: $!\n";
    print "sysread succeeded. returned $ret\n" if defined $ret;
    exit 0;

As you can see if you run the test, the first syswrite sometimes fails, which is expected because the buffer hasn't been aligned with mmap yet. The second syswrite always succeeds, but the sysread on the same filehandle, using the same aligned buffer, _always_ fails. What can I do to use sysread effectively on an O_DIRECT filehandle?

Erik Schorr
sub email { scalar reverse "gro.apra\@erik" }

Re: sysread/syswrite and O_DIRECT alignment problem (align)
by tye (Sage) on Nov 26, 2007 at 21:12 UTC

    Using mmap() just to get an aligned buffer seems like a bit of a sledgehammer and might be causing problems (though I haven't swapped in the implications of that particular use of mmap). You could try a different approach:

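    my $bufsize= 4096;
    my $align= 512;
    my $buf= 'x' x ($align+$bufsize);                  # room for the data plus one alignment unit of slack
    my $off= unpack( "J", pack "p", $buf ) % $align;   # address of $buf's string buffer, modulo $align
    $off= $align - $off if $off;                       # bytes to skip to reach the first aligned address
    sysread( $FH, $buf, $bufsize, $off ) or ...;
    my $data= substr( $buf, $off, $bufsize );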

    - tye        

      Tye, your solution worked perfectly, both for sysread and syswrite. How did you learn of this, and what have you used this for previously?
        How did you learn of this, and what have you used this for previously?

        I read the standard docs on pack and unpack and did experiments to learn the many aspects that aren't clearly explained. Acme::ESP uses parts of this. I've also used it several times to construct arguments to be passed through Win32::API. I also read the standard sysread docs and have used the 4th argument to efficiently append more to a buffer holding a scrolling window of data from a file.
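The last point refers to sysread's 4th argument, OFFSET, which places the newly read bytes at that position in the buffer instead of at its start. The following is a minimal sketch of the scrolling-window pattern described there. It assumes the caller wants to keep the last few bytes between reads. The helper name refill and the sizes are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Drop all but the last $keep buffered bytes, then append up to $want
# freshly read bytes after them. Passing length($$bufref) as sysread's
# OFFSET argument is what makes the read append instead of overwrite.
sub refill {
    my ($fh, $bufref, $want, $keep) = @_;
    my $excess = length($$bufref) - $keep;
    substr($$bufref, 0, $excess, "") if $excess > 0;
    my $got = sysread($fh, $$bufref, $want, length $$bufref);
    die "sysread failed: $!" unless defined $got;
    return $got;    # 0 means end of file
}

my ($fh) = tempfile(UNLINK => 1);
defined syswrite($fh, join "", "a" .. "z") or die "syswrite: $!";
sysseek($fh, 0, 0) or die "sysseek: $!";

my $window = "";
refill($fh, \$window, 10, 4);    # window is now "abcdefghij"
refill($fh, \$window, 10, 4);    # keeps "ghij", appends "klmnopqrst"
print "$window\n";               # prints: ghijklmnopqrst
```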

        - tye        

      That doesn't work on my perl, where sizeof(UV) is 8 and sizeof(char*) is 4 :)

        Yeah, I almost mentioned that "J" might not be the right choice. Previously I'd used "L", but I think there are also situations where that doesn't work. I almost switched to writing code to detect the system's endianness and just pull out the least-significant byte, but the possibility of a system with mixed endianness deterred me.

        If I were to put this code someplace like a module I'd probably compare things like length pack "p", "foo" and length pack "J", 0 to pick which format letter to use with unpack.
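The comparison tye describes can be sketched as follows. This sketch assumes that the candidate templates J, L! (native unsigned long), and L cover the platforms you care about. The helper name pointer_format is hypothetical.

```perl
use strict;
use warnings;

# Return an unpack template whose packed width equals that of a native
# pointer as produced by pack "p". The candidate list is an assumption.
sub pointer_format {
    my $probe   = "foo";
    my $ptr_len = length pack "p", $probe;
    for my $fmt (qw(J L! L)) {
        return $fmt if length(pack $fmt, 0) == $ptr_len;
    }
    die "no integer template matches a $ptr_len-byte pointer\n";
}

my $fmt  = pointer_format();
my $buf  = "x" x 64;
my $addr = unpack $fmt, pack "p", $buf;
printf "pointer template '%s'; buffer at 0x%x, %d bytes past a 512-byte boundary\n",
    $fmt, $addr, $addr % 512;
```

The core Config module's $Config{ptrsize} reports the same pointer width and could be used instead of the probe.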

        - tye        

Re: sysread/syswrite and O_DIRECT alignment problem
by jbert (Priest) on Nov 27, 2007 at 10:35 UTC
    Perhaps perl is moving your buffer? Does getting a fresh buffer just before the sysread help? (perhaps using tye's suggestion above instead of mmap).

    I don't know the pack "p" format very well, but using that to look at the address of $BUFFER appears to show it changing between the first syswrite and the last sysread.
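This check can be scripted. The following is a minimal sketch that uses a plain temporary file instead of an O_DIRECT device. It relies on a hypothetical buf_addr helper built on pack "p" and records a buffer's address before and after sysread. The in-place substr on the pre-sized buffer is an assumption meant to ensure that the scalar owns its own allocation, since newer perls can share string buffers copy-on-write. Whether the small buffer moves depends on the allocator.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Unpack template as wide as a native pointer; the candidate list is an assumption.
my $probe = "probe";
my ($PTR_FMT) = grep { length(pack $_, 0) == length(pack "p", $probe) } qw(J L! L);
die "no pointer-sized template found\n" unless defined $PTR_FMT;

# Address of the string buffer behind a scalar ($_[0] aliases the caller's variable).
sub buf_addr { return unpack $PTR_FMT, pack "p", $_[0] }

my ($fh) = tempfile(UNLINK => 1);
defined syswrite($fh, "z" x 8192) or die "syswrite: $!";

# Case 1: a 4-byte buffer that sysread has to grow; it may or may not move.
my $small = "tiny";
my $small_before = buf_addr($small);
sysseek($fh, 0, 0) or die "sysseek: $!";
defined sysread($fh, $small, 4096) or die "sysread: $!";
printf "small buffer %s\n", buf_addr($small) == $small_before ? "stayed put" : "moved";

# Case 2: a buffer that is already big enough and has been edited in place,
# so sysread can fill it without reallocating.
my $big = "\0" x 8192;
substr($big, 0, 1, "\0");
my $big_before = buf_addr($big);
sysseek($fh, 0, 0) or die "sysseek: $!";
defined sysread($fh, $big, 4096) or die "sysread: $!";
printf "pre-sized buffer %s\n", buf_addr($big) == $big_before ? "stayed put" : "moved";
```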

      I'll have to look into that, and see if it's really "reallocating" things. I'm fairly confident in my ability to keep track of variables/lists and their references in ways to mitigate leaks, but maybe this is unrelated.
Re: sysread/syswrite and O_DIRECT alignment problem
by dk (Chaplain) on Nov 27, 2007 at 08:26 UTC
    First, I don't know how to do that in perl. Second, I've looked through CPAN and found nothing. So I think that if you haven't played with XS programming yet, now would be a good time to start :) I'd suggest writing a module that operates on filehandles that require aligned access and deals with the aligned buffers internally in C. This approach adds a slight memory-copying overhead (but this is perl, so I'd guess the impact is negligible), and it gives you a nice, clean open/read/write/etc. perl API that could join the big family of IO:: modules and/or be used as a tied filehandle.

    There's also the possibility of writing a module that somehow creates SVs that refer internally to aligned memory, but that approach raises (for me) more questions than answers: should they be read-only or read-write, and if read-write, how should perl be told to reallocate the memory correctly? Apologies if I've muddied the waters more than necessary, but perhaps some of these ideas will help you.

      The solution Tye provided worked well, and took advantage of the "offset" argument to sysread/syswrite, which I previously thought wasn't necessary... Once you know "how far out of alignment" a scalar variable is, you just pass around that offset and use it wherever you need to align the buffer for sysread/syswrite, and use substr to extract or inject your working data from/to that buffer.
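The following is a minimal sketch of aligned block I/O built on tye's offset technique. Everything here is illustrative: the helper names new_aligned_buffer, write_block, and read_block, the 512-byte alignment, and the 4096-byte block size are all assumptions. O_DIRECT is left off so that the demo runs on an ordinary temporary file.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :seek);
use File::Temp qw(tempfile);

my $ALIGN = 512;     # assumed required buffer/offset alignment
my $BLOCK = 4096;    # assumed transfer size; must be a multiple of $ALIGN

# Unpack template as wide as a native pointer (see tye's caveat about "J").
my $probe = "probe";
my ($PTR_FMT) = grep { length(pack $_, 0) == length(pack "p", $probe) } qw(J L! L);
die "no pointer-sized template found\n" unless defined $PTR_FMT;

# Allocate $BLOCK bytes plus one alignment unit of slack and work out how
# many leading bytes to skip so that the data starts on an aligned address.
sub new_aligned_buffer {
    my $buf = "\0" x ($BLOCK + $ALIGN);
    substr($buf, 0, 1, "\0");    # in-place edit so the scalar owns its buffer (assumed COW safeguard)
    my $off = unpack($PTR_FMT, pack "p", $buf) % $ALIGN;
    $off = $ALIGN - $off if $off;
    return { buf => \$buf, off => $off };    # keep a reference: copying $buf would move the data
}

# Write $data (NUL-padded to one block) as block number $blockno.
sub write_block {
    my ($fh, $ab, $blockno, $data) = @_;
    die "data longer than one block\n" if length($data) > $BLOCK;
    substr(${ $ab->{buf} }, $ab->{off}, $BLOCK, $data . "\0" x ($BLOCK - length($data)));
    sysseek($fh, $blockno * $BLOCK, SEEK_SET) or die "sysseek: $!";
    my $n = syswrite($fh, ${ $ab->{buf} }, $BLOCK, $ab->{off});
    die "syswrite: $!" unless defined $n && $n == $BLOCK;
    return $n;
}

# Read block number $blockno and return a copy of the bytes actually read.
sub read_block {
    my ($fh, $ab, $blockno) = @_;
    sysseek($fh, $blockno * $BLOCK, SEEK_SET) or die "sysseek: $!";
    my $n = sysread($fh, ${ $ab->{buf} }, $BLOCK, $ab->{off});
    die "sysread: $!" unless defined $n;
    return substr(${ $ab->{buf} }, $ab->{off}, $n);
}

# Demo on an ordinary temporary file; O_DIRECT is omitted because not
# every filesystem accepts it.
my (undef, $path) = tempfile(UNLINK => 1);
sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "sysopen $path: $!";
my $ab  = new_aligned_buffer();
write_block($fh, $ab, 2, "data at " . 2 * $BLOCK);
my $got = read_block($fh, $ab, 2);
printf "read back '%s'\n", unpack("Z*", $got);    # prints: read back 'data at 8192'
```

On the shared device, you would add O_DIRECT to the sysopen flags, as in the question's test script. You should also keep reusing the same buffer, because its alignment offset is only valid for as long as perl leaves that allocation in place.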
Re: sysread/syswrite and O_DIRECT alignment problem
by ohcamacj (Beadle) on Nov 29, 2007 at 03:12 UTC
    The main conflict is the line in Sys::Mmap that reads
        SvLEN_set(var, slop);
    and in the case of anonymous mappings, slop evaluates to 0. (With Devel::Peek this is really easy to see.) So the buffer allocated with mmap looks like
    SV = PV(0x804cd38) at 0x8062500
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK)
      PV = 0x40156000 "\0\0\0\0\0 . . . . "
      CUR = 4096
      LEN = 0
    
    Perl then notices that the buffer is too small to contain the results of sysread(), and kindly reallocates the string for you, immediately before sysread().
    The "fix" is to modify the Sys::Mmap module (perhaps call the modification Sys::Mmap-broken?), and change the offending line to
        SvLEN_set(var, 131072);
    This, however, causes numerous other test cases to break...
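You can also check the LEN = 0 condition from Perl code, without reading a Devel::Peek dump. The following is a minimal sketch that uses the CUR and LEN accessors from the core B module. The helper name pv_sizes is hypothetical. A scalar whose LEN is 0 has a buffer that perl did not allocate, as in the dump above. Such a buffer is the one that, as described, gets reallocated when sysread needs room.

```perl
use strict;
use warnings;
use B ();

# Return a scalar's CUR (bytes in use) and LEN (bytes perl allocated)
# via the core B module, without producing a Devel::Peek dump.
# $_[0] aliases the caller's variable, so this inspects that scalar itself.
sub pv_sizes {
    my $obj = B::svref_2object(\$_[0]);
    return ($obj->CUR, $obj->LEN);
}

my $plain = "\0" x 4096;
my ($cur, $len) = pv_sizes($plain);
print "CUR=$cur LEN=$len\n";
print "perl does not own this buffer (LEN = 0)\n" if $len == 0;
```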