Hi fellow monks,

I'm using PDL to work on a dataset that's too large to fit in memory, so I'm using the function PDL::IO::FastRaw::mapfraw to memory-map the file.

That's all fine and dandy, but I'd like my processing to go quicker. I'm not altering this dataset at all, which is why I'm confused to see it (apparently) being written back to disk when a page is swapped out.

So my question is:
Given that I'm not altering the data, so it never needs to be written back to disk, how can I stop these writes from happening and so speed things up?

The code in FastRaw that's doing the memory mapping is:

sub PDL::mapfraw {
    my $class = shift;
    my($name,$opts) = @_;
    my $hdr;
    if($opts->{Dims}) {
        my $datatype = $opts->{Datatype};
        if(!defined $datatype) {$datatype = $PDL_D;}
        $hdr->{Type} = $datatype;
        $hdr->{Dims} = $opts->{Dims};
        $hdr->{NDims} = scalar(@{$opts->{Dims}});
    } else {
        $hdr = _read_frawhdr($name);
    }
    $s = PDL::Core::howbig($hdr->{Type});
    for(@{$hdr->{Dims}}) {
        $s *= $_;
    }
    my $pdl = $class->zeroes(new PDL::Type($hdr->{Type}));
#   $pdl->dump();
    $pdl->setdims($hdr->{Dims});
#   $pdl->dump();
    $pdl->set_data_by_mmap($name,$s,1,($opts->{ReadOnly}?0:1),
        ($opts->{Creat}?1:0),
        (0644),
        ($opts->{Creat} || $opts->{Trunc} ? 1:0));
#   $pdl->dump();
    if($opts->{Creat}) {
        _writefrawhdr($pdl,$name);
    }
    return $pdl;
}
(written by Karl Glazebrook, the author of the module, not by me)

This calls on the C routine set_data_by_mmap in PDL/Basic/Core/Core.xs.PL, where what seems to be the relevant part looks like this:

set_data_by_mmap(it,fname,len,writable,shared,creat,mode,trunc)
    pdl *it
    char *fname
    int len
    int writable
    int shared
    int creat
    int mode
    int trunc
    CODE:
#ifdef USE_MMAP
    int fd;
    pdl_freedata(it);
    fd = open(fname,(writable && shared ? O_RDWR : O_RDONLY)|
        (creat ? O_CREAT : 0),mode);
    if(fd < 0) {
        croak("Error opening file");
    }
    if(trunc) {
        ftruncate(fd,0);   /* Clear all previous data */
        ftruncate(fd,len); /* And make it long enough */
    }
    if(len) {
        it->data = mmap(0,len,PROT_READ | (writable ? PROT_WRITE : 0),
            (shared ? MAP_SHARED : MAP_PRIVATE),
            fd,0);
        if(!it->data)
            croak("Error mmapping!");
    } else {
        /* Special case: zero-length file */
        it->data = NULL;
    }
    PDLDEBUG_f(printf("PDL::MMap: mapped to %d\n",it->data);)
    it->state |= PDL_DONTTOUCHDATA | PDL_ALLOCATED;
    pdl_add_deletedata_magic(it, pdl_delete_mmapped_data, len);
    close(fd);
#else

(again not written by me, but by the authors of PDL).
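For comparison, here is a minimal standalone sketch of what I'd expect a strictly read-only mapping to look like in plain C. This is not PDL code: the helper name map_readonly is made up for illustration, and it assumes a POSIX system. It opens the file with O_RDONLY and maps it with PROT_READ only, so the pages can never become dirty. It also checks the result against MAP_FAILED, which is how mmap reports an error (the excerpt above tests for a null pointer instead).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of PDL: map a whole file read-only.
 * Returns NULL on failure or for an empty file; on success stores the
 * mapped length in *len_out. The caller releases it with munmap(). */
const void *map_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    *len_out = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* PROT_READ only: any attempt to write through the pointer faults,
     * so the mapping never holds modified pages. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would do something like `size_t n; const void *data = map_readonly("big.raw", &n);` (the file name is a placeholder), read through `data`, and finish with `munmap((void *)data, n)`.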

I'm on Mac OS X, 10.4.9. Perl 5.8.6 built for darwin-thread-multi-2level (apparently). All help with this one gratefully received.

Best wishes, andye

PS: I have tried setting the ReadOnly option; it didn't help.


In reply to Memory mapping: preventing writes (PDL::IO::Fastraw) by andye
