Occasionally I want to work with a big data set, too big for even the humongous memory on my late-model COTS supercomputer.

Now is such a time, and I am considering writing an Inline::C module that will mmap my massive fixed-length-record data set and provide a tied, @{}-overloaded, or otherwise magical object that looks like an array to perl but pulls fixed-length records out of my file. Surely this has been done already; Sys::Mmap and substr would take me pretty much there. Something like

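    use Sys::Mmap;
    use constant BDfname       => "bigdatafile";
    use constant BD_RECORDSIZE => 9876;

    # "-s BDfname" would test a filehandle named BDfname, so call the constant explicitly
    my $size = -s BDfname();
    $size % BD_RECORDSIZE and die "file size is not a multiple of the record size";
    open BIGDATA, '<', BDfname or die $!;
    mmap( $BD, $size, PROT_READ, MAP_SHARED, BIGDATA ) or die "mmap: $!";

    # return record number $_[0] (zero-based) as a string
    sub BDrec($) { substr( $BD, BD_RECORDSIZE * $_[0], BD_RECORDSIZE ) }

might work.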
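For the "looks like an array" part, here is a minimal sketch of a read-only tied array over a fixed-length-record file. The package name FixedRecordArray and the demo file contents are made up for illustration, and it uses plain seek/read rather than mmap so that it only needs core modules. Swapping the body of FETCH for a substr on an mmapped scalar should be a small change, though I have not measured either variant.

```perl
use strict;
use warnings;

# Hypothetical package: a read-only array view of a fixed-length-record file.
package FixedRecordArray;
use Tie::Array;
our @ISA = ('Tie::Array');

# Usage: tie @array, 'FixedRecordArray', $filename, $record_size
sub TIEARRAY {
    my ($class, $fname, $recsize) = @_;
    my $size = -s $fname;
    defined $size or die "cannot stat $fname: $!";
    $size % $recsize and die "size of $fname is not a multiple of $recsize";
    open my $fh, '<:raw', $fname or die "cannot open $fname: $!";
    return bless { fh => $fh, recsize => $recsize, count => $size / $recsize }, $class;
}

# Number of records in the file.
sub FETCHSIZE { $_[0]{count} }

# Read record $i (zero-based) straight from the file.
sub FETCH {
    my ($self, $i) = @_;
    return undef if $i < 0 || $i >= $self->{count};
    seek( $self->{fh}, $i * $self->{recsize}, 0 ) or die "seek: $!";
    my $n = read( $self->{fh}, my $rec, $self->{recsize} );
    defined $n && $n == $self->{recsize} or die "short read on record $i";
    return $rec;
}

package main;
use File::Temp qw(tempfile);

# Build a tiny demo file: three 4-byte records.
my ($out, $fname) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} "AAAABBBBCCCC";
close $out or die "close: $!";

tie my @rec, 'FixedRecordArray', $fname, 4;
print scalar(@rec), " records; record 1 is $rec[1]; last is $rec[-1]\n";
```

Because STORE and friends are not defined, any attempt to write through the array croaks in Tie::Array, which seems like the right default for a data set mapped with PROT_READ.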

My question is: what would happen if perl_mymalloc were replaced with something that created and memory-mapped files instead of doing mallocs?

Would I get a fully checkpointable perl, able to gracefully deal with larger-than-memory data sets without eating up all my swap (of course it might still thrash, but not on the designated swap space)?

How severe would the speed hit be before hitting the memory limit?

Has this been done already?


In reply to implications of mmapping the whole enchilada by davidnicol
