I am currently rewriting/moosing a very old perl script (did I really write this horrid code?) that glues together a numerical weather prediction system. (BTW, perl rocks for this application!)

One of the tasks here is to use "wget" to download a 0.5 GB file. Another is to compress/uncompress 49 files, each of which is on the order of 300 MB. This is currently implemented using system calls to wget/gzip/gunzip. The forecast model (FORTRAN, C, C++) itself is run as multiple parallel processes on several machines using MPI. The file handling, however, is NOT parallelized: a single machine is responsible for this task.

This was all conceived and constructed in an era (2004) when hardware was much less muscular. These days, my master node is an 8-core, 64 GB MacPro w/ 2 TB of SSD. During the file getting/manipulation phases of the master process, this is all the machine is doing. I suspect that some latent compute capability could be used to enhance/speed up the file manipulation process.

Speed is everything for this application, and a few minutes saved is worth a lot. Should I manipulate files within perl (perhaps avoiding things like unnecessary IO buffering) rather than making the system calls? (Obviously network speed remains a wild card here.)
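To make the question concrete, here is a minimal sketch of one way the idle cores might be put to work: compressing several files at once from within perl, using fork and the core IO::Compress::Gzip module. The sub name gzip_in_parallel and the worker count are hypothetical, not part of the existing script, and whether this beats the external gzip depends on disk throughput as much as CPU. Any gain would come from running several compressions concurrently, not from doing the work inside perl.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: compress each file in @$files to "$file.gz",
# running at most $max_workers child processes at a time.
# Unlike the gzip command, the original file is left in place.
# Returns the number of files that failed to compress.
sub gzip_in_parallel {
    my ($files, $max_workers) = @_;
    my %running;        # pid => file name being compressed
    my $failures = 0;

    # Wait for one child to finish and record whether it succeeded.
    my $reap_one = sub {
        my $pid = wait();
        if ($pid == -1) {               # no children left to wait for
            $failures += keys %running;
            %running = ();
            return;
        }
        my $file = delete $running{$pid};
        return unless defined $file;    # not one of our workers
        if ($? != 0) {
            $failures++;
            warn "compression failed for $file\n";
        }
    };

    for my $file (@$files) {
        # Throttle: block until a worker slot is free.
        $reap_one->() while keys(%running) >= $max_workers;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: compress one file, then leave without running
            # the parent's END blocks or destructors.
            my $ok = gzip($file => "$file.gz");
            warn "$file: $GzipError\n" unless $ok;
            POSIX::_exit($ok ? 0 : 1);
        }
        $running{$pid} = $file;
    }

    # Drain the remaining workers.
    $reap_one->() while %running;
    return $failures;
}
```

Something like `my $failed = gzip_in_parallel(\@model_files, 6);` would then replace the loop of gzip system calls, where @model_files and the choice of 6 workers are placeholders to be tuned against the actual disk and core count. The same pattern should work for decompression with IO::Uncompress::Gunzip.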

I have researched this a bit and already have some (possibly erroneous) ideas, but thought I would toss the global concept out there to my perlish betters. This may save me some spurious bunny trails. Not that I don't like bunnies…

The difficulty lies, not in thinking the new ideas, but in escaping from the old ones.


In reply to Getting/handling big files w/ perl by Gisel
