I am currently rewriting/moosing a very old perl script (did I really write this horrid code?) that glues together a numerical weather prediction system. (BTW, perl rocks for this application!)
One of the tasks here is to use wget to download a 0.5 GB file. Another is to compress/uncompress 49 files, each on the order of 300 MB. This is currently implemented as system calls out to wget/gzip/gunzip. The forecast model itself (FORTRAN, C, C++) is run as multiple parallel processes on several machines using MPI. The file handling, however, is NOT parallelized: a single machine is responsible for that task.
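(For context, and paraphrasing rather than quoting the actual production code, the current approach amounts to something like this, with a made-up URL and file glob:)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical URL and file names, purely for illustration.
    my $url = 'http://example.com/gfs_input.grib2.gz';

    # Grab the 0.5 GB input file via an external wget call.
    system('wget', '-q', '-O', 'gfs_input.grib2.gz', $url) == 0
        or die "wget failed: $?";

    # Then gunzip each ~300 MB forecast file, one at a time.
    for my $f (glob 'fcst_*.gz') {
        system('gunzip', '-f', $f) == 0
            or die "gunzip failed on $f: $?";
    }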
This was all conceived and constructed in an era (2004) when hardware was much less muscular. These days, my master node is an 8-core Mac Pro with 64 GB of RAM and 2 TB of SSD. During the file getting/manipulation phases of the master process, this is all the machine is doing. I suspect that some latent compute capability could be used to speed up the file manipulation.
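For concreteness, the sort of thing I have in mind is running several of the decompressions at once instead of serially. A minimal sketch, assuming Parallel::ForkManager is available and reusing the made-up file names from above:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    # Hypothetical file names; 8 workers to match the core count.
    my @gz = glob 'fcst_*.gz';
    my $pm = Parallel::ForkManager->new(8);

    for my $f (@gz) {
        $pm->start and next;            # parent: spawn a child, move on
        system('gunzip', '-f', $f) == 0
            or warn "gunzip failed on $f: $?";
        $pm->finish;                    # child exits here
    }
    $pm->wait_all_children;

Whether that actually helps, or whether the SSD just becomes the bottleneck, is one of the things I don't know.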
Speed is everything for this application, and a few minutes saved is worth a lot. Should I manipulate the files within Perl (perhaps avoiding things like unnecessary I/O buffering) rather than shelling out with system calls? (Obviously network speed remains a wild card here.)
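One of the options I'm weighing is doing it all in-process, e.g. LWP::UserAgent for the fetch and IO::Uncompress::Gunzip for the decompression. A minimal sketch (URL and file names are made up, and whether this can actually beat wget and the gzip binaries is exactly my question):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

    # Hypothetical URL and file names, purely for illustration.
    my $url = 'http://example.com/gfs_input.grib2.gz';

    # Stream the download straight to disk so 0.5 GB never sits in memory.
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($url, ':content_file' => 'gfs_input.grib2.gz');
    die 'download failed: ', $res->status_line unless $res->is_success;

    # Decompress in-process instead of shelling out to gunzip.
    gunzip 'gfs_input.grib2.gz' => 'gfs_input.grib2'
        or die "gunzip failed: $GunzipError";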
I have researched this a bit and already have some (possibly erroneous) thoughts, but thought I would toss the global concept out there to my perlish betters. This may save me some spurious bunny trails. Not that I don't like bunnies…
—The difficulty lies, not in thinking the new ideas, but in escaping from the old ones.