in reply to Re: Getting/handling big files w/ perl
in thread Getting/handling big files w/ perl
Theoretically yes on (1) and (2), with the caveat that I am in Alaska, and even though I am on a university trunk, realizable (vs. theoretical) bandwidth can at times be sub-optimal. It depends on time of day, day of month, tide height…
For (3), the source site is a “.gov” site and has different rules for different parties, so I am not sure where I stand. Personal contacts have advised “try it and see what happens and who (if anyone) squawks, and work it from there”. So, I am ready to give it a go. For example, several simultaneous “curls” seem to work OK.
“you might be able to decrease the download time by concurrently requesting two or more partial downloads using the range…” The idea of decomposing the file and downloading its pieces simultaneously had not really occurred to me. Actually, I had kind of thought of it, but I had hoped that such code might already exist. If someone has done it already, why recreate this wheel…
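If it comes to rolling my own, something like the following is what I have in mind: an untested sketch that assumes the server honours HTTP Range requests and reports a Content-Length; the URL, output name, and part count are placeholders.

```perl
#!/usr/bin/env perl
# Untested sketch: fetch one big file as N byte ranges in parallel
# child processes, then stitch the part files back together.
use strict;
use warnings;
use LWP::UserAgent;

my $url     = 'http://example.gov/big_input_file';   # placeholder URL
my $outfile = 'big_input_file';
my $n_parts = 4;

my $ua   = LWP::UserAgent->new;
my $head = $ua->head($url);
die 'HEAD failed: ', $head->status_line unless $head->is_success;
my $size = $head->header('Content-Length')
    or die "server did not report Content-Length";

my $chunk = int( $size / $n_parts ) + 1;
my @kids;
for my $i ( 0 .. $n_parts - 1 ) {
    my $from = $i * $chunk;
    my $to   = $from + $chunk - 1;
    $to = $size - 1 if $to > $size - 1;
    defined( my $pid = fork ) or die "fork: $!";
    if ( !$pid ) {    # child: fetch one byte range into its own part file
        my $res = LWP::UserAgent->new->get(
            $url,
            'Range'         => "bytes=$from-$to",
            ':content_file' => "$outfile.part$i",
        );
        exit( $res->is_success ? 0 : 1 );
    }
    push @kids, $pid;
}
waitpid $_, 0 for @kids;

# Reassemble the parts in order, then clean up.
open my $out, '>:raw', $outfile or die "open $outfile: $!";
for my $i ( 0 .. $n_parts - 1 ) {
    open my $in, '<:raw', "$outfile.part$i" or die "open part $i: $!";
    local $/ = \( 1 << 20 );                 # read in 1 MB blocks
    print {$out} $_ while <$in>;
    close $in;
    unlink "$outfile.part$i";
}
close $out;
```

Whether this actually beats a single stream over our link is something I would have to benchmark.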
“Which is it? Compress them or uncompress them? Or both (in that order or the reverse)? And are they anything to do with the 0.5GB file? On the surface it seems likely there is potential to overlap at least some of this work; but you'll need to make it a lot clearer what you are actually doing.”
Both, actually, though not sequentially. The big (0.5 GB) file is used to initialize a numerical weather prediction (NWP) model. (Actually, I need several of these.) The outputs from the model are hourly forecast states (~300 MB each, in NetCDF format), one for each hour of the 48 h forecast period, plus one for the initial state, hence 49 files. Each file gets “post-processed” upon output (one about every 7 minutes). Ultimately, the output files get gzipped (either as a collection at the end of the run, or one at a time immediately after processing).
The files are then finally moved out of the working directory and written to RAID. Rinse. Wash. Repeat. Four to six times each day, every day. So keeping the working directory clear of uncompressed files is essential, and it is the most likely point of failure for the whole process.
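For the compress-as-you-go half, a minimal sketch would be to fork off a gzip of each post-processed file so the next forecast hour is never held up. IO::Compress::Gzip is in core Perl; the glob pattern below is only an assumed naming scheme, not the real one.

```perl
# Sketch: gzip each post-processed output file in a background child
# so the run keeps moving; the .gz files can then be moved to RAID.
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

sub compress_in_background {
    my ($file) = @_;
    defined( my $pid = fork ) or die "fork: $!";
    return $pid if $pid;                 # parent returns immediately
    gzip $file => "$file.gz", BinModeIn => 1
        or die "gzip failed for $file: $GzipError";
    unlink $file;                        # keep the working directory clean
    exit 0;
}

compress_in_background($_) for glob 'output_*.nc';   # placeholder file names
1 while wait() != -1;                    # reap children before the move to RAID
```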
Uncompression: In the course of research, the RAID-archived .gz files are not infrequently uncompressed back into a working area (usually as a 49-file batch) for further interrogation. If a few seconds can be saved (somehow) on each file, and this might need to be done for a 30-day period (30 x 49 ≈ 1500 files), the savings really add up.
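Running a handful of gunzips at once is probably the first thing I would try for the batch case. A sketch using Parallel::ForkManager (a CPAN module, not core); the archive path and working-area destination are placeholders:

```perl
# Sketch: gunzip a 49-file batch with a few workers running at once.
use strict;
use warnings;
use File::Basename qw(basename);
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);          # 4 concurrent gunzips

for my $gz ( glob '/raid/archive/some_run/*.gz' ) {
    $pm->start and next;                         # parent queues the next file
    ( my $out = basename($gz) ) =~ s/\.gz\z//;   # uncompress into the cwd
    system("gunzip -c \Q$gz\E > \Q$out\E") == 0
        or die "gunzip failed for $gz: $?";
    $pm->finish;
}
$pm->wait_all_children;
```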
I doubt that sys calls to "gzip" and "gunzip" are optimal for this. There has to be a big IO buffering price to pay here.
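If shelling out really is the bottleneck, the alternative I would want to benchmark is doing the decompression in-process with IO::Uncompress::Gunzip (core Perl), reading and writing in large blocks. The 1 MB block size and the file names below are just assumptions to start from.

```perl
# Sketch: in-process gunzip with large read/write blocks, as a
# benchmark candidate against shelling out to the external gunzip.
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

sub gunzip_big {
    my ( $gz_file, $out_file ) = @_;
    my $z = IO::Uncompress::Gunzip->new( $gz_file, BlockSize => 1 << 20 )
        or die "cannot open $gz_file: $GunzipError";
    open my $out, '>:raw', $out_file or die "open $out_file: $!";
    my $buf;
    print {$out} $buf while $z->read( $buf, 1 << 20 ) > 0;
    close $out or die "close $out_file: $!";
    $z->close;
}

gunzip_big( 'forecast_hour_00.nc.gz', 'forecast_hour_00.nc' );   # placeholder names
```

Whether it actually beats the external gunzip on our RAID is an empirical question.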
We are all vivified… only to be ultimately garbage-collected.
Replies are listed 'Best First'.

- Re^3: Getting/handling big files w/ perl by BrowserUk (Patriarch) on Nov 17, 2014 at 08:10 UTC
- Re^3: Getting/handling big files w/ perl by roboticus (Chancellor) on Nov 17, 2014 at 12:28 UTC