Gisel:

In addition to BrowserUk's tips, one thing I found useful was to filter out the data you don't need. I once had to process a horrendous amount of credit card transaction information, and filtering out the data I didn't need saved quite a bit of storage space[*]. So if the resulting files contain a large amount of data you'll never use, you may find it worthwhile to filter the data before storing it.

You mention that the input files are in NetCDF format, so I took a quick look at Wikipedia's NetCDF article and saw that there are already some Unix command-line tools available for file surgery. So if you know which items you need from the files, you may be able to chop out a good bit of data and avoid compression altogether. If you're storing the files locally, you can probably avoid the time cost of a separate filtering pass by making the filter the step that copies the data to long-term storage (saving some network traffic to your SAN in the bargain).
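To make the "filter as you copy" idea concrete, here is a minimal sketch in Python. It is not NetCDF-specific: it assumes the data is plain CSV with a header row, and the function name `filter_copy` and the column names are made up for illustration. For real NetCDF files you would use one of the command-line tools or a NetCDF library instead, but the shape of the operation is the same: read from the source, keep only the fields you need, and write straight to the destination.

```python
import csv
import io


def filter_copy(src, dst, keep):
    """Copy CSV rows from src to dst, keeping only the columns named in keep.

    src and dst are open text file objects; keep is a list of column names.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not listed in keep.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    rows = 0
    for row in reader:  # one row at a time, so memory use stays flat
        writer.writerow(row)
        rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical sample data standing in for a large input file.
    sample = io.StringIO(
        "card,amount,merchant,notes\n"
        "1111,9.99,ACME,unused\n"
        "2222,5.00,Bolt,unused\n"
    )
    out = io.StringIO()
    filter_copy(sample, out, ["card", "amount"])
    print(out.getvalue(), end="")
```

In practice `src` would be the local file and `dst` a file opened on the long-term storage mount, so the only bytes crossing the network are the ones you actually want to keep.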

*: My original purpose wasn't to save disk space, but to use a single file format for my process. The incoming data arrived in multiple very different formats. (About 15 different file formats, IIRC.) The processor needed the files sorted and in a different format. The resulting space savings (substantial!) were just a side effect of the input file formats.

Update: Fixed acronym... (I wonder what IIRS might mean? D'oh!)

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re^3: Getting/handling big files w/ perl by roboticus
in thread Getting/handling big files w/ perl by Gisel
