Hello,

For the last few hours I have been trying different ways of storing (serializing) a large data structure to disk. The data structure is a 2D array with a few million rows, each holding between 0 and 200 columns of integers.

I usually use Storable's store (or nstore) and retrieve, but I had never tried them on a large data structure. Dumping it takes a long time (more than a few minutes) and, worse, consumes almost all the memory of the machine (>97% of >10GB!). BTW, simply printing the entire array to a file takes about half the time it takes to store it and consumes almost no extra memory, but then I would have to parse the file instead of calling retrieve...

Anyway, it's quite frustrating. It might be worth mentioning that once on disk, the binary file can be highly compressed (~1:100), but I was not able to figure out how to take advantage of this nice property: compressing before dumping (via freeze) seems to work even worse.

In short, I need a robust method (I have many such data structures) that allows storing large data structures on disk, optionally compressing them along the way, without consuming all of the machine's memory, and that is hopefully also fast.
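One direction that might fit these requirements is to write the rows one at a time through a gzip layer instead of serializing the whole structure in one call. The following is only a minimal sketch, not a benchmarked solution: `save_rows` and `load_rows` are hypothetical names, the record layout (a 32-bit column count followed by the values) is made up for illustration, and it assumes every value fits in a 32-bit signed integer. IO::Compress::Gzip and IO::Uncompress::Gunzip ship with Perl 5.10 and later.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Write each row as: a 32-bit column count, then the values packed as
# 32-bit signed little-endian integers (assumed value range).
# Only one packed row is held in memory at a time.
sub save_rows {
    my ($rows, $path) = @_;
    my $out = IO::Compress::Gzip->new($path)
        or die "Cannot open $path for writing: $GzipError";
    for my $row (@$rows) {
        $out->print(pack 'V l<*', scalar @$row, @$row);
    }
    $out->close;
    return;
}

# Read the rows back in the same order they were written.
sub load_rows {
    my ($path) = @_;
    my $in = IO::Uncompress::Gunzip->new($path)
        or die "Cannot open $path for reading: $GunzipError";
    my @rows;
    while (1) {
        my $header;
        my $got = $in->read($header, 4);
        last if !$got;                        # 0 bytes means end of file
        die "Truncated row header in $path" if $got != 4;
        my $count = unpack 'V', $header;
        if ($count == 0) {                    # empty row: nothing more to read
            push @rows, [];
            next;
        }
        my $body;
        $got = $in->read($body, 4 * $count);
        die "Truncated row body in $path" if $got != 4 * $count;
        push @rows, [ unpack 'l<*', $body ];
    }
    $in->close;
    return \@rows;
}
```

Calling `save_rows(\@data, 'data.bin.gz')` and later `my $data = load_rows('data.bin.gz')` would round-trip the array. Whether this is faster than store on data of this size would have to be measured.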

Any ideas? This seems like quite a common task, but I could find only a few tips.

Thanks!

Roi

In reply to Storing large data structures on disk by roibrodo
