Well, what have you tried as far as benchmarking is concerned? I actually would expect:

{ my $old = select $fh; local $\="\n"; print for @very_big_redundant_array; select $old; }
to be about as fast as it gets (i.e., not chewing up stupid amounts of memory if that array really is big, while still allowing your cache to save time). But I haven't benchmarked it, and computers can do really strange things that we humans don't expect.
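
If you want numbers rather than my guess, here is a rough sketch of how that could be benchmarked with the core Benchmark module. The sample data, the temp file, and the sub names write_print_for and write_join are made up for illustration; the join variant is just one plausible alternative to compare against, not something from the original question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical stand-in data; substitute the real array.
my @very_big_redundant_array = map { "item" . ($_ % 1000) } 1 .. 100_000;

# The approach above: select the handle, set $\, print element by element.
sub write_print_for {
    my ($path, $data) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    {
        my $old = select $fh;
        local $\ = "\n";          # appended to every print
        print for @$data;
        select $old;
    }
    close $fh or die "Cannot close $path: $!";
}

# For comparison: build one big string first (costs extra memory).
sub write_join {
    my ($path, $data) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    print {$fh} join("\n", @$data), "\n" if @$data;
    close $fh or die "Cannot close $path: $!";
}

# Scratch file, removed when the program exits.
my (undef, $path) = tempfile(UNLINK => 1);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    print_for => sub { write_print_for($path, \@very_big_redundant_array) },
    join      => sub { write_join($path, \@very_big_redundant_array) },
});
```

The results will depend heavily on your disk, OS cache, and perl build, so treat whatever table it prints as a measurement of your machine only.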

That said, even faster is probably to not save it, but to uniq-sort it in memory:

my @very_big_sorted_unique_array = do { my %seen; $seen{$_} = 1 for @very_big_redundant_array; sort keys %seen; };
By bypassing the disk, you can get huge improvements in speed. If you run out of physical memory, this will still swap to disk, but that shouldn't be slower than your method. You will only have real problems, of the kind that using the disk manually would prevent, if you run out of address space (which could be 1.5GB, 2GB, 3.5GB, 3.75GB, or some number of TB, depending on OS and architecture).
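
To make the in-memory approach concrete, here is the same idiom on a tiny array. The sample data is made up; only the do-block is taken from the line above.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real array.
my @very_big_redundant_array = qw(pear apple fig apple pear banana fig);

my @very_big_sorted_unique_array = do {
    my %seen;
    $seen{$_} = 1 for @very_big_redundant_array;   # duplicates collapse onto one key
    sort keys %seen;                               # default sort compares as strings
};

# Prints apple, banana, fig, pear, one per line.
print "$_\n" for @very_big_sorted_unique_array;
```

Note that the default sort is string-wise; if the elements are numbers, you would want sort { $a <=> $b } keys %seen instead.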

Of course, if your intention is to have a reboot in the middle somewhere, then persistent storage is important. Don't get me wrong: saving a huge amount of data as quickly as possible is still a worthwhile question. But without knowing that you need to load the data in another process, I'm not sure it is necessarily an important question for you.


In reply to Re: Saving an array to a disk file by Tanktalus
in thread Saving an array to a disk file by Anonymous Monk
