My recommendation for data of this size, if you want to do your own sorting, is to use DB_File's BTree method. Right now tilly's scratchpad has the following snippet:
    use strict;
    use DB_File;
    my %sorted;
    $DB_BTREE->{compare} = sub {$_[1] cmp $_[0]};
    tie (%sorted, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_BTREE)
      or die $!;
    %sorted = 1..30;
    print map "$_\n", keys %sorted;
which shows how to create an in-memory BTREE with an arbitrary sort order. If you follow tye's suggestion you can skip the custom sort order, use a temporary file rather than memory, make the keys your sort key, and make the values your data. Just insert into the hash and then loop over it using the each construct. (Not keys, because that builds the whole list of keys in memory at once.)
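
A minimal sketch of that approach, under some assumptions: walk_sorted, the sort.db file name and the sample records are made up for illustration, and the default BTREE comparison (plain string order) is taken to be the order you want. Note that two records with the same sort key would overwrite each other in the hash, so in practice you may need to make the keys unique, for instance by appending a counter.

```perl
use strict;
use warnings;
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical helper: stores [sort_key, data] pairs in an on-disk BTREE
# and hands them to $callback in key order, one pair at a time.
sub walk_sorted {
    my ($records, $callback) = @_;

    # Keep the BTREE in a temporary directory that is removed at exit.
    my $dir  = tempdir(CLEANUP => 1);
    my $file = "$dir/sort.db";

    my %sorted;
    tie(%sorted, 'DB_File', $file, O_RDWR|O_CREAT, 0640, $DB_BTREE)
        or die "Cannot tie $file: $!";

    # Inserting is all it takes; the BTREE keeps the keys ordered.
    $sorted{ $_->[0] } = $_->[1] for @$records;

    # each fetches one pair per call, so nothing is slurped into memory.
    while (my ($key, $value) = each %sorted) {
        $callback->($key, $value);
    }

    untie %sorted;
    return;
}

# Sample data, invented for this example.
walk_sorted(
    [ [pear => 'green'], [apple => 'red'], [mango => 'orange'] ],
    sub { print "$_[0]\t$_[1]\n" },
);
```

With the sample data this prints the pairs in key order: apple, mango, pear. Since the default comparison is by string, numeric sort keys would need zero-padding or a compare sub like the one in the scratchpad snippet.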

If your OS has large file support, your limit is available disk space. If it does not, your limit is about 2 GB for the temp file, which (considering the BTREE overhead) holds somewhat over a GB of data. (I would guesstimate about 1.3 GB.)

There is also a File::Sort out there. I prefer that DB_File accepts arbitrary text rather than imposing the assumption of lines on me. And if you want to sort data structures rather than text, be sure to look at MLDBM (and expect a big slowdown).


In reply to Re (tilly) 1: Slow at sorting? by tilly
in thread Slow at sorting? by orbital
