One monk's big is another monk's small. How big are these files? 10 thousand records? 40 million records?

It may well be that the file sizes are small enough that you can safely sort within Perl: it won't take too much memory, and it won't take too long.

But there is a threshold to be aware of: once the file grows large relative to the free RAM available on your computer, it is more efficient to use the sort utility that comes with the operating system (unless you happen to be stuck on Windows, although Cygwin can help you out there).

A sufficiently full-featured sort utility will be able to sort your file on userID and by time within userID (ascending or descending) in a single run, and it will probably be faster than Perl could do it by an order of magnitude or two. Once it is sorted it will be a snap to write a simple Perl script to walk down the file and split it out into new files when the userID changes, and those new files will already be sorted.
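
As a rough sketch of that walk-and-split step, here is a small shell function. It assumes dash-separated records with the userID in the first field (as in the sample data in the update), and both the name split_by_user and the <userID>.dat output files are made up for illustration. A Perl script would follow the same logic and should cope much better with very large files; shell is used here only to keep the example short.

```shell
# Split an already-sorted stream into one file per userID.
# Hypothetical helper: the function name and the <userID>.dat naming
# scheme are assumptions made for this sketch.
split_by_user() {
    outdir=$1
    current=
    # Read stdin line by line; the "|| [ -n ... ]" keeps a final line
    # that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        user=${line%%-*}                  # text before the first dash
        if [ "$user" != "$current" ]; then
            current=$user
            exec 3>"$outdir/$user.dat"    # userID changed: open a new file
        fi
        printf '%s\n' "$line" >&3
    done
    # Close the last output file, if any was opened.
    if [ -n "$current" ]; then
        exec 3>&-
    fi
}
```

Feed it the sorted file on standard input, for example split_by_user outdir <file.sorted. Because the input is already grouped by userID, each output file is opened exactly once and its lines keep their sorted order.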

Back in the '60s, someone (Knuth? Hoare? Dijkstra?) observed that 50% of CPU time is spent sorting. In this age of GUIs, that proportion has no doubt decreased, but you can be sure that the sort utility that comes with your OS has had an awful lot of time spent on it making sure it runs as fast as possible (especially when the files exceed the amount of available RAM). Know when to use it.

<update>

Given a datafile as follows (I'm assuming your data really are separated by dashes):

u213-alpha-r-2002/03/19-00:09
u213-alpha-q-2002/03/19-00:08
u213-alpha-j-2002/03/19-00:01
u214-bravo-k-2002/03/19-00:02
u214-bravo-l-2002/03/19-00:03
u214-bravo-o-2002/03/19-00:06
u214-bravo-n-2002/03/19-00:05
u214-bravo-t-2002/03/19-00:11
u214-bravo-u-2002/03/19-00:12
u212-charlie-m-2002/03/19-00:04
u212-charlie-v-2002/03/19-00:13
u212-charlie-p-2002/03/19-00:07
u212-charlie-w-2002/03/19-00:14
u213-delta-s-2002/03/19-00:10

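You can sort this using - as a delimiter, on the first column (ascending) and then on the date and time in columns 4 and 5 (descending), with the following command:

sort -t- -k1,1 -k4r file.dat >file.sorted

The first key is written as -k1,1 because a bare -k1 runs from column 1 to the end of the line, which would leave nothing for the second key to decide. The r is attached to the second key because a standalone -r reverses the whole sort, userID included.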
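
If you want to confirm the result before going further, sort can also check a file against a set of keys with -c. The wrapper below is only a sketch: the function name sort_by_user_time is hypothetical, LC_ALL=C is an assumption that plain byte-order comparison is acceptable for this data, and the keys are written with explicit ranges (-k1,1 for the userID, -k4r for everything from the date onward, descending).

```shell
# Sort by userID ascending, then by date and time descending,
# and verify the output with sort -c.
# Hypothetical helper: the function name is an assumption for this sketch.
sort_by_user_time() {
    infile=$1
    outfile=$2
    # -t-    : fields are separated by dashes
    # -k1,1  : first key is field 1 only (the userID)
    # -k4r   : second key runs from field 4 to end of line, reversed
    LC_ALL=C sort -t- -k1,1 -k4r "$infile" >"$outfile" &&
    # -c prints nothing and returns 0 when the file is already in order
    LC_ALL=C sort -c -t- -k1,1 -k4r "$outfile"
}
```

Called as sort_by_user_time file.dat file.sorted, it returns non-zero if either the sort or the check fails.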

Hope this helps.

</update>


print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

In reply to Re: Sorting big text lists by grinder
in thread Sorting big text lists by Infinity
