in reply to Sorting big text lists

One monk's big is another monk's small. How big are these files? 10 thousand records? 40 million records?

It may well be that the file sizes are small enough that you can safely sort within Perl: it won't take too much memory, and it won't take too long.

But there is a threshold to be aware of: when the file reaches a certain size with respect to the free RAM available on your computer, it is more efficient to use the sort utility that comes with the operating system. (Unless you happen to be stuck on Windows, although Cygwin can help you out there.)

A sufficiently full-featured sort utility will be able to sort your file on userID and by time within userID (ascending or descending) in a single run, and it will probably be faster than Perl could do it by an order of magnitude or two. Once it is sorted it will be a snap to write a simple Perl script to walk down the file and split it out into new files when the userID changes, and those new files will already be sorted.
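
To make the splitting step concrete, here is a minimal sketch. It uses awk from the shell as a stand-in for the simple Perl script mentioned above; the logic is the same. It assumes dash-separated records with the userID in the first field. The function name split_by_user and the "<userID>.sorted" output names are made up for illustration.

```shell
#!/bin/sh
# Sketch: split an already-sorted file into one file per userID.
# Assumptions: fields are separated by "-", the userID is field 1, and
# the input is sorted so each userID forms one contiguous run. If a
# userID reappeared later, its file would be overwritten, not extended.
split_by_user() {
    # $1 = sorted input file, $2 = existing output directory
    awk -F- -v dir="$2" '
        NR == 1 || $1 != current {      # userID changed: switch output file
            if (out != "") close(out)   # keep at most one output file open
            current = $1
            out = dir "/" current ".sorted"
        }
        { print > out }                 # line order (and so sort order) is kept
    ' "$1"
}
```

Calling split_by_user file.sorted outdir would write one file per userID into outdir, each holding that user's lines in the order they appeared in the sorted input.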

Back in the '60s, someone (Knuth? Hoare? Dijkstra?) observed that 50% of CPU time is spent sorting. In this age of GUIs, that proportion has no doubt decreased, but you can be sure that the sort utility that comes with your OS has had an awful lot of time spent on it making sure it runs as fast as possible (especially when the files exceed the amount of available RAM). Know when to use it.

<update>

Given a datafile as follows (I'm assuming your data really are separated by dashes):

u213-alpha-r-2002/03/19-00:09
u213-alpha-q-2002/03/19-00:08
u213-alpha-j-2002/03/19-00:01
u214-bravo-k-2002/03/19-00:02
u214-bravo-l-2002/03/19-00:03
u214-bravo-o-2002/03/19-00:06
u214-bravo-n-2002/03/19-00:05
u214-bravo-t-2002/03/19-00:11
u214-bravo-u-2002/03/19-00:12
u212-charlie-m-2002/03/19-00:04
u212-charlie-v-2002/03/19-00:13
u212-charlie-p-2002/03/19-00:07
u212-charlie-w-2002/03/19-00:14
u213-delta-s-2002/03/19-00:10

You can sort this using - as a delimiter, on the first column and then on the 4th and 5th columns (date and time, descending), giving each key an explicit end field so that it does not run on to the end of the line:

sort -t- -k1,1 -k4,5r file.dat >file.sorted
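
For anyone who wants to try this on the sample data, here is a rough sketch of the sort wrapped in a shell function. The function name and the LC_ALL=C setting are my own additions: LC_ALL=C makes sort compare raw bytes, which keeps the ordering predictable across locales. The field layout (1 = userID, 4 = date, 5 = time) is assumed from the sample records.

```shell
#!/bin/sh
# Sketch: sort dash-delimited records by userID, newest first within a user.
# Assumed field layout, taken from the sample data:
#   1 = userID, 4 = date (YYYY/MM/DD), 5 = time (HH:MM)
sort_by_user_then_time() {
    # $1 = input file, $2 = output file
    # -t-     use "-" as the field separator
    # -k1,1   first key: field 1 only, ascending
    # -k4,5r  second key: fields 4 through 5, reversed (descending)
    LC_ALL=C sort -t- -k1,1 -k4,5r "$1" > "$2"
}
```

Run as sort_by_user_then_time file.dat file.sorted, the u213 records in the sample should come out in the order 00:10, 00:09, 00:08, 00:01, sitting between the u212 and u214 groups.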

Hope this helps.

</update>


print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

Re: Re: Sorting big text lists
by Infinity (Initiate) on Jul 30, 2002 at 23:02 UTC
    In this case my big is about 590,000 entries. I have tried the above two scripts and both have generated errors and I have no idea how to correct them. I am using Red Hat 7.3. I'm not sure what sort utilities it has that I can use. I am a little familiar with the bash shell so if there's anything I can do using that it might help. Thanks.