polar315 has asked for the wisdom of the Perl Monks concerning the following question:

Newbie here. I have proxy logs to parse and sort. The main sort approach I've seen loads the whole file into an array, but the files can be in the 500 MB range. Is there a way to issue a DOS sort command instead? I also need to be able to forward unique file names.
open(DATAIN, $file_assignment) or die "Can't open $file_assignment: $!";
print $FH_dataout_s sort <DATAIN>;
The above code works for small files, but not for the big ones. Thanks.

Re: sorting large files
by VSarkiss (Monsignor) on Dec 29, 2005 at 20:23 UTC
Re: sorting large files
by salva (Canon) on Dec 29, 2005 at 22:31 UTC
      Tried File::Sort and it consumed a huge amount of memory even on my 100K test file. I was previously using the Windows command-line sort, which worked well, but since I started adding the date to the report names it has become a bit of a pain. Saw the comment regarding the DB; we do that now, but this is for quick-and-dirty daily reports, mostly throwaways that get used occasionally. Daily logs run around 20 GB, and I just parse out specific activity for closer examination.

      Perl does great for creating the daily logs.

      Looking to see how to kick off a command line, but it will need to have the file name in it; that is the head-scratcher part. Something like the sketch below is what I'm after.

      Thanks for all the quick replies.
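      Here is a minimal sketch of what I mean: kicking off the Windows sort with the input file name and a date-stamped report name filled in. The file names and date format are only examples:

          use strict;
          use warnings;
          use POSIX qw(strftime);

          my $infile  = 'proxy.log';                     # example input name
          my $stamp   = strftime('%Y%m%d', localtime);   # today's date for the report name
          my $outfile = "report_$stamp.txt";

          # Hand the whole command to the shell so Windows sort writes straight to the report file.
          system(qq{sort "$infile" > "$outfile"}) == 0
              or die "sort failed: $?";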

Re: sorting large files
by CountZero (Bishop) on Dec 29, 2005 at 20:31 UTC
    Split each record into its constituent fields, save them all in a database, build an index on each field, and output all the records sorted on any of those fields; a sketch of this follows below. As an added bonus you can even select which records you wish to output.

    Database servers are optimized to do this sort of thing.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: sorting large files
by swampyankee (Parson) on Dec 30, 2005 at 03:43 UTC

    You can access system commands -- such as Windows' or *ix's sort -- with the system function call. While I do work on Windows, I have the MKS utilities and a commercial sort program, so I tend not to use the Windows sort command; the commercial sort my employer uses is quite fast and has a fairly small footprint.

    If you have a database system, that (as suggested elsewhere) is one way; it will no doubt have a better sort for large files than Windows or *ix does. Alternatively, if you (or your employer) are willing to part with a few (well, maybe many) USD, you could buy one of the commercial sort utilities, some of which are blindingly fast, possibly faster than the sorts included with a database system.

    emc

    " When in doubt, use brute force." — Ken Thompson
      I did end up using the Windows sort for it. It works well and there's no additional cost. This is just a quick-and-dirty solution for some smaller daily reports. Thanks to all for the suggestions. Great resources here, and I'll be back often.
Re: sorting large files
by TedPride (Priest) on Dec 29, 2005 at 22:27 UTC
    Well, one kludgy way to do this would be to read in x lines at a time, sort them, and output them to a new file, where x is the maximum number of lines Perl can easily handle at once. Then you run a series of line-by-line merges on the resulting files until they're all merged back into one file, and you can unlink everything except the now-sorted file.

    Assuming 500 MB of data and a reasonable amount of memory, you ought to be able to sort 50-100 MB chunks at a time, with a maximum of nine merges; a rough sketch is below.
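
    A minimal sketch of that split-sort-merge idea, assuming plain line-oriented records and merging all the chunks in a single line-by-line pass rather than pairwise; the file names and chunk size are placeholders:

        use strict;
        use warnings;

        my $infile      = 'proxy.log';      # placeholder input name
        my $outfile     = 'proxy.sorted';
        my $chunk_lines = 500_000;          # tune to what fits comfortably in memory

        # Pass 1: read the input in chunks, sort each chunk, write it to a temp file.
        my @chunks;
        open my $in, '<', $infile or die "Can't open $infile: $!";
        while (1) {
            my @lines;
            while (@lines < $chunk_lines and defined(my $line = <$in>)) {
                push @lines, $line;
            }
            last unless @lines;
            my $tmp = 'chunk_' . scalar(@chunks) . '.tmp';
            open my $out, '>', $tmp or die "Can't write $tmp: $!";
            print $out sort @lines;
            close $out;
            push @chunks, $tmp;
        }
        close $in;

        # Pass 2: merge the chunk files line by line into the final output.
        my @fh;
        for my $tmp (@chunks) {
            open my $fh, '<', $tmp or die "Can't reopen $tmp: $!";
            push @fh, $fh;
        }
        my @head = map { scalar <$_> } @fh;     # current line from each chunk
        open my $out, '>', $outfile or die "Can't write $outfile: $!";
        while (grep { defined } @head) {
            # Pick the smallest current line; a linear scan is fine for a handful of chunks.
            my $min;
            for my $i (0 .. $#head) {
                next unless defined $head[$i];
                $min = $i if !defined $min or $head[$i] lt $head[$min];
            }
            print $out $head[$min];
            $head[$min] = readline($fh[$min]);  # refill from the chunk we just took from
        }
        close $out;
        close $_ for @fh;
        unlink @chunks;                         # keep only the sorted result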