I am working with log files generated by a multi-threaded application, so the entries are not in any sorted order.

I was able to figure out how to parse and sort my data, so I have no problems there. My problem is the performance of the sort I wrote. Let's take a look at the code:

@sortedarray =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] || $a->[3] cmp $b->[3] }
    map  {
        if ( m/^(.+?)\((\d+)\)\s-\s\[(.+?)\].+?"(.*?)"\.$/ ) {
            my ($disc_file, $page, $key, $val) = ($1, $2, $3, $4);
            [ $_, $disc_file, $page, $key ];    # original line plus its sort fields
        }
        else { () }                             # non-matching lines contribute nothing
    } <DATA>;

foreach (@sortedarray) {
    print "$_\n";
}

__END__
CD1\01100809.pdf(1) - [Account Number] Indexed key "654546654".
CD2\01100809.pdf(1) - [Invoice Date] Indexed key "10/08/2001".
CD1\01100809.pdf(1) - [Customer Name] Indexed key "FOOBAR".
CD2\01100809.pdf(1) - [Contact Name] Indexed key "Dr. FOO".
CD4\01100809.pdf(20) - [Account Number] Indexed key "54356564".
CD4\01100809.pdf(20) - [Invoice Date] Indexed key "10/08/2001".
CD1\01100809.pdf(20) - [Customer Name] Indexed key "FOOBAR".
CD1\01100809.pdf(20) - [Contact Name] Indexed key "Dr. FOO".
CD1\01100814.pdf(33) - [Account Number] Indexed key "56357576537".
CD3\01100814.pdf(33) - [Invoice Date] Indexed key "10/08/2001".
CD3\01100814.pdf(33) - [Customer Name] Indexed key "FOOBAR".
CD1\01100814.pdf(33) - [Contact Name] Indexed key "Dr. FOO".
CD2\01100813.pdf(27) - [Account Number] Indexed key "73677576".
CD3\01100813.pdf(27) - [Invoice Date] Indexed key "10/08/2001".
CD1\01100813.pdf(27) - [Customer Name] Indexed key "FOOBAR".
CD3\01100813.pdf(27) - [Contact Name] Indexed key "Dr. FOO".

This code does exactly what I want: it ignores lines that don't match, then sorts by CD\filename, then by page number, and then by the key being indexed.
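To make the ordering concrete, here is a minimal sketch of the same three-level comparison applied to three of the sample lines plus one line that does not match. The helper name `sort_fields` and the non-matching line are hypothetical, made up only for this illustration; the regex and the comparison are the ones from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the three sort fields (disc\file, page, key)
# out of one log line. Returns an empty list when the line does not match.
sub sort_fields {
    my ($line) = @_;
    return $line =~ m/^(.+?)\((\d+)\)\s-\s\[(.+?)\].+?"(.*?)"\.$/
        ? ($1, $2, $3)
        : ();
}

my @lines = (
    'CD1\01100809.pdf(20) - [Customer Name] Indexed key "FOOBAR".',
    'CD1\01100809.pdf(1) - [Customer Name] Indexed key "FOOBAR".',
    'CD1\01100809.pdf(1) - [Account Number] Indexed key "654546654".',
    'this line does not match',    # made-up line, dropped by the filter
);

my @sorted =
    map  { $_->[0] }                           # recover the original line
    sort {
           $a->[1] cmp $b->[1]                 # disc\file, as a string
        || $a->[2] <=> $b->[2]                 # page, as a number
        || $a->[3] cmp $b->[3]                 # key, as a string
    }
    map  { my @f = sort_fields($_); @f ? [ $_, @f ] : () }
    @lines;

print "$_\n" for @sorted;
```

With these inputs, the two page 1 lines come first (Account Number before Customer Name), followed by the page 20 line, and the non-matching line is gone.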

The problem I am running into is the speed of this sort: it seems very slow, taking 16 seconds on a 2.3 MB file on a PIII 766 MHz. Can I speed this code up at all?

My other issue is file size: the larger the log file, the more memory perl uses. What techniques can I use to sort a huge file without taking up a lot of RAM in the process?


In reply to Complex Sorting Optimization? by orbital
