If you are working in a Windows environment, go ahead and install MinGW or Cygwin so you have access to sort and the other Unix tools.

sort is close to ideal: its algorithms have been refined for decades, and most implementations have handlers for special cases, such as merging multiple files that are already sorted.
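
To illustrate that merge special case (what sort -m does in GNU and POSIX sort), here is a minimal Python sketch, not Perl and not how any particular sort is implemented. The function name merge_sorted_files is hypothetical, and it assumes every input file is already sorted in plain code-point order, as produced by LC_ALL=C sort.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def merge_sorted_files(paths, out_path):
    """Merge text files whose lines are already sorted, without re-sorting.

    Assumption: each input is sorted in code-point order (like LC_ALL=C sort).
    Only one pending line per input is held in memory at a time.
    """
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        out = stack.enter_context(open(out_path, "w", encoding="utf-8"))
        # Strip newlines so a final line lacking "\n" still compares and prints cleanly.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        # heapq.merge lazily yields the smallest pending line across all streams.
        for line in heapq.merge(*streams):
            out.write(line + "\n")


if __name__ == "__main__":
    # Demo: two small pre-sorted files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        inputs = {"a.txt": "apple\ncherry\nfig\n", "b.txt": "banana\ndate\n"}
        paths = []
        for name, text in inputs.items():
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)
            paths.append(path)
        merged = os.path.join(tmp, "merged.txt")
        merge_sorted_files(paths, merged)
        with open(merged, encoding="utf-8") as fh:
            print(fh.read(), end="")
```

Because only one line per file is pending at any moment, this approach scales to inputs far larger than memory, which is why merging pre-sorted files is much cheaper than re-sorting their concatenation.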

Although I veer a bit off the post's original course, others have already delved into optimization, so I'd like to drop some references for fellow geeks: some uses of Perl in the scientific community, as well as the Hierarchical Data Format (HDF):

For large data sets, HDF offers many of the advantages sought by this needy monk. HDF is designed for accessing ordered and hierarchical sets of data on a large scale; it is optimized for performance and is compatible with large-scale parallel or distributed data systems.

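HDF is used to store most of NASA's imaging data from satellites, and the name turns up elsewhere too: the high-speed HTML templating system ClearSilver, with the associated Data::ClearSilver::HDF module, is built around a hierarchical dataset it calls HDF, although that is ClearSilver's own lightweight text format rather than the binary HDF files used for satellite data.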

There is also a CPAN module, PDL::IO::HDF5, that reads and writes HDF5 files.

If performance is really a concern, choosing an appropriate storage mechanism for the on-disk data is the place to focus. Perl makes this performance easy to measure with its benchmarking and profiling features, and Perl itself can perform surprisingly well even in high-throughput applications when the code is optimized based on data gathered through profiling. You only have to look at BioPerl to find plenty of examples.
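
In Perl, the core Benchmark module (cmpthese, timethese) and a profiler such as Devel::NYTProf cover this measure-first workflow. The same idea sketched with Python's standard library: first confirm two unpacking approaches return identical results, then time them. The record layout (two little-endian unsigned 32-bit integers) and all function names are hypothetical, chosen only for illustration, and the timings will vary by machine.

```python
import struct
import timeit

# Hypothetical record layout: two little-endian unsigned 32-bit integers.
RECORD = struct.Struct("<II")


def make_data(n):
    """Build n packed records of test data."""
    return b"".join(RECORD.pack(i, i * 2) for i in range(n))


def unpack_slices(data):
    """Slice out each record, then unpack it with the module-level function."""
    size = RECORD.size
    return [struct.unpack("<II", data[i:i + size]) for i in range(0, len(data), size)]


def unpack_iter(data):
    """Unpack every record in one pass with the precompiled Struct."""
    return list(RECORD.iter_unpack(data))


def compare(n=10_000, repeat=5, number=20):
    """Return the best observed seconds-per-call for each approach."""
    data = make_data(n)
    # Sanity check: both approaches must agree before timing means anything.
    assert unpack_slices(data) == unpack_iter(data)
    results = {}
    for name, fn in (("slices", unpack_slices), ("iter_unpack", unpack_iter)):
        # Best of several runs reduces noise from other processes.
        best = min(timeit.repeat(lambda: fn(data), repeat=repeat, number=number))
        results[name] = best / number
    return results


if __name__ == "__main__":
    for name, secs in compare().items():
        print(f"{name:12s} {secs * 1e3:.3f} ms per call")
```

For a per-function breakdown rather than a head-to-head timing, running python -m cProfile -s cumulative yourscript.py (script name hypothetical) gives a profile in the same spirit as Devel::NYTProf.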

And for a real diversion, the book Perl for Exploring DNA, which came out in July, looks fascinating. It probably has a whole slew of ideas for regexps and advanced pattern matching.

spectre#9 -- "Strictly speaking, there are no enlightened people, there is only enlightened activity." -- Shunryu Suzuki

In reply to Re^2: Unpacking small chucks of data quickly by spectre9
in thread Unpacking small chucks of data quickly by Anonymous Monk
