This could be a good place to use An APL trick for the Schwartzian Transform, as that is almost exactly the reason I thought it up in the first place...

There are some subtleties, mostly mentioned above, that help. The first trick (suggested by dmitri) is to set $/ to '-------'.$/ so that each read returns one whole record instead of one line.
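A minimal sketch of that first trick, assuming records end with a line of ten dashes and look roughly like the sample below. The sample data and the read_records name are made up for illustration, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data: each record ends with a line of ten dashes.
my $data = "Name: Alice\nZip:90210\n----------\n"
         . "Name: Bob\nZip:10001\n----------\n";

sub read_records {
    my ($text) = @_;
    # Input record separator = the dashed line plus a newline,
    # so each <$fh> returns a whole record rather than a single line.
    local $/ = "----------\n";
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;    # chomp strips the whole $/ string
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = read_records($data);
print scalar(@records), " records read\n";
```

Using local inside the sub means $/ goes back to its old value as soon as the sub returns, so the rest of the program still reads line by line.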

If the file is far too big to fit into memory, then the second trick is to sort on each record's location within the file together with its zip code, rather than on the records themselves: use tell() to note where each record starts.
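Roughly what I mean, as a sketch only. It assumes the same ten-dash separator and a "Zip:" followed by five digits in every record; the sample data is invented and an in-memory filehandle stands in for the big file. Only [offset, zip] pairs are held in memory, never the records themselves.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the large file.
my $data = "Name: Alice\nZip:90210\n----------\n"
         . "Name: Bob\nZip:10001\n----------\n"
         . "Name: Carol\nZip:60601\n----------\n";

local $/ = "----------\n";
open my $in, '<', \$data or die "Can't open in-memory file: $!";

# Pass 1: remember where each record starts and what its zip code is.
my @index;
my $pos = tell($in);    # position BEFORE the read
while (my $record = <$in>) {
    my ($zip) = $record =~ /Zip:(\d{5})/
        or die "no zip code found in record at $pos\n";
    push @index, [ $pos, $zip ];
    $pos = tell($in);
}

# Pass 2: sort the small index by zip (offset breaks ties),
# then seek back and fetch each record in sorted order.
my @sorted;
for my $entry (sort { $a->[1] cmp $b->[1] or $a->[0] <=> $b->[0] } @index) {
    seek $in, $entry->[0], 0 or die "seek failed: $!";
    push @sorted, scalar <$in>;    # scalar: read exactly one record
}
close $in;

print @sorted;
```

In real use you would print each record to the output file as you fetch it instead of collecting them in @sorted.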

Other tricks may depend on whether the file is local or not (that is, whether you can afford to read it multiple times), whether you want to sort on a secondary key as well, and so on.
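For the secondary key case, one possible approach is to carry the extra field along in the index and chain the comparisons. In this sketch the surname field and the index entries are purely hypothetical; nothing in the original question says what the second key would be.

```perl
use strict;
use warnings;

# Hypothetical index entries: [byte offset, zip code, surname].
my @index = (
    [  0, '90210', 'Smith' ],
    [ 40, '10001', 'Jones' ],
    [ 80, '90210', 'Adams' ],
);

# cmp returns 0 when the zips are equal, so || falls through
# to the surname comparison only for ties.
my @ordered = sort {
       $a->[1] cmp $b->[1]    # primary key: zip code
    || $a->[2] cmp $b->[2]    # secondary key: surname
} @index;

print join(' ', @$_), "\n" for @ordered;
```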

Actually, thinking about it: assuming that you have sufficient memory and no secondary key you wish to sort on, my preferred solution would be a two-phase sort. Phase one is an insertion sort into a hash of zip codes: read the file once and file each record's offset under its zip code. Then you read the file again and write out the sorted version.

local $/ = '----------' . $/;    # each read now returns one whole record
my %zips;
open(FILE, '<', 'filename') or die "Can't open filename: $!";
my $teller = tell(FILE);         # NB: need the position before the file read!
while (<FILE>) {
    die "no zip code found in record at $teller\n" unless /Zip:(\d{5})/;
    push @{ $zips{$1} }, $teller;    # phase one: offsets filed under zip code
    $teller = tell(FILE);        # FILE is optional here but a good idea!
}
open(SORTED, '>', 'newfile') or die "Can't open newfile: $!";
for my $zip (sort keys %zips) {
    for my $offset (@{ $zips{$zip} }) {
        seek FILE, $offset, 0;
        print SORTED scalar <FILE>;  # scalar: one record, not the rest of the file
    }
}
close FILE;
close SORTED;
Disclaimer: code untested, may contain horrible bugs.

Dingus


Enter any 47-digit prime number to continue.

In reply to Re: Sorting A File Of Large Records by dingus
in thread Sorting A File Of Large Records by Anonymous Monk
