in reply to Sorting A File Of Large Records

I'm thinking I don't want to slurp the whole file into an array or a hash and then somehow sort that.

If you don't want to pull the whole file into memory, you have several alternatives. Here are two:

  1. Pull pieces of the file into memory, sort them, and write them to temporary files. Then merge the results into a final, sorted file.

  2. Scan the file once, remembering the seek offset of the beginning of each record, and the key you want to sort on. Sort the {key, offset} pairs, and then use this sorted list to seek/read records, emitting them in sorted order into a new file.
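Approach 1 (sort chunks, then merge) can be sketched roughly as follows. The file names, the tiny chunk size, and the one-record-per-line assumption are all invented for the example; a real script would use a much larger chunk size and parse your actual record format:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Invented demo input: one record per line.
open my $demo, '>', 'big.dat' or die "big.dat: $!";
print $demo "$_\n" for qw(pear apple plum fig banana cherry date);
close $demo;

my $chunk_size = 3;            # tiny for the demo; use something large
my @temp_names;

# Phase 1: sort each chunk in memory, write it to a temp file.
open my $in, '<', 'big.dat' or die "big.dat: $!";
while (1) {
    my @chunk;
    while (@chunk < $chunk_size and defined(my $line = <$in>)) {
        push @chunk, $line;
    }
    last unless @chunk;
    my ($tmp, $name) = tempfile(UNLINK => 0);
    print $tmp sort @chunk;    # default string sort; adapt as needed
    close $tmp;
    push @temp_names, $name;
}
close $in;

# Phase 2: N-way merge -- repeatedly emit the smallest pending line.
my @fhs   = map { open my $fh, '<', $_ or die "$_: $!"; $fh } @temp_names;
my @heads = map { scalar <$_> } @fhs;
open my $out, '>', 'merged.dat' or die "merged.dat: $!";
while (grep { defined } @heads) {
    my $min;
    for my $i (0 .. $#heads) {
        next unless defined $heads[$i];
        $min = $i if !defined $min or $heads[$i] lt $heads[$min];
    }
    print $out $heads[$min];
    $heads[$min] = scalar readline($fhs[$min]);
}
close $out;
unlink @temp_names;
```

The linear scan for the minimum is fine for a handful of temp files; with many chunks you'd want a heap (or just shell out to sort(1), which does all of this for you).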

If you have enough memory to deal with the {key, offset} pairs, I'd go that way. It's easier to code. The descriptions of tell() and seek() in perlfunc tell you what you need.
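A rough sketch of the {key, offset} approach, assuming newline-terminated records whose sort key is the first whitespace-separated field; the file names and sample data are made up for the example:

```perl
use strict;
use warnings;

# Invented demo input: one record per line, key is the first field.
open my $demo, '>', 'records.dat' or die "records.dat: $!";
print $demo "$_\n" for '30341 Chamblee', '10001 Manhattan', '94105 SoMa';
close $demo;

# Pass 1: remember each record's key and its byte offset (via tell).
my @index;                       # list of [key, offset] pairs
open my $in, '<', 'records.dat' or die "records.dat: $!";
for (my $off = tell $in; defined(my $line = <$in>); $off = tell $in) {
    my ($key) = split ' ', $line;
    push @index, [$key, $off];
}

# Pass 2: seek to each offset in key order, copy exactly one record.
open my $out, '>', 'records.sorted' or die "records.sorted: $!";
for my $pair (sort { $a->[0] <=> $b->[0] } @index) {
    seek $in, $pair->[1], 0 or die "seek: $!";
    print $out scalar <$in>;     # scalar context: one line, not the rest
}
close $_ for $in, $out;
```

The scalar context on the readline is the important detail: in list context <$in> would slurp everything from the offset to end of file.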

Replies are listed 'Best First'.
Re: Re: Sorting A File Of Large Records
by Anonymous Monk on Dec 10, 2002 at 21:48 UTC
    Thanks for idea #2. So I've got my {key, offset} pairs nicely sorted - no problem there. I can also seek to each offset in the unsorted file without a problem. My problem is this: once I'm at the correct starting position for my next record in the unsorted file (i.e., after calling seek), how can I extract *just* the next record, and not the rest of the file from that point on?
    for my $zip (sort { $a <=> $b } keys %zips) {
        seek FILE, $zips{$zip}, 0;
        print NEW <FILE>;
    }
    Obviously, <FILE> here contains the rest of the file following the offset (i.e., $zips{$zip}) and not just the next record. Any ideas as to what I'm doing wrong?
      My problem is this: once I'm in the correct starting position for my next record in the unsorted file (i.e after calling seek), how can I extract *just* the next record and not the rest of the file from that point on?

      Read the file line-by-line (i.e., using <FILE> in scalar context) until you've read the complete record.
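For the common case of one record per line, the fix is just to read in scalar context after each seek. Here is a self-contained sketch built around the loop above; the file names and data are invented, and if your records span several lines you'd keep reading in scalar context until you hit your record terminator:

```perl
use strict;
use warnings;

# Invented demo input: one record per line, zip code first.
open my $mk, '>', 'unsorted.dat' or die "unsorted.dat: $!";
print $mk "$_\n" for '30000 c', '10000 a', '20000 b';
close $mk;

# Build the {zip => offset} index with tell, as in the original code.
open FILE, '<', 'unsorted.dat' or die "open: $!";
my %zips;
for (my $off = tell FILE; defined(my $line = <FILE>); $off = tell FILE) {
    my ($zip) = split ' ', $line;
    $zips{$zip} = $off;
}

open NEW, '>', 'resorted.dat' or die "open: $!";
for my $zip (sort { $a <=> $b } keys %zips) {
    seek FILE, $zips{$zip}, 0 or die "seek: $!";
    my $record = <FILE>;   # scalar context reads exactly one line...
    print NEW $record;     # ...so only one record is copied
}
close NEW;
close FILE;
```

Assigning <FILE> to a scalar (or calling it inside a scalar expression) forces scalar context; `print NEW <FILE>;` puts the readline in list context, which is why it slurped everything after the offset.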