Hi

Well, I need some advice and pointers. Let me first explain my situation. I have a huge (ca. 20 GB) text file that looks like this:

    key1 key2 ndnjfgdsjfjjkjjfjf...
    key1 key2 kdfkjdfgdfugbjndkfgkjgndkjfjkd
    key43 key21 sdkjfhdghdbgbd
    key1 key3 jujdejnsduhffnjj
    key2 key2 jhzezhdjjf...

I believe the structure is clear:
- there are two keys
- keys can be repeated
What I need to do is sort the file first by the key in the first column, so that all records with key1 come first, followed by key2, and so on up to key43. Next, the records inside each bucket need to be sorted again by the second key column. (There are only two key columns.) The fastest approach I can imagine is to create 43 bucket files, iterate through the main file once, and write each record to its bucket. Once that is done, repeat the process within each bucket. Afterwards, join the files and delete the now-unnecessary bucket files.
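A minimal sketch of that two-pass bucket idea, written in Python rather than Perl. It assumes one record per line in the form `<key1> <key2> <payload>`, that key1 values are safe, distinct file names, and that keys should be ordered "naturally" (key2 before key10), which is a guess based on the key1 ... key43 ordering above; `natural_key` and `bucket_sort` are hypothetical names. Note that pass 2 sorts each bucket in memory, so it only stays within a 100 MB budget if no bucket is larger than that; with ~20 GB spread over 43 keys (about 465 MB per bucket on average) that is unlikely, so each bucket would in practice need an external sort of its own.

```python
import os
import re
import tempfile

def natural_key(token):
    """Hypothetical helper: order 'key2' before 'key10' (assumes keys are a
    text prefix followed by an integer; other tokens sort by text)."""
    m = re.fullmatch(r"(\D*)(\d+)", token)
    return (m.group(1), int(m.group(2))) if m else (token, -1)

def bucket_sort(in_path, out_path):
    """Two-pass bucket sort. Assumes one record per line,
    '<key1> <key2> <payload>', and that key1 values are valid file names."""
    with tempfile.TemporaryDirectory() as tmp:  # removed on exit or exception
        handles = {}
        try:
            # Pass 1: scatter each record into the bucket file for its first key.
            with open(in_path, encoding="utf-8") as src:
                for line in src:
                    if not line.strip():
                        continue  # skip blank lines
                    if not line.endswith("\n"):
                        line += "\n"  # keep the last record separate after joining
                    k1 = line.split(None, 1)[0]
                    if k1 not in handles:
                        handles[k1] = open(os.path.join(tmp, k1), "w", encoding="utf-8")
                    handles[k1].write(line)
        finally:
            for fh in handles.values():
                fh.close()
        # Pass 2: sort each bucket by the second key (in memory!) and append
        # the buckets to the output in first-key order.
        with open(out_path, "w", encoding="utf-8") as dst:
            for k1 in sorted(handles, key=natural_key):
                with open(os.path.join(tmp, k1), encoding="utf-8") as fh:
                    records = fh.readlines()
                records.sort(key=lambda r: natural_key(r.split(None, 2)[1]))
                dst.writelines(records)
```

Because `list.sort` is stable, records that share both keys keep their original relative order.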

The downside is that if the sort is interrupted, my temporary bucket files remain on disk and have to be removed by hand. Alternatively, I could intercept SIGINT and delete the buckets before the program terminates.
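One way to get the "intercept the signal and clean up" behaviour without hand-written cleanup code, sketched in Python: keep all buckets inside a scratch directory managed by `tempfile.TemporaryDirectory`. Python already turns Ctrl-C (SIGINT) into a `KeyboardInterrupt` exception, which unwinds the `with` block and deletes the directory; the sketch below also maps SIGTERM to an exception so the same cleanup runs. `run_with_scratch_dir` is a hypothetical helper name. Nothing can clean up after SIGKILL or a power loss, so a fixed directory prefix at least makes leftovers easy to find.

```python
import signal
import tempfile

def _raise_on_term(signum, frame):
    # Turn SIGTERM into a normal exception so `with` blocks and `finally`
    # clauses still run. SIGINT needs no handler: Python raises
    # KeyboardInterrupt for it by default.
    raise SystemExit(128 + signum)

def run_with_scratch_dir(work):
    """Call work(scratch_dir) and return its result. The directory and every
    bucket file in it are removed however work() exits: normal return,
    exception, Ctrl-C, or SIGTERM (but not SIGKILL)."""
    signal.signal(signal.SIGTERM, _raise_on_term)  # must run in the main thread
    with tempfile.TemporaryDirectory(prefix="bucketsort-") as scratch:
        return work(scratch)
```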

What I came here to ask is: does anyone have a better solution, one that is faster, does not consume much memory (100 MB at most), and does not leave this mess of files on my disk?
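The standard technique for sorting data much larger than the memory budget is an external merge sort: read the input in chunks that fit in memory, sort each chunk, write it out as a sorted "run", then k-way merge the runs. A minimal Python sketch follows; it assumes the same line format and natural key ordering as above, and `lines_per_run` is a hypothetical tuning knob that should be chosen so one chunk fits in ~100 MB. The merge holds roughly one line plus a read buffer per run, and all run files live in a temporary directory that is removed on exit. With 20 GB and ~100 MB runs there would be on the order of 200 runs, which is within typical open-file limits. If an external tool is acceptable, GNU `sort` implements the same external-merge strategy and takes a buffer-size limit (`-S`) and a temp directory (`-T`); something like `sort -S 100M -k1,1V -k2,2V big.txt > sorted.txt` may be worth trying first.

```python
import heapq
import os
import re
import tempfile
from itertools import islice

def natural_key(token):
    """Hypothetical helper: order 'key2' before 'key10'."""
    m = re.fullmatch(r"(\D*)(\d+)", token)
    return (m.group(1), int(m.group(2))) if m else (token, -1)

def record_key(line):
    # Sort by the first key, then the second (assumed whitespace-separated).
    k1, k2 = line.split(None, 2)[:2]
    return (natural_key(k1), natural_key(k2))

def external_sort(in_path, out_path, lines_per_run=500_000):
    """External merge sort; peak memory is roughly one run of lines_per_run records."""
    with tempfile.TemporaryDirectory() as tmp:
        runs = []
        # Phase 1: cut the input into chunks, sort each in memory, write it out.
        with open(in_path, encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, lines_per_run))
                if not chunk:
                    break
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk if l.strip()]
                chunk.sort(key=record_key)
                path = os.path.join(tmp, f"run{len(runs)}")
                with open(path, "w", encoding="utf-8") as run:
                    run.writelines(chunk)
                runs.append(path)
        # Phase 2: lazily k-way merge the sorted runs into the output file.
        files = [open(p, encoding="utf-8") for p in runs]
        try:
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files, key=record_key))
        finally:
            for f in files:
                f.close()
```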

Any comments are more than welcome.

Thank you

baxy

UPDATE:

To keep the "file explosion" under control, would it be possible to create a "virtual file": one file divided into sections, with each record written into its section (something like using fseek in C)? Would that be advisable?
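The single-file idea can work, but only if each section's size is known before writing, which costs one extra counting pass over the input. A minimal Python sketch under the same line-format assumption: pass 1 measures how many bytes each first key needs, the sections are laid out back to back in key order, and pass 2 seeks (Python's equivalent of fseek) to each section's cursor before writing a record. `partition_in_place` and `sort_sections` are hypothetical names. Seeking once per record is slow on a real 20 GB file, so a practical version would buffer records per section and flush them in blocks; and the per-section sort still assumes each section fits in memory.

```python
import re
from collections import Counter

def natural_key(token):
    """Hypothetical helper for byte keys: b'key2' sorts before b'key10'."""
    m = re.fullmatch(rb"(\D*)(\d+)", token)
    return (m.group(1), int(m.group(2))) if m else (token, -1)

def _normalize(line):
    # Give every record a trailing newline so section sizes stay exact.
    return line if line.endswith(b"\n") else line + b"\n"

def partition_in_place(in_path, out_path):
    """Scatter records into contiguous per-key1 sections of ONE output file.
    Returns {key1: (start, end)} byte ranges, in key1 order."""
    sizes = Counter()
    # Pass 1: count how many bytes each first-key section will need.
    with open(in_path, "rb") as src:
        for line in src:
            if line.strip():
                sizes[line.split(None, 1)[0]] += len(_normalize(line))
    # Lay the sections out back to back in first-key order.
    sections, offset = {}, 0
    for k1 in sorted(sizes, key=natural_key):
        sections[k1] = (offset, offset + sizes[k1])
        offset += sizes[k1]
    cursor = {k1: start for k1, (start, _) in sections.items()}
    # Pass 2: write each record at its section's cursor (the fseek idea).
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        dst.truncate(offset)  # reserve the full output size up front
        for line in src:
            if not line.strip():
                continue
            line = _normalize(line)
            k1 = line.split(None, 1)[0]
            dst.seek(cursor[k1])
            dst.write(line)
            cursor[k1] += len(line)
    return sections

def sort_sections(path, sections):
    """Sort each section by its second key, in place. Each section is read
    into memory, so each must fit in RAM (assumption)."""
    with open(path, "r+b") as f:
        for start, end in sections.values():
            f.seek(start)
            data = f.read(end - start)
            records = [r + b"\n" for r in data.split(b"\n")[:-1]]
            records.sort(key=lambda r: natural_key(r.split(None, 2)[1]))
            f.seek(start)
            f.write(b"".join(records))
```

Because every section keeps its exact byte size, the sorted records are written back over the same range and no temporary files are created at all.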


In reply to sorting type question- space problems by baxy77bax
