It seems to me that this is a rather inefficient way to do it. In fact, 20 million lines of data in flat files is a very solid indication that it is time to put the data into an RDBMS. Given that MySQL and Postgres are pretty good and also free, and that Perl has the excellent DBI module, there are not many reasons not to. If you just put the customer records into a real DB you would get the following benefits:

  1. You would not need to parse the flat text files every time you wanted to check the results.
  2. Once the initial load is done all you need to do is add the incremental changes.
  3. You would not need gigabytes of memory. Ever.
  4. It would be a hell of a lot faster. You can load 10,000+ records/sec into most DBs using their native text load facility. The actual query and dump will probably only take a few seconds (perhaps much faster depending on how you structure the DB).
  5. When the bosses decide they want top 10/20/30/50 by state/zip code/hair color or whatever, you only have to write a single line of SQL to get the answer. You can even dump this to a tab-separated text file, ready to import straight into Excel for a PPT by the PHB to the visiting VP :-), with a line like SELECT customer, balance FROM customers ORDER BY balance DESC LIMIT 30 INTO OUTFILE '/tmp/less_work_is_good.txt'
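To make the load-once, add-increments, query-with-one-line idea concrete, here is a minimal sketch. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 with an in-memory database so it runs without a server, whereas the post has MySQL or Postgres via DBI in mind. The table layout (customer, state, balance), the sample rows, and the helper names load_customers and top_customers are all made up for the example. For 20 million rows you would use the database's native bulk loader rather than row-by-row inserts.

```python
import sqlite3

# Hypothetical sample records: (customer, state, balance).
rows = [
    ("alice", "CA", 1200.50),
    ("bob", "NY", 310.00),
    ("carol", "CA", 9800.25),
    ("dave", "TX", 4500.00),
    ("erin", "NY", 7600.75),
]


def load_customers(conn, records):
    """Create the table if needed, then insert or update the given records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer TEXT PRIMARY KEY, state TEXT, balance REAL)"
    )
    # An index on balance lets the database answer ORDER BY balance cheaply.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_balance ON customers (balance)"
    )
    # INSERT OR REPLACE (SQLite syntax) lets the same call handle
    # the initial load and later incremental changes.
    conn.executemany(
        "INSERT OR REPLACE INTO customers (customer, state, balance) "
        "VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def top_customers(conn, n, state=None):
    """Return the n largest balances, optionally restricted to one state."""
    sql = "SELECT customer, balance FROM customers"
    params = []
    if state is not None:
        sql += " WHERE state = ?"
        params.append(state)
    sql += " ORDER BY balance DESC LIMIT ?"
    params.append(n)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
load_customers(conn, rows)                      # initial load
load_customers(conn, [("bob", "NY", 8800.00)])  # incremental change

top3 = top_customers(conn, 3)
top_ny = top_customers(conn, 1, state="NY")

# Tab-separated output, ready to paste into a spreadsheet.
for customer, balance in top3:
    print(f"{customer}\t{balance:.2f}")
```

The point of the sketch is that nothing is re-parsed and nothing large sits in memory: the database keeps the data on disk, and each new "top N by whatever" request is just another WHERE or ORDER BY clause.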

You could use a hash tied to an underlying file, but this limits you to a single key/value pair, and all it really does is solve your memory issue (at the expense of losing a lot of speed), so I will let someone else suggest that. Here is one of the many intros to DBI, "A short guide to DBI", to get you started.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


In reply to Re: Force perl to release memory back to the operating system by tachyon
in thread Force perl to release memory back to the operating system by Roger
