I admit that I don't fully know your situation. However, I would think twice before using a database if you don't actually need one. A database brings a lot of unrelated and unexpected overhead, and the costs in time, effort, and learning curve can be high. That said, if you genuinely need a database and believe this problem is a good one to convince your management to let you install one, then go for it.
On the other hand, this seems to me to be a simple text manipulation problem. You've already had a couple of excellent, low-footprint solutions posted; take another look at them. I assume you are reading and processing one file at a time. Basically, you need to:
1. Use Unix sort to sort each file (perhaps into a temp file) on characters 2..10. (On Windows, use the GNU utilities' sort; these are native Windows ports of the Unix tools.)
2. Using Perl, read in each group of lines and process it accordingly. Since the records are already grouped, you only need to hold one group plus one line in memory at a time (80 * (# of lines in the group + 1) bytes). For better performance, you can read each file in chunks of a specified memory size and process each group in a loop. (See the sketch below.)
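Something along these lines, as a minimal sketch. The file names and process_group() are hypothetical placeholders, and I'm assuming fixed-width 80-character records with the group key in characters 2..10:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # First, sort on characters 2..10, e.g. with GNU sort:
    #   sort -k 1.2,1.10 input.txt > sorted.txt
    # (-k 1.2,1.10 assumes the key region sits inside the first
    #  whitespace-delimited field; if it doesn't, supply a -t
    #  separator that never occurs in the data.)

    my $in = 'sorted.txt';                       # hypothetical file name
    open my $fh, '<', $in or die "Can't open $in: $!";

    my ($cur_key, @group);
    while (my $line = <$fh>) {
        chomp $line;
        my $key = substr $line, 1, 9;            # characters 2..10
        if (defined $cur_key and $key ne $cur_key) {
            process_group($cur_key, \@group);    # records are already grouped
            @group = ();
        }
        $cur_key = $key;
        push @group, $line;
    }
    process_group($cur_key, \@group) if @group;  # don't forget the last group
    close $fh;

    sub process_group {
        my ($key, $lines) = @_;
        # placeholder -- do whatever per-group work the task calls for
        printf "group %s: %d record(s)\n", $key, scalar @$lines;
    }

Since the file is already sorted on the key, each group arrives as a contiguous run of lines, so memory use stays tiny no matter how big the file is.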
An alternative is to:
1. Read the file with Perl and write each line to a temporary file named after its unique id (pos 2..10), perhaps decoding pos 11..14 on the way.
2. Sort each temp file on pos 11..14 and, if necessary, cat them back together into a single file. If you name the temp files properly, you can join the groups in whatever order you need. (A sketch follows.)
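A rough sketch of that approach, again with hypothetical file and directory names. One caveat: with a very large number of distinct ids you could hit the open-file limit and would need to close and re-open handles as you go.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $in  = 'input.txt';                       # hypothetical file names
    my $dir = 'tmp_groups';
    mkdir $dir unless -d $dir;

    # Pass 1: distribute lines into one temp file per id (chars 2..10).
    my %out;
    open my $fh, '<', $in or die "Can't open $in: $!";
    while (my $line = <$fh>) {
        my $key = substr $line, 1, 9;
        unless ($out{$key}) {
            open $out{$key}, '>', "$dir/$key.tmp"
                or die "Can't open $dir/$key.tmp: $!";
        }
        print { $out{$key} } $line;
    }
    close $fh;
    close $_ for values %out;

    # Pass 2: sort each temp file on chars 11..14 and cat the pieces
    # back together; the order you walk the keys controls the group order.
    open my $final, '>', 'merged.txt' or die "Can't open merged.txt: $!";
    for my $key (sort keys %out) {
        open my $tf, '<', "$dir/$key.tmp" or die "Can't open $dir/$key.tmp: $!";
        my @sorted = sort { substr($a, 10, 4) cmp substr($b, 10, 4) } <$tf>;
        close $tf;
        print {$final} @sorted;
        unlink "$dir/$key.tmp";
    }
    close $final;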
Of course, none of these options is "sexy" per se, but given the file sizes you mentioned, they shouldn't take more than a minute or two to run, and they don't carry much overhead. Hope this helps.
PJ
unspoken but ever present -- use strict; use warnings; use diagnostics; (if needed)