I admit that I don't fully know your situation. However, I would think seriously before using a database if you don't need one. There is a lot of unrelated or unexpected overhead associated with a database, and the costs in time, effort and learning curve can be high. That said, if you need a database in general and believe this problem is a good case for convincing your management to let you install one, then go for it.
On the other hand, this seems to me to be a simple text manipulation problem. You've already had a couple of excellent, low-footprint solutions posted. Take another look at them. I assume that you are reading and processing one file at a time. Basically, you need to:
1. Use Unix sort to sort each file (perhaps into a temp file) on characters 2..10. (On Windows, use the GNU utils sort; they are native Windows ports of the Unix utilities.)
2. Using Perl, read in each group of lines and process it accordingly. Since the records are already grouped, you only need to hold one group plus one extra line in memory (80 * (# of lines + 1) bytes for 80-character records); the extra line is the first record of the next group, which is how you know the current group has ended. For better performance, you can read each file in chunks sized to a specified memory limit and process the groups in a loop.
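A minimal sketch of step 2, assuming one fixed-width record per line with the id in characters 2..10 and a file that has already been sorted on that key in step 1. It reads line by line rather than in fixed-size chunks, and the actual per-group processing is left as a callback since it wasn't specified. The names record_key and process_groups are hypothetical, not from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1 (assumption) could be something like:
#   LC_ALL=C sort -k1.2,1.10 input.txt > sorted.txt
# which relies on there being no blanks in columns 1..10.

# Hypothetical layout: the group id sits in characters 2..10 (1-based),
# i.e. substr($line, 1, 9) in Perl's 0-based offsets.
sub record_key {
    my ($line) = @_;
    return substr($line, 1, 9);
}

# Walk a filehandle whose lines are ALREADY sorted on the key and call
# $handler->($key, \@lines) once per run of equal keys. Only the current
# group (plus the line that ends it) is kept in memory.
sub process_groups {
    my ($fh, $handler) = @_;
    my ($current_key, @group);
    while (my $line = <$fh>) {
        chomp $line;
        my $key = record_key($line);
        if (defined $current_key && $key ne $current_key) {
            $handler->($current_key, [@group]);   # pass a copy of the group
            @group = ();
        }
        $current_key = $key;
        push @group, $line;
    }
    $handler->($current_key, [@group]) if @group; # flush the last group
    return;
}

# Demo on in-memory data standing in for the output of the external sort.
my $sorted = "A000000001X1\nB000000001X2\nC000000002X3\n";
open my $fh, '<', \$sorted or die "Cannot open in-memory file: $!";
process_groups($fh, sub {
    my ($key, $lines) = @_;
    printf "%s: %d line(s)\n", $key, scalar @$lines;
});
close $fh;
```

In real use you would open sorted.txt instead of the in-memory string; the handler is where each complete group gets processed.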
Another alternative is to:
1. Read the file using Perl and write each line to a temporary file named for its unique id (pos 2..10), perhaps decoding pos 11..14 on the way.
2. Sort each file on pos 11..14 and, if necessary, cat them together to make a single file again. If you name the temp files properly, you can join the groups in whatever order you need.
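A minimal sketch of this alternative, again assuming fixed-width lines with the id in pos 2..10 and the sort field in pos 11..14. It sorts each per-id file in Perl rather than with an external sort, skips the optional decoding step, and assumes the ids are safe to use as file names. split_by_id and sort_and_join are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical layout: id in pos 2..10, sort field in pos 11..14 (1-based),
# i.e. substr($line, 1, 9) and substr($line, 10, 4) in Perl.

# Pass 1: write each line to a temp file named after its id.
# Note: this keeps one open handle per distinct id, which may hit the
# per-process file-handle limit if there are very many ids.
sub split_by_id {
    my ($in_fh, $dir) = @_;
    my %out;
    while (my $line = <$in_fh>) {
        chomp $line;
        my $id = substr($line, 1, 9);
        unless ($out{$id}) {
            my $path = File::Spec->catfile($dir, "$id.tmp");
            open $out{$id}, '>', $path or die "Cannot write $path: $!";
        }
        print { $out{$id} } "$line\n";
    }
    close $_ for values %out;
    return sort keys %out;   # ids in ascending order
}

# Pass 2: sort each id's file on pos 11..14 and append it to $out_fh.
# Files are visited in the order given in @ids, which sets the group order.
sub sort_and_join {
    my ($dir, $out_fh, @ids) = @_;
    for my $id (@ids) {
        my $path = File::Spec->catfile($dir, "$id.tmp");
        open my $fh, '<', $path or die "Cannot read $path: $!";
        my @lines = <$fh>;
        close $fh;
        print {$out_fh} sort { substr($a, 10, 4) cmp substr($b, 10, 4) } @lines;
    }
    return;
}

# Demo on in-memory input; the temp directory is removed at exit.
my $input = "A000000002B002\nA000000001C009\nA000000002B001\nA000000001C003\n";
open my $in, '<', \$input or die "Cannot open input: $!";
my $dir = tempdir(CLEANUP => 1);
my @ids = split_by_id($in, $dir);
close $in;

my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
sort_and_join($dir, $out, @ids);
close $out;
print $result;
```

Visiting the files in sorted id order is what brings the groups out in id order; to join them in some other order, reorder @ids before calling sort_and_join.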
Of course, none of these options is "sexy" per se, but given the file sizes you mentioned, they shouldn't take more than a minute or two to run, and they carry little overhead. Hope this helps.
PJ
unspoken but ever present -- use strict; use warnings; use diagnostics; (if needed)