eff_i_g,
OK, this still doesn't answer my questions, but I do have what I believe to be a half-decent suggestion for you. You do not indicate how often the customer provides these dumps or how many "runs" are done on the data between new dump files. Assuming the dumps arrive no more than once a day and that there are more than a few "runs" between dumps, the following methodology should improve the efficiency of the existing code with only minor modifications:
First, create a pre-process script that parses the huge source file and the supporting data file one time. Its job is to index the position of each ID in each file. This information should be stored in a database (DBD::SQLite or some such) or in a serialized data structure (Storable or some such). What this buys you is the ability, given an ID, to open the two files and quickly read in just the record associated with that ID. No searching is required and no unrelated IDs need to be parsed.
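To make the indexing idea concrete, here is a minimal sketch of such a pre-process step using Storable. Since I don't know your record layout, it assumes one record per line with the ID as the first pipe-delimited field; the file names and the build_index helper are made up for the demo. Treat it as an illustration under those assumptions, not a drop-in script.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Build a hash mapping each ID to the byte offsets of its records.
# ASSUMPTION: one record per line, ID is the first pipe-delimited field.
sub build_index {
    my ($data_file) = @_;
    my %index;
    open my $fh, '<:raw', $data_file or die "Cannot open $data_file: $!";
    while (1) {
        my $offset = tell($fh);        # position before the line is read
        my $line   = <$fh>;
        last unless defined $line;
        chomp $line;
        my ($id) = split /\|/, $line, 2;
        next unless defined $id && length $id;   # skip blank lines
        push @{ $index{$id} }, $offset;
    }
    close $fh;
    return \%index;
}

# Demo with a tiny hypothetical data file.
my $data_file  = 'demo_source.dat';
my $index_file = 'demo_source.idx';

open my $out, '>:raw', $data_file or die "Cannot write $data_file: $!";
print {$out} "A100|first|x\n", "B200|second|y\n", "A100|third|z\n";
close $out or die "Cannot close $data_file: $!";

my $index = build_index($data_file);
nstore($index, $index_file);           # serialize the index for later runs

for my $id (sort keys %$index) {
    print "$id => @{ $index->{$id} }\n";
}
```

An ID that appears more than once simply gets more than one offset, so the same structure covers both the one-record and many-records cases. If the index grows too large to load comfortably, the same ID-to-offset pairs could go into a DBD::SQLite table instead.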
Second, make a minor modification to the current script so that it uses the pre-processed index to pull in just the record(s) associated with a given ID. Now you can create as complex a data structure as makes sense and need not constantly re-split.
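The lookup side might look roughly like the sketch below: given an ID and an index of byte offsets, seek straight to each record and split it once. The same assumptions apply (one record per line, pipe-delimited, ID first), and fetch_records and the demo file name are hypothetical. The demo builds its small index in memory while writing the file; in the real script the index would be loaded with Storable's retrieve or queried from the database.

```perl
use strict;
use warnings;

# Return the records for one ID as array refs of fields.
# ASSUMPTION: $index maps ID => [byte offsets], records are pipe-delimited lines.
sub fetch_records {
    my ($fh, $index, $id) = @_;
    my @records;
    for my $offset (@{ $index->{$id} || [] }) {
        seek($fh, $offset, 0) or die "seek to $offset failed: $!";
        my $line = <$fh>;
        next unless defined $line;
        chomp $line;
        push @records, [ split /\|/, $line, -1 ];   # split once, keep the fields
    }
    return @records;
}

# Demo: write a tiny hypothetical file and record offsets as we go.
my $data_file = 'demo_lookup.dat';
my %index;

open my $out, '>:raw', $data_file or die "Cannot write $data_file: $!";
for my $rec ('A100|first|x', 'B200|second|y', 'A100|third|z') {
    my ($id) = split /\|/, $rec, 2;
    push @{ $index{$id} }, tell($out);
    print {$out} "$rec\n";
}
close $out or die "Cannot close $data_file: $!";

open my $in, '<:raw', $data_file or die "Cannot open $data_file: $!";
my @found = fetch_records($in, \%index, 'A100');
close $in;

print join(',', @$_), "\n" for @found;
```

Only the lines belonging to the requested ID are ever read or split, which is where the savings come from when the source file is huge.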
This ultimately is not what I would like to suggest, but given the lack of details it is the best I can offer.