Re^3: Moving from hashing to tie-ing.

Limbic,

The files we receive are fixed length and many of the fields are not even used. A file may contain a dozen fields, but we may only need 2 or 3. Since these files are extracted from their database there is a lot of id matching to be done, which is mainly what the hashes do, such as $name_hash{'123'} = { first => 'John', last => 'Doe' };. Once all of the supporting files are hashed in this manner, a main script uses them as lookup tables. Therefore, every time it sees a record that has '123' in a certain field, it knows to use "John Doe" during the processing.
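A minimal sketch of the hashing step described above, assuming a made-up layout: a 3-character id, a 10-character first name, a 10-character last name, and trailing fields that are never used. The template, the field widths, and the `build_lookup` name are illustrative assumptions, not the real file format.

```perl
use strict;
use warnings;

# Hypothetical layout: id (3 chars), first name (10), last name (10).
# Anything after those 23 characters is ignored by unpack.
my $template = 'A3 A10 A10';

# Build an id => { first, last } lookup from fixed-length lines,
# keeping only the fields that are actually needed.
sub build_lookup {
    my ($tmpl, @lines) = @_;
    my %lookup;
    for my $line (@lines) {
        chomp $line;
        my ($id, $first, $last) = unpack $tmpl, $line;
        next unless defined $id && length $id;
        $lookup{$id} = { first => $first, last => $last };
    }
    return \%lookup;
}

# Sample records standing in for one of the supporting files.
my @sample = (
    sprintf("%-3s%-10s%-10s%-20s\n", '123', 'John', 'Doe', 'unused data'),
    sprintf("%-3s%-10s%-10s%-20s\n", '456', 'Jane', 'Roe', 'more unused'),
);

my $name_hash = build_lookup($template, @sample);
print "$name_hash->{'123'}{first} $name_hash->{'123'}{last}\n";  # John Doe
```

The `A` format strips trailing padding, so the stored values come out as 'John' and 'Doe' rather than space-padded strings.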

Having all of the data in memory is not necessary; however, I do not know how this could be done without using a database. Each record, from a few dozen to a few thousand, needs to use these supporting hashes.

Let me know if you need more information. I appreciate your help.

Re^4: Moving from hashing to tie-ing. by Limbic~Region (Chancellor) on Jul 31, 2006 at 16:24 UTC
eff_i_g, You really haven't said anything at all about how the program works or how it decides what data it needs and when. Since many of your fields are not needed, they need not be included in your data structure, provided you can know in advance that they won't be needed. If only 1 id is ever worked with at a time, then there is never a need to load more than one record in memory at a time. Alternatively, it may be possible to employ an MRU cache such that the splits are cached in arrays but only a fixed number are cached, where the most recently used stay in cache and others expire. Try to put yourself in my shoes. Read what you have written about your program, your data structure, and your problem and see if you feel you have provided the necessary information to help. Again, we are just guessing. Cheers - L~R
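The fixed-size cache idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `RecentCache` package, its `max` and `loader` parameters, and the sample `%source` data are all hypothetical, and in a real script the loader would read and split one record from the supporting file instead of consulting an in-memory hash.

```perl
use strict;
use warnings;

package RecentCache;

sub new {
    my ($class, %args) = @_;
    my $self = {
        max    => $args{max},     # maximum number of cached records
        loader => $args{loader},  # code ref that fetches a record on a miss
        data   => {},             # id => record
        order  => [],             # ids, least recently used first
        loads  => 0,              # how many times the loader was called
    };
    return bless $self, $class;
}

sub get {
    my ($self, $id) = @_;
    if (exists $self->{data}{$id}) {
        # Hit: drop the id from its old position; it is re-added below.
        @{ $self->{order} } = grep { $_ ne $id } @{ $self->{order} };
    }
    else {
        # Miss: fetch the record, expiring the oldest entry if full.
        $self->{data}{$id} = $self->{loader}->($id);
        $self->{loads}++;
        if (@{ $self->{order} } >= $self->{max}) {
            my $oldest = shift @{ $self->{order} };
            delete $self->{data}{$oldest};
        }
    }
    push @{ $self->{order} }, $id;   # now the most recently used
    return $self->{data}{$id};
}

package main;

# Stand-in for the supporting file (hypothetical data).
my %source = (
    123 => [ 'John', 'Doe' ],
    456 => [ 'Jane', 'Roe' ],
    789 => [ 'Ann',  'Lee' ],
);

my $cache = RecentCache->new(max => 2, loader => sub { $source{ $_[0] } });
$cache->get($_) for 123, 456, 123, 789;
print "loader calls: $cache->{loads}\n";   # loader calls: 3
```

The `grep` used to reorder ids is linear in the cache size, which is fine for a small cache; a larger one would want a cheaper ordering structure.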
Re^5: Moving from hashing to tie-ing. by eff_i_g (Curate) on Jul 31, 2006 at 16:44 UTC
Limbic, I apologize; I'm trying :) This is a little challenging since I am also learning. The basic programming process is explained in my reply to BrowserUk. It's that simple, but it deals with a lot of information. The problem is with step 2 because it hashes all of the data provided, when the script may only need a fraction of it. To reiterate: Correct. The whole lookup is not needed for processing. The pins that are needed could be determined by reading all of the pins in the source file; the largest one is around 25MB, 40,000 lines. If the file were only using the pins 123, 456, and 789, I could look for only these in the other file and hash just those records.
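The two-pass idea in the post above might look something like this. It is a sketch under assumptions: the pin's position in the source records (offset 5, length 3), the lookup layout, and the names `collect_pins` and `build_lookup` are invented for illustration, and in-memory strings stand in for the real files.

```perl
use strict;
use warnings;

# Pass 1: collect the set of pins the source file actually uses.
sub collect_pins {
    my ($fh, $offset, $length) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        chomp $line;
        next if length($line) < $offset + $length;
        my $pin = substr($line, $offset, $length);
        $wanted{$pin} = 1;
    }
    return \%wanted;
}

# Pass 2: hash only the lookup records whose pin was seen in pass 1.
sub build_lookup {
    my ($fh, $template, $wanted) = @_;
    my %lookup;
    while (my $line = <$fh>) {
        chomp $line;
        my ($pin, $first, $last) = unpack $template, $line;
        next unless defined $pin && $wanted->{$pin};
        $lookup{$pin} = { first => $first, last => $last };
    }
    return \%lookup;
}

# Hypothetical sample data: the pin sits at offset 5 in each source record.
my $source_data = "REC01123 some data\nREC02789 other data\nREC03123 again\n";
my $lookup_data = join '',
    sprintf("%-3s%-10s%-10s\n", '123', 'John', 'Doe'),
    sprintf("%-3s%-10s%-10s\n", '456', 'Jane', 'Roe'),
    sprintf("%-3s%-10s%-10s\n", '789', 'Ann',  'Lee');

open my $src, '<', \$source_data or die "Cannot open source data: $!";
my $wanted = collect_pins($src, 5, 3);
close $src;

open my $lkp, '<', \$lookup_data or die "Cannot open lookup data: $!";
my $lookup = build_lookup($lkp, 'A3 A10 A10', $wanted);
close $lkp;

print join(',', sort keys %$lookup), "\n";   # 123,789
```

Pin 456 never appears in the source records, so its lookup record is read but never stored.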
Re^6: Moving from hashing to tie-ing. by Limbic~Region (Chancellor) on Aug 01, 2006 at 12:26 UTC
eff_i_g, I am afraid after reading your reply to BrowserUk, I am still left wondering about how the program works. You speak in terms as though we understand what you are talking about. What do you mean by section and how is it determined? This isn't really a question I want you to answer because I am sure it will just lead to more questions. I am afraid you just aren't providing the technical details necessary to help. I believe the only way that I personally am going to be able to help is if you were to provide a sample of the data (masking sensitive info is fine but it must be representative of the real data), the code that is processing it, and an example of how it is invoked. Cheers - L~R