in reply to Perl Script performance issue

If the content of the $data_location file(s) is not too large for the available memory, then the obvious cure would be to read its content only once, load it into an appropriate in-memory data structure (probably a hash of some kind), and then read the source file line by line, looking up the additional data pieces in memory.
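A minimal sketch of that pattern, using made-up pipe-delimited sample data in place of the real files (field positions and values here are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for $data_location (pipe-delimited,
# column 1 = account number used as the lookup key).
my $lookup_data = <<'END';
1234|SAV|NYC
5678|CHK|BOS
END

# 1. Read the lookup file ONCE and index it by key in a hash.
my %by_acct;
open my $lfh, '<', \$lookup_data or die "open: $!";
while (my $line = <$lfh>) {
    chomp $line;
    my @f = split /\|/, $line;
    $by_acct{ $f[0] } = \@f;    # key => whole record
}
close $lfh;

# 2. Process the main file line by line; each lookup is now O(1)
#    instead of a full scan of the data file.
my $main_data = "A|1234|x\nB|5678|y\n";
open my $mfh, '<', \$main_data or die "open: $!";
while (my $line = <$mfh>) {
    chomp $line;
    my @f   = split /\|/, $line;
    my $rec = $by_acct{ $f[1] };           # second column is the key
    print "$f[0] -> $rec->[1]\n" if $rec;  # fetch whichever field you need
}
close $mfh;
```

The one-time cost of building the hash is paid back on every subsequent line of the main file.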

Such a process can often be several orders of magnitude faster (i.e. hundreds or even thousands of times faster, perhaps even more). But this is feasible only if the data in $data_location (or the part of it that you actually use) is not too large to fit in memory.

Therefore the question asked by poj about the size of your data and about the current timings is really crucial.

If the data is too large to fit into memory, there is still the possibility of loading the data from $data_location into a database to enable indexed access to the pieces of data that you need. The performance gain would be much smaller, but it can still be very significant and might be sufficient for your purpose.
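One way to do that from Perl is DBI with an SQLite database; the sketch below uses an in-memory database and made-up table and column names, but in real use you would load the data file into an on-disk database once and then query it per line:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical sketch: load the $data_location records into SQLite,
# index the key column, then look records up on demand instead of
# re-scanning the file. Table and column names are made up here.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE positions (acct TEXT, po_type TEXT, loc_code TEXT)');
$dbh->do('CREATE INDEX idx_acct ON positions (acct)');

# One-time load (in real use, loop over the data file here).
my $ins = $dbh->prepare('INSERT INTO positions VALUES (?, ?, ?)');
$ins->execute('1234', 'SAV', 'NYC');
$ins->execute('5678', 'CHK', 'BOS');

# Indexed lookup per main-file line: no full scan of the data.
my $sel = $dbh->prepare('SELECT loc_code FROM positions WHERE acct = ?');
$sel->execute('1234');
my ($loc) = $sel->fetchrow_array;
print "loc_code for 1234: $loc\n";
```

This requires the DBD::SQLite driver to be installed; the index on the key column is what makes each lookup cheap.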

Replies are listed 'Best First'.
Re^2: Perl Script performance issue
by Tara (Initiate) on Dec 16, 2015 at 07:39 UTC

    The data files are large. Initially I considered storing them in memory, but discarded the option due to the large size of the files. The account file is 182777579 bytes; it varies daily but remains more or less the same size. It currently holds 62394 records.

      For ACCT NUMBER: read the main file; the second column in the main file is the primary key for looking up positionfile.delim.

      If the value is held in the 6th column, which column in positionfile.delim is the primary key? Is it column 1?

      Which columns in the main file hold these keys? They can't all be the second column, or am I missing something?

      ACCT NUMBER|positionfile.delim|2|6
      PO TYPE|positionfile.delim|2|3
      LOC CODE|positionfile.delim|2|47
      poj

        The primary key can be the same for many fields. For example: here the second column would be acct number 1234, so in order to fetch details for that account only, I am grepping for that acct number (which returns a single record) and then selecting the columns as specified. Here:
        LOC CODE|positionfile.delim|2|47
        LOC NAME|locationfile.delim|47|4
        For LOC CODE, use the second column from the main file as the primary key for looking up positionfile.delim, and get the 47th field.
        For LOC NAME, the 47th column from the main file is the primary key for looking up locationfile.delim; get the 4th field.
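With both files indexed in hashes, that two-stage lookup becomes two hash accesses per line instead of two greps. A scaled-down sketch with made-up data (the real files use columns 47 and 4; here the indices are held in variables so the shape of the logic is the same):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical scaled-down records, keyed on their first column.
my %position = ( '1234' => [ '1234', 'SAV', 'L07' ] );               # positionfile.delim
my %location = ( 'L07'  => [ 'L07', 'US', 'East', 'New York' ] );    # locationfile.delim

# Stand-ins for the real column indices (47th and 4th fields).
my ($LOC_CODE_COL, $LOC_NAME_COL) = (2, 3);

my @main_row = ( 'A', '1234', 'x' );   # second column is the acct key

# LOC CODE: acct number -> positionfile record -> loc-code column.
my $loc_code = $position{ $main_row[1] }[$LOC_CODE_COL];

# LOC NAME: loc code -> locationfile record -> name column.
my $loc_name = $location{ $loc_code }[$LOC_NAME_COL];

print "loc_code=$loc_code loc_name=$loc_name\n";  # loc_code=L07 loc_name=New York
```

Each chained lookup is constant time, whereas grepping re-reads a whole file for every field of every record.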

      The data files are large. Initially I considered storing them in memory, but discarded the option due to the large size of the files. The account file is 182777579 bytes; it varies daily but remains more or less the same size. It currently holds 62394 records.
      This is indeed relatively large, but most probably small enough to fit into memory on a decently modern computer. That's what I would try anyway, especially if you can store in memory only the part of these files that is useful for your process.
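For example, instead of keeping every ~3 KB record whole, you can store only the key and the one column you actually need, which shrinks the in-memory footprint considerably. A sketch with made-up sample data (column positions are illustrative; the real file would use field 47):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pipe-delimited sample standing in for positionfile.delim.
my $position_data = "1234|SAV|L07\n5678|CHK|L09\n";

my %loc_code_of;   # acct number => loc code ONLY, not the whole record
open my $fh, '<', \$position_data or die "open: $!";
while (my $line = <$fh>) {
    chomp $line;
    my @f = split /\|/, $line;
    $loc_code_of{ $f[0] } = $f[2];   # in the real file this would be $f[46]
}
close $fh;

print "1234 => $loc_code_of{1234}\n";  # 1234 => L07
```

Storing one short string per record instead of a full 47-field record keeps the memory cost close to the size of the keys and values you actually use.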