I want to speed up the process of retrieving information from files. In my code, for each query I loop over the files. For each file I read the data into an array and process it.

That's a bit vague. How badly is optimization needed, really? (Don't bother optimizing if you don't have to.) If you really do need to optimize, how much of the problem is I/O-bound, as opposed to CPU-bound? (You should profile the code to see where most of the run time is spent: disk reads or processing loops?)
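As a rough first check before reaching for a real profiler, you could time the read step and the processing step separately. This is only a sketch: time_io_vs_cpu is a made-up name, and the $process code ref stands in for whatever your script actually does with each file's data.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: accumulates wall-clock time spent reading files
# versus time spent in the per-file processing callback.
sub time_io_vs_cpu {
    my ($files, $process) = @_;
    my ($io, $cpu) = (0, 0);
    for my $file (@$files) {
        my $t0 = time;
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my @lines = <$fh>;          # the disk-read step
        close $fh;
        my $t1 = time;
        $process->(\@lines);        # the processing step (assumed callback)
        my $t2 = time;
        $io  += $t1 - $t0;
        $cpu += $t2 - $t1;
    }
    return ($io, $cpu);
}
```

Call it as my ($io, $cpu) = time_io_vs_cpu(\@files, \&my_processing); and compare the two numbers. Keep in mind that the operating system usually caches recently read files, so a second run over the same files may show much less read time than the first. For a line-by-line breakdown, a profiler such as Devel::NYTProf from CPAN is the better tool.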

One way is to store all these files in an array of arrays; I guess that would be much faster.

My question is: is there an easy way to avoid reprogramming and use a cache, so that files retrieved once from the HD are reused in subsequent queries instead of being read from the HD over and over?

If the amount of data in question fits easily into available RAM, and if your code handles lots of queries on the same data in a single run, then the easy way is to load all the data into memory at start-up and process all the queries using the in-memory arrays (or hashes, or whatever).
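A minimal sketch of the in-memory approach, assuming each file is plain text and you want its lines as an array. The names %cache and get_lines are made up for illustration. Instead of loading everything at start-up it reads each file the first time it is asked for, which needs the least change to an existing query loop.

```perl
use strict;
use warnings;

my %cache;   # file name => array ref holding that file's lines

# Hypothetical helper: returns the lines of $file, touching the disk
# only on the first request for that file.
sub get_lines {
    my ($file) = @_;
    unless (exists $cache{$file}) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        chomp(my @lines = <$fh>);
        close $fh;
        $cache{$file} = \@lines;
    }
    return $cache{$file};
}
```

Inside the existing loop you would replace the open-and-read code with my $lines = get_lines($file); and leave the processing as it is. Note that the cache never notices when a file changes on disk during the run, and it grows until every file touched is in memory.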

If the amount of data is uncomfortably large (e.g. it won't fit in RAM, or it fits but takes too long to load at start-up), you should consider using some sort of relational database, or DBM (hash) files, for disk storage. Since you are talking about processing "queries", the most effective solution for a large and/or complicated data set is to do a suitable amount of indexing up front, and this is generally a simple matter of storing the data in indexed file structures (relational database tables or DBM hash files).

By using an existing RDB query engine or the appropriate flavor of tie %my_hash, ..., "my_hash_file", ... you get quite a lot of optimization for free, both in terms of improving access speed when reading data from disk and in terms of reducing the amount of processing that needs to be coded and executed in your script. (Look at AnyDBM_File for more info about hash files.)
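To make the tie idea concrete, here is a sketch using SDBM_File, chosen only because it ships with Perl; other DBM flavors are tied in much the same way but may take different flags. The helper name open_index is hypothetical, and the keys and values you store are up to you.

```perl
use strict;
use warnings;
use Fcntl;        # provides O_RDWR and O_CREAT
use SDBM_File;

# Hypothetical helper: ties a hash to an on-disk SDBM file and returns
# a reference to it. The file is created if it does not exist yet.
sub open_index {
    my ($path) = @_;
    tie(my %index, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $path: $!";
    return \%index;
}
```

After my $index = open_index("my_hash_file"); an assignment such as $index->{$key} = $value; is written to disk, and a later run can look up $index->{$key} without reading the original data files again. SDBM limits the size of each key/value pair to roughly 1 KB, so for larger records a different DBM module or a relational database is the safer choice.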


In reply to Re: Caching files question by graff
in thread Caching files question by vit
