I want to speed up the process of retrieving information from files. In my code for each query I loop over files. For each file I retrieve data into array and process.
That's a bit vague. How badly is optimization needed, really? (Don't bother optimizing if you don't have to.) If you really need to optimize, how much of the problem is really I/O-bound, as opposed to CPU-bound? (You should try profiling the code to see where most of the run time is spent: disk reads or processing loops?)
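As a rough first check before reaching for a real profiler, you could time the read phase and the processing phase separately. This is only a sketch: read_file(), process() and time_phases() are hypothetical stand-ins for whatever your script does, and the "count matching lines" query is an assumption made up for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the disk read: slurp a file into an array of lines.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Hypothetical stand-in for the processing loop: count lines containing $query.
sub process {
    my ($lines, $query) = @_;
    return scalar grep { index($_, $query) >= 0 } @$lines;
}

# Accumulate the wall-clock time spent in each phase over all files.
sub time_phases {
    my ($query, @files) = @_;
    my %spent = (read => 0, process => 0);
    for my $file (@files) {
        my $t0    = time;
        my $lines = read_file($file);
        my $t1    = time;
        process($lines, $query);
        my $t2    = time;
        $spent{read}    += $t1 - $t0;
        $spent{process} += $t2 - $t1;
    }
    return \%spent;
}
```

Call it as `my $spent = time_phases($query, @files)` and compare `$spent->{read}` with `$spent->{process}`. Keep in mind that the operating system caches recently read files, so the second run over the same files may show much smaller read times than the first.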
One way is to store all these files in array of arrays and then I guess it will do it much faster.
If the amount of data in question fits easily into available RAM, and if your code involves handling lots of queries on the same data in a single run, then obviously you will want to load all the data into memory at start-up, then process all the queries using the in-memory arrays (or hashes, or whatever), and that will be the easy way.
My question is: is there an easy way to avoid reprogramming and use cache so that files retrieved one time from HD will be used in subsequent queries without retrieving them over and over from HD.
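If the data does fit in memory, one low-effort way to get that kind of cache is to put a hash in front of the file-reading step, so each file is read from disk only on first use. A minimal sketch, assuming the per-file data is an array of lines; cached_lines(), run_query() and the "count matching lines" query are hypothetical names invented for the example, not anything from your code.

```perl
use strict;
use warnings;

my %file_cache;    # path => array ref of lines, filled on first use

# Return the lines of $path, reading from disk only the first time it is requested.
sub cached_lines {
    my ($path) = @_;
    unless (exists $file_cache{$path}) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        chomp(my @lines = <$fh>);
        close $fh;
        $file_cache{$path} = \@lines;
    }
    return $file_cache{$path};
}

# Hypothetical query: count the lines containing $query across all files.
sub run_query {
    my ($query, @files) = @_;
    my $hits = 0;
    for my $file (@files) {
        $hits += grep { index($_, $query) >= 0 } @{ cached_lines($file) };
    }
    return $hits;
}
```

The existing loop over files stays as it is; only the call that reads a file is swapped for cached_lines(). The cache lives only for the duration of one run, and it never notices if a file changes on disk afterwards.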
If the amount of data is uncomfortably large (e.g. it won't fit in RAM, or even if it does, it takes too long to load it all at start-up), you should consider using some sort of relational database, or DBM (hash) files for disk storage. Since you are talking about processing "queries", the most effective solution for a large and/or complicated data set is to do a suitable amount of indexing up front, and this is generally a simple matter of storing the data into indexed file structures (relational database tables or DBM hash files).
By using an existing RDB query engine or the appropriate flavor of tie %my_hash, ..., "my_hash_file", ... you get quite a lot of optimization for free -- both in terms of improving access speed when reading data from disk, and in terms of reducing the amount of processing that needs to be coded and executed in your script. (Look at AnyDBM_File for more info about hash files.)
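To make the hash-file idea concrete, here is a minimal sketch of a one-time indexing pass followed by lookups through a tied hash. Several things are assumptions for the sake of the example: the source data is taken to be "key<TAB>value" lines, build_index() and lookup() are hypothetical names, and AnyDBM_File is pinned to SDBM_File because that flavor ships with Perl.

```perl
use strict;
use warnings;
use Fcntl;    # O_RDWR, O_CREAT, O_RDONLY
BEGIN { @AnyDBM_File::ISA = qw(SDBM_File) }    # assumption: force the SDBM flavor
use AnyDBM_File;

# One-time indexing pass: store each "key<TAB>value" line of $source in the DBM file.
sub build_index {
    my ($source, $dbm_path) = @_;
    tie my %index, 'AnyDBM_File', $dbm_path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $dbm_path: $!";
    open my $fh, '<', $source or die "Cannot open $source: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $index{$key} = $value if defined $value;    # skip lines without a tab
    }
    close $fh;
    untie %index;
}

# Per-query lookup: fetches one record from disk without loading the whole data set.
sub lookup {
    my ($dbm_path, $key) = @_;
    tie my %index, 'AnyDBM_File', $dbm_path, O_RDONLY, 0666
        or die "Cannot tie $dbm_path: $!";
    my $value = $index{$key};
    untie %index;
    return $value;
}
```

Run build_index() once, then call lookup() from the query code. In a long-running script you would probably tie the hash once and keep it open rather than re-tying per lookup. SDBM_File limits the size of each key/value pair to roughly 1 KB, so larger records would call for another DBM flavor such as DB_File or GDBM_File, if one is installed.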
In reply to Re: Caching files question
by graff
in thread Caching files question
by vit