in reply to Re^2: Efficient way to handle huge number of records?
in thread Efficient way to handle huge number of records?
Wait, is the program called BLAST something you already have or is that the script you want to create? I.e. why would you use BLAST to search and then use a different script for retrieval? Obviously when the retrieval script creates that output file the number of hits is available with zero effort
If you want to use your script with a webserver you probably will observe the following with the different solutions:
A) perl CGI script, loading hash in memory: Really bad, every search has to load the database into memory. Even if that takes only 4 seconds it is mayor disk wear and slow. Same with Storable
B) Using mod_perl, loading hash into memory: With a trick (using a BEGIN block to load the hash) you could read the data only once into memory. Big improvement on execution speed, but memory could be a problem as apache usually creates multiple instances of the web server each running the script and data. Depends on the server engine you use, a threaded engine probably could share the data but not a forking engine (Me no expert here, someone hopefully will correct me if I'm wrong). Same with Storable
C) Using SQLite, DBM::Deep, ...: Disk based methods. Data is not loaded into memory, an index keeps retrieval reasonably fast. Obviously somewhat slower than a simple hash in memory, but still simple to program. No problem if your database grows in size (within limits), no noticable startup time. If the database does mostly reads and seldom writes arguably the best solution in your case IMHO
D) Using mysql or postgresql: Apart from the web server you have to install and manage another server (easy on linux (easy for me, not at all easy for my mother ;-) ), not so easy on windows(?)). Reading and changing data in the database never is a performance problem irrespective of size. Complex databases possible (which you don't seem to need), the database server will get bored to death in your case ;-). Probably a notch slower than solution C, but the only solution if you want 100 simultaneous apache threads reading and writing to that database.
|
|---|