in reply to Benchmarking Strategy?
It seems that all of these miss the point that your main concern is finding a suitable storage mechanism. A full-blown database accessed through DBI and the appropriate DBD driver might be the best way to go if you're going to need complex queries and updates. If you're not familiar with SQL, you might want to try something like DBM::Deep. It's surprisingly easy to use if you're comfortable with Perl's hashes. The only two things to watch out for are locking the file during startup, and writing more complex records/structures into and out of a table, which is handled fairly easily with a serializer such as YAML.
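For comparison, here is a minimal sketch of those two issues using only modules that ship with Perl: a plain SDBM_File hash, an exclusive `flock` on a separate lock file taken before the DBM is opened, and nested records serialized to strings. JSON::PP is swapped in for YAML only because it is a core module, and the file names are hypothetical. DBM::Deep can store nested hashes and arrays directly; with a plain DBM file the serialization step is unavoidable, and SDBM_File also caps each key/value pair at roughly 1 KB, so treat this as an illustration rather than a storage recommendation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use JSON::PP;
use File::Temp qw(tempdir);

# Hypothetical location for this sketch: a throwaway temp directory.
my $dir  = tempdir( CLEANUP => 1 );
my $base = "$dir/records";

# Lock a separate file before opening the DBM, so two processes
# starting up at the same time cannot both write to it.
open my $lock_fh, '>', "$base.lock" or die "Cannot open lock file: $!";
flock $lock_fh, LOCK_EX or die "Cannot lock: $!";

tie my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $base: $!";

# JSON::PP used here in place of YAML; canonical gives stable key order.
my $json = JSON::PP->new->canonical;

# Plain DBM values must be strings, so the nested record is serialized.
my %record = ( name => 'widget', tags => [ 'red', 'small' ], qty => 3 );
$db{widget} = $json->encode( \%record );

# Reading it back means deserializing to recover the structure.
my $copy = $json->decode( $db{widget} );
print "$copy->{name}: @{ $copy->{tags} } (qty $copy->{qty})\n";

untie %db;
close $lock_fh;    # closing the handle releases the lock
```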
I think you'll find that the most efficient solution here won't necessarily be the fastest in terms of records processed per second. Rather, it will probably come down to how efficiently you can code the program from the requirements and how much time you'll spend debugging and maintaining it. That's one great advantage of DBI and SQL. You may have to put in a fair bit of up-front effort learning them, but once you do, you can become very efficient at all sorts of data-intensive tasks like the one you've described. In fact, once you've written a few programs that use DBI, you'll probably never want to go back to flat files except to populate the database and generate reports.
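To give a feel for what that looks like, here is a small sketch assuming DBI and DBD::SQLite are installed (neither is core); the table and column names are hypothetical. An in-memory SQLite database keeps it self-contained. The point is that an update and a filtered, sorted query are one statement each, where a flat file would need hand-written read/modify/write loops.

```perl
use strict;
use warnings;
use DBI;

# Assumes DBD::SQLite is available; ":memory:" keeps the database in RAM.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# Hypothetical schema for this sketch.
$dbh->do('CREATE TABLE items (name TEXT PRIMARY KEY, qty INTEGER)');

# Prepare once, then execute repeatedly with placeholders.
my $ins = $dbh->prepare('INSERT INTO items (name, qty) VALUES (?, ?)');
$ins->execute(@$_) for [ widget => 3 ], [ gadget => 7 ], [ doohickey => 1 ];

# One-line update and query instead of looping over records by hand.
$dbh->do( 'UPDATE items SET qty = qty + ? WHERE name = ?', undef, 2, 'widget' );
my $rows = $dbh->selectall_arrayref(
    'SELECT name, qty FROM items WHERE qty > ? ORDER BY qty DESC', undef, 2 );

printf "%-10s %d\n", @$_ for @$rows;
$dbh->disconnect;
```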
Also, while we're on the subject of profiling/benchmarks, remember the old maxim: "Premature optimisation is the root of all evil."