in reply to Benchmarking Strategy?

I agree with some of the posters above. In short form:
  1. The best indication of how well the program will perform will be how well it's designed (optimise the design and algorithms before the code)
  2. Profiling is very useful once the program is written. There are plenty of profilers for Perl. I quite like Devel::SmallProf.
  3. There's the Benchmark module, and another way is to use Time::HiRes to roll your own. Yet another way is to install a $SIG{ALRM} handler that sets a flag called $finish_loop or whatever. Call alarm to have the signal delivered some number of seconds in the future, then run your tests in an until ($finish_loop) { # process some number of records, counting how many } loop. This gives you a measure of the throughput of the code (i.e., how many records it can handle per second, per 10 seconds, or whatever).
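
As a rough sketch of the alarm-based approach in point 3 (core Perl only): process_record here is a made-up stand-in for whatever per-record work you actually want to measure, and measure_throughput is just a name I picked for the wrapper. It assumes a platform where alarm and SIGALRM behave as they do on Unix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-record work being measured.
sub process_record {
    my ($record) = @_;
    return length join ',', reverse split //, $record;
}

# Call $work repeatedly until the alarm fires, then return how many
# records were processed in that time.
sub measure_throughput {
    my ($seconds, $work) = @_;
    my $finish_loop = 0;
    my $count       = 0;

    # The handler only sets a flag; the loop below notices it.
    local $SIG{ALRM} = sub { $finish_loop = 1 };
    alarm $seconds;

    until ($finish_loop) {
        $work->("record $count");
        $count++;
    }

    alarm 0;    # cancel any pending alarm, just in case
    return $count;
}

my $seconds = 1;
my $count   = measure_throughput($seconds, \&process_record);
printf "%d records in %d s (%.0f records/s)\n",
    $count, $seconds, $count / $seconds;
```

Run it a few times and with a longer interval before trusting the numbers, since a one-second window is easily skewed by whatever else the machine is doing.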

It seems that all of these really miss the point that your main concern is with finding a suitable storage mechanism. Using a full-blown database through the DBI module (with the appropriate DBD driver) might be the best way to go if you're going to need complex queries and updates. If you're not familiar with SQL, you might want to try something like DBM::Deep. It's surprisingly easy to use if you're familiar with Perl's hashes. The only two things to watch out for are locking files during startup, and writing more complex records/structures into and out of a table, which is handled fairly easily with YAML.
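
To give a feel for the hash-like interface, here is a minimal DBM::Deep sketch. It assumes the module is installed from CPAN; the file name and the record layout (name, visits, tags) are invented for illustration. As far as I know, current versions of DBM::Deep will store nested hashes and arrays directly, so this example doesn't go through YAML, and it turns on the module's own locking option.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBM::Deep;
use File::Temp qw(tempdir);

# Scratch directory so the example cleans up after itself.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records.db";    # hypothetical file name

# locking => 1 asks DBM::Deep to lock the file around each operation.
my $db = DBM::Deep->new(file => $file, locking => 1, autoflush => 1);

# Store a nested record much as you would in an ordinary hash.
$db->{alice} = { name => 'Alice', visits => 3, tags => ['admin', 'staff'] };

# Update it in place.
$db->{alice}{visits}++;
$db->{alice}{tags}->push('beta');

# Close and reopen to show that the data really lives on disk.
undef $db;
my $again = DBM::Deep->new(file => $file, locking => 1);

printf "%s: %d visits, tags: %s\n",
    $again->{alice}{name},
    $again->{alice}{visits},
    join(', ', @{ $again->{alice}{tags} });
```

For anything that needs joins or ad hoc queries this gets awkward quickly, which is where a real database starts to pay off.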

I think you'll find that the most efficient solution here won't necessarily be the fastest in terms of how many records can be processed per second. Rather, it'll probably come down to how efficiently you can code up the program from the requirements and how much time you have to spend debugging and maintaining it. That's really one great advantage of the SQL modules. You might have to put in a fair bit of up-front effort learning to use them, but once you do, you can become very efficient at dealing with all sorts of data-intensive tasks like the one you've described. In fact, once you've written a few programs that use DBI, you'll probably never want to go back to working with flat files again except to populate the database and generate reports.

Also, while we're on the subject of profiling/benchmarks, remember the old maxim: "Premature optimisation is the root of all evil."