
I agree with some of the posters above. In short form:

The best indication of how well the program will perform is how well it's designed (optimise the design and algorithms before the code).
Profiling is very useful once the program is written. There are plenty of profilers for Perl. I quite like Devel::SmallProf.
There's the Benchmark module, and another way is to use Time::HiRes to roll your own. Yet another way would be to set up a $SIG{ALRM} handler that sets a variable called $finish_loop or whatever. Then call alarm to send the signal some number of seconds in the future, and run your tests in an until ($finish_loop) { # process some number of records, counting how many } loop. This gives you a measure of the code's throughput (i.e., how many records it can handle per second, per 10 seconds, or whatever).
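A minimal sketch of that alarm-based loop, in case it helps. The process_record sub is a hypothetical stand-in for whatever your real per-record work is, and the batch size of 100 is an arbitrary choice of mine. It uses Time::HiRes only to report the actual elapsed time, since the loop finishes its current batch after the signal arrives.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second resolution time()

# Hypothetical stand-in for the real per-record work.
sub process_record {
    my ($record) = @_;
    return length $record;
}

# Process records until the alarm fires. Returns the number of
# records handled and the wall-clock time actually spent.
sub measure_throughput {
    my ($seconds) = @_;
    my $finish_loop = 0;
    my $count       = 0;

    # The handler only sets a flag; the loop checks it between batches.
    local $SIG{ALRM} = sub { $finish_loop = 1 };

    my $start = time;
    alarm $seconds;    # deliver SIGALRM after $seconds whole seconds

    until ($finish_loop) {
        # Process a batch of records, counting how many.
        process_record("record $_") for 1 .. 100;
        $count += 100;
    }

    return ($count, time - $start);
}

my ($count, $elapsed) = measure_throughput(1);
printf "%d records in %.2f s (%.0f records/s)\n",
    $count, $elapsed, $count / $elapsed;
```

Run it a few times and with a longer window before trusting the number, and keep the batch small enough that the flag gets checked often.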

It seems that all of these really miss the point that your main concern is finding a suitable storage mechanism. A full-blown database accessed through the DBI module (with the appropriate DBD driver) might be the best way to go if you're going to need complex queries and updates. If you're not familiar with SQL, you might want to try something like DBM::Deep. It's surprisingly easy to use if you're familiar with Perl's hashes. The only two things to watch out for are locking files during startup, and writing more complex records/structures into and out of a table, which is handled fairly easily with YAML.
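To give a feel for the hash-like interface, here is a rough sketch. It assumes DBM::Deep is installed from CPAN (it is not a core module); the file location and the rec42 record are made up for illustration only.

```perl
use strict;
use warnings;
use DBM::Deep;                  # CPAN module, not part of core Perl
use File::Temp qw(tempdir);

# Hypothetical file location, used only for this illustration.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records.db";

# locking => 1 asks DBM::Deep to lock the file around each access.
my $db = DBM::Deep->new(file => $file, locking => 1);

# Store a record much as you would in an ordinary in-memory hash.
$db->{rec42} = { name => 'example', hits => 0, tags => ['new', 'unverified'] };

# Update a field in place; the change goes to the file.
$db->{rec42}{hits}++;

# Reopen the file to check that the data persisted.
undef $db;
my $again = DBM::Deep->new(file => $file, locking => 1);
printf "%s has %d hit(s) and %d tag(s)\n",
    $again->{rec42}{name},
    $again->{rec42}{hits},
    scalar @{ $again->{rec42}{tags} };
```

In a real program you would point it at a permanent file rather than a temporary directory.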

I think that you'll find that the most efficient solution here won't necessarily be the fastest in terms of how many records can be processed per second. Rather, it'll probably come down to how efficiently you can code up the program from requirements and how much time you have to spend debugging and maintaining it. That's really one great advantage of the SQL modules. You might have to put in a fair bit of up-front effort learning to use them, but once you do, you can become very efficient at dealing with all sorts of data-intensive tasks like the one you've described. In fact, once you've written a few programs that use DBI, you'll probably never want to go back to working with flat files again except to populate the database and generate reports.

Also, while we're on the subject of profiling/benchmarks, remember the old maxim: "Premature optimisation is the root of all evil."

In reply to Re: Benchmarking Strategy? by dec
in thread Benchmarking Strategy? by pileofrogs
