in reply to Re^4: How to improve speed of reading big files
in thread How to improve speed of reading big files
The point was that he was already implicitly using $_ in a couple of places, so why not go the whole hog and avoid making copies of every line.
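For illustration, a minimal sketch of the difference (the filehandle $fh and the /ERROR/ pattern are hypothetical):

    # copying every line into a lexical:
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /ERROR/;
        # process $line
    }

    # letting the implicit $_ do the work avoids one copy per line:
    while ( <$fh> ) {
        chomp;                  # chomps $_
        next unless /ERROR/;    # matches against $_
        # process $_
    }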
The foreach-over-a-scalar is a neat trick, but ultimately a do-once loop is more confusing to beginners than manipulating a global variable, which they tend to do naturally anyway before they learn better. There are many ways of improving the code in the CS sense, but the OP asked how to speed things up.
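For reference, the trick in question looks something like this; the variable and contents are illustrative only:

    my $line = "foo baz";
    for ($line) {           # do-once loop: $_ is an alias for $line
        s/foo/bar/;         # edits $line through the alias
        tr/a-z/A-Z/;
    }
    print "$line\n";        # prints "BAR BAZ"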
Avoiding unnecessary copying is one way. Unwrapping the sub back into the main loop is another. Avoiding the allocations involved in generating multiple long lists, and lists of anonymous arrays, is probably the most effective first step.
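As a sketch of that last point, assuming comma-separated log lines:

    # builds one huge list, plus an anonymous array per line:
    # my @records = map { [ split /,/ ] } <$fh>;

    # streaming a line at a time avoids both allocations:
    while ( <$fh> ) {
        chomp;
        my @fields = split /,/;     # splits $_ by default
        # work with @fields here; keep only what you need
    }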
If you need to optimise, then the only way is to profile properly and re-run the test after each change. Since we're only party to two subs, there's no way to know whether this might be better done using a simple shell pipe chain. We also don't know whether each set of files is processed once by a single set of query parameters; once each by several sets; or frequently by many sets, etc.
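For the profiling, Devel::NYTProf is the usual choice; something along these lines (the script and log names are assumed):

    $ perl -d:NYTProf parse_logs.pl big.log    # writes nytprof.out
    $ nytprofhtml                              # turns it into an HTML report

And if a one-off query is all that's needed, a pipe chain of this shape (pattern and fields purely hypothetical) may beat any Perl rewrite:

    $ grep 'GET /foo' big.log | awk '{ print $1 }' | sort | uniq -c | sort -rn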
Under some scenarios, loading the logs directly into a DB and querying that makes the most sense. Under others, the overhead of loading a DB would outweigh the gains. Without the full picture you can only attempt to answer the question as asked.
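If the load-into-a-DB route were taken, a minimal sketch using DBI with SQLite might look like this; the table layout and the tab-separated field split are pure assumptions, and $fh is again an already-open log filehandle:

    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=logs.db', '', '',
                            { RaiseError => 1, AutoCommit => 0 } );
    $dbh->do( 'CREATE TABLE IF NOT EXISTS log ( ts TEXT, host TEXT, msg TEXT )' );

    my $ins = $dbh->prepare( 'INSERT INTO log VALUES ( ?, ?, ? )' );
    while ( <$fh> ) {
        chomp;
        $ins->execute( split /\t/, $_, 3 );   # assumes tab-separated fields
    }
    $dbh->commit;

    # once loaded, varying the query is cheap:
    my $rows = $dbh->selectall_arrayref(
        'SELECT host, COUNT(*) FROM log GROUP BY host'
    );

Whether the load cost amortises depends, as above, on how many sets of query parameters hit the same files.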