in reply to is this the most efficient way to double-parse a large file?

How big is your "large logfile"? If it's less than 1/2 the memory in your computer, then simply keep all the client data in a hash, parse the file only once, and print the report from the triggered client's data at the end.
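For what it's worth, here's a minimal sketch of that in-memory approach. It assumes each log line starts with a client id followed by the rest of the entry, and that matching /ERROR/ is what makes a client "triggered" -- both are placeholders, so adjust them to your real format:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %entries_for;    # client id => array ref of that client's entries
    my %triggered;      # client ids that matched the trigger condition

    while ( my $line = <> ) {
        chomp $line;
        my ( $client, $rest ) = split ' ', $line, 2;
        next unless defined $client;
        $rest = '' unless defined $rest;

        push @{ $entries_for{$client} }, $rest;

        # placeholder trigger test -- replace with your real condition
        $triggered{$client} = 1 if $rest =~ /ERROR/;
    }

    # report only on the clients that triggered
    for my $client ( sort keys %triggered ) {
        print "== $client ==\n";
        print "$_\n" for @{ $entries_for{$client} };
    }

Run it as perl report.pl logfile (or several logfiles); the <> operator reads them one after another.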

If the file is too big for the memory-based approach and memory size > (20 * clients * client entries), then hdb's solution should be fine. Otherwise use either a database, as suggested by hdb, or parse the logfile once but write each client's data out to its own file in the report format you need, then process the triggered client's data after parsing your logfile. Note that you need to take care not to open too many file handles with this last approach!
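Likewise, a rough sketch of that last, spill-to-disk variant, keeping the open handle count bounded by closing everything once a cap is reached. The directory name, the cap and the trigger regex below are placeholders, and the client id is used directly as a file name, so sanitise it if it can contain anything odd:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Spec;

    my $out_dir     = 'client_data';
    my $max_handles = 100;
    mkdir $out_dir unless -d $out_dir;

    my %fh_for;        # currently open append handles, keyed by client id
    my %triggered;     # client ids that matched the trigger condition

    sub handle_for {
        my ($client) = @_;
        return $fh_for{$client} if $fh_for{$client};

        # too many handles open? close the lot and start again
        if ( keys %fh_for >= $max_handles ) {
            close $_ for values %fh_for;
            %fh_for = ();
        }
        my $path = File::Spec->catfile( $out_dir, "$client.txt" );
        open my $fh, '>>', $path or die "Cannot open $path: $!";
        return $fh_for{$client} = $fh;
    }

    while ( my $line = <> ) {
        chomp $line;
        my ( $client, $rest ) = split ' ', $line, 2;
        next unless defined $client;
        $rest = '' unless defined $rest;

        print { handle_for($client) } "$rest\n";
        $triggered{$client} = 1 if $rest =~ /ERROR/;    # placeholder trigger
    }
    close $_ for values %fh_for;

    # post-process only the files belonging to triggered clients
    print "Triggered: $out_dir/$_.txt\n" for sort keys %triggered;

Because the files are opened in append mode, closing and reopening a handle loses nothing; you only pay the cost of the extra open() calls.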

True laziness is hard work

Re^2: is this the most efficient way to double-parse a large file?
by jasonl (Acolyte) on Jan 21, 2014 at 16:21 UTC

    By today's standards it's probably not excessively large, +/- 100MB each (although there could be cases where multiple files will be concatenated before processing). I was worried that a single hash with everything in it would be too large, but if 1/2 available memory is the rule of thumb I should be good. A DB is definitely overkill, as each dataset will likely only be processed once or twice and then discarded.

    Thanks.

      1/2 could be almost any number. The reply was more to shake up your thinking a little and nudge you towards "let's try the simple way first". Remember: premature optimisation is the root of all evil.

      The important rule of thumb is: "If the code changes take longer than the run time saved, it's fast enough already".
