Re: Break up weblogs

In your first example you are scanning the Apache log in its entirety for each department in your list. That's 40 passes. Fletch's solution reduces that to a single pass, so it **will** be faster.

Your second solution got the log processing down to a single pass, but it holds all the data in memory. For "small" logs this will work, but you will run out of memory as the log gets larger. The real solution, as Fletch pointed out, is to write each line to an extract file (one per department) as soon as you have determined where it should go. The code Fletch proposes will also scale nicely as you add more departments (another bonus).
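To make the idea concrete, here is a rough sketch of the single-pass, write-as-you-go approach. It is not Fletch's code, and it is in Python rather than whatever you are running. The function name `split_log`, the output file naming, and above all the rule that the department is the first path segment of the requested URL are assumptions of mine; substitute whatever test you really use to pick the department.

```python
import re
from pathlib import Path

# ASSUMPTION: the department is the first path segment of the requested URL,
# e.g. "GET /sales/index.html HTTP/1.1" -> "sales". Adjust to your own rule.
REQUEST_RE = re.compile(r'"[A-Z]+ /([^/\s"]+)/')


def split_log(log_path, out_dir, departments):
    """Read the log once, appending each line to its department's extract file.

    Returns a dict mapping department -> number of lines written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    wanted = set(departments)
    handles = {}  # department -> open output file, opened on first use
    counts = {}
    try:
        with open(log_path, encoding="utf-8", errors="replace") as log:
            for line in log:  # one pass, one line in memory at a time
                match = REQUEST_RE.search(line)
                if not match or match.group(1) not in wanted:
                    continue
                dept = match.group(1)
                if dept not in handles:
                    # Hypothetical naming scheme: <department>.log
                    handles[dept] = open(
                        out_dir / f"{dept}.log", "w", encoding="utf-8"
                    )
                handles[dept].write(line)
                counts[dept] = counts.get(dept, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

Memory use stays flat no matter how big the log is, because only the current line and one open file handle per department are held at a time. Adding a department just means adding a name to the list.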

As to the amount of time it takes, you said "I have a very large Apache log...". There is a basic Principle of Science to bear in mind here:

TTT -- Things Take Time.

----
I Go Back to Sleep, Now.

OGB
