Re: Parsing Large Text Files For Performance

dave_the_m's advice is spot on. I did have one thought, though. If IP-1 and IP-2 are always presented in the same order (say, any line containing both IPs always has IP-1 first), you could set the special variable $/ to the string of IP-2. That way, instead of reading lines, you'll be reading records that end with the second critical IP address. Then all you have to do is scan each record to see whether the first critical IP address appears after the last newline character in it, i.e. on the same line as the IP-2 that ended the record. If so, you've got a match.
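A minimal sketch of that idea follows. It assumes IP-1 always precedes IP-2 on a line and that a plain substring test is good enough. Keep in mind that $/ is matched as a literal string, not a regex, so '10.0.0.2' would also end a record inside '10.0.0.25'. The sub name count_matches and the sample addresses are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines where IP-1 appears before IP-2,
# reading records that END with IP-2 instead of reading lines.
# Assumes IP-1 precedes IP-2 on any line containing both, and that a
# substring match is acceptable (no word-boundary checks).
sub count_matches {
    my ($fh, $ip1, $ip2) = @_;
    local $/ = $ip2;    # each read now stops right after the next IP-2
    my $matches = 0;
    while (my $rec = <$fh>) {
        # chomp strips a trailing $/ and returns how many characters it
        # removed; 0 means this is the tail of the file with no IP-2.
        next unless chomp($rec);
        # Only text after the last newline is on the same line as IP-2.
        my $seg = substr $rec, rindex($rec, "\n") + 1;
        $matches++ if index($seg, $ip1) >= 0;
    }
    return $matches;
}

# Usage with an in-memory filehandle (sample data is made up).
my $data = "a 10.0.0.1 -> 10.0.0.2\nb 10.0.0.3 -> 10.0.0.2\nc 10.0.0.1 only\n";
open my $fh, '<', \$data or die "open: $!";
print count_matches($fh, '10.0.0.1', '10.0.0.2'), "\n";    # prints 1
```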

Why would this be theoretically advantageous? Depending on how often IP-2 shows up, it may result in fewer iterations through the while loop; the rarer IP-2 is, the fewer records there are compared with lines. You're still reading the whole file, but you only do a regexp check when you already know that half of the condition has been met.
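To make the "fewer iterations" point concrete, this sketch counts how many times a read loop runs over the same in-memory data with the default line separator versus $/ set to IP-2. The helper name iterations and the sample data are made up for illustration. Loop count is only a proxy: real timing should come from benchmarking (for example with the core Benchmark module) against the actual files.

```perl
use strict;
use warnings;

# Hypothetical helper: how many records a read loop sees for a given $/.
sub iterations {
    my ($data, $sep) = @_;
    open my $fh, '<', \$data or die "open: $!";
    local $/ = $sep;
    my $n = 0;
    while (my $rec = <$fh>) {
        $n++;    # one pass through the loop body per record
    }
    close $fh;
    return $n;
}

# Nine lines without IP-2, one line with it.
my $data = join '', map { "host$_ 10.0.0.1 -> 10.0.0.9\n" } 1 .. 9;
$data .= "host10 10.0.0.1 -> 10.0.0.2\n";
print iterations($data, "\n"), "\n";          # prints 10 (one per line)
print iterations($data, '10.0.0.2'), "\n";    # prints 2 (one IP-2 record plus the tail)
```

The flip side of fewer, larger records is that when IP-2 is very rare a single record can span many lines, so each read holds more data in memory at once.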

When reading a file, there is already an implicit check happening: behind the scenes, Perl scans for $/ to find the end of each record. You may as well use that implicit 'check' to your advantage.

Of course, this adds complexity if you also need to capture information that comes after that second IP address. At that point, it would be hard to guess whether the extra logic needed to handle that would negate whatever minor advantage this approach had in the first place. I guess that means YMMV (your mileage may vary).
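To give a sense of that extra logic, here is one possible way to recover the whole matching line. The text after IP-2 is not in the current record at all; it sits at the start of the next record, up to its first newline, so the loop has to carry a partial line across reads. This sketch assumes IP-1 precedes IP-2 and that IP-2 appears at most once per line; matching_lines is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return full lines where IP-1 appears before IP-2,
# still reading with $/ = IP-2. Assumes IP-2 occurs at most once per line.
sub matching_lines {
    my ($fh, $ip1, $ip2) = @_;
    local $/ = $ip2;
    my (@lines, $pending);    # $pending: matched line still missing its tail
    while (my $rec = <$fh>) {
        if (defined $pending) {
            # The rest of the matched line starts this record and runs to
            # its first newline (or to end of file if there is none).
            my $nl = index $rec, "\n";
            push @lines, $pending . ($nl >= 0 ? substr($rec, 0, $nl) : $rec);
            undef $pending;
        }
        next unless chomp($rec);    # tail of the file: no IP-2 in it
        my $seg = substr $rec, rindex($rec, "\n") + 1;
        $pending = $seg . $ip2 if index($seg, $ip1) >= 0;
    }
    push @lines, $pending if defined $pending;    # file ended right after IP-2
    return @lines;
}

# Usage with made-up data.
my $log = "a 1.1.1.1 2.2.2.2 end-a\nb 2.2.2.2 end-b\nc 1.1.1.1 -> 2.2.2.2 end-c\n";
open my $fh, '<', \$log or die "open: $!";
print "$_\n" for matching_lines($fh, '1.1.1.1', '2.2.2.2');
# prints:
# a 1.1.1.1 2.2.2.2 end-a
# c 1.1.1.1 -> 2.2.2.2 end-c
```

Even this simple version needs state carried between iterations and breaks if IP-2 can repeat on a line, which is exactly the kind of added complexity that could eat into any speed gain.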

Dave

Re^2: Parsing Large Text Files For Performance by Anonymous Monk on Jan 31, 2011 at 02:29 UTC
Thanks so much for the timely advice! I will do some testing and get back to you with the results.