cool256 has asked for the wisdom of the Perl Monks concerning the following question:

Oh, ye wise ones,

I have a basic search to perform. The issue I face is optimization and speed.
Perhaps someone can give me a better idea of how to accomplish this in Perl rather than going to C.
My situation is the following:
1. Log sizes range up to 1015289 lines of text.
2. I have 4 strings I need to find within these logs.
3. I have hundreds of these logs to go through and generate reports from.

The problem: slow, slow, slow
Any suggestions from search masters in here?

Thanks in advance

Replies are listed 'Best First'.
Re: Optimizing string searches
by perrin (Chancellor) on Sep 05, 2008 at 16:38 UTC
    Use a sliding window, and if you're looking for constants, use index() instead of a regex.
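
    A minimal sketch of the index() part of that advice, assuming four hypothetical constant strings and a hypothetical file name:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Hypothetical constants -- substitute the real strings you search for.
        my @needles = ('ERROR', 'TIMEOUT', 'FATAL', 'OOM');

        open my $fh, '<', 'some.log' or die "Cannot open some.log: $!";
        while (my $line = <$fh>) {
            for my $needle (@needles) {
                # index() is a plain substring search (returns -1 on no match),
                # so the regex engine is never involved.
                if (index($line, $needle) >= 0) {
                    print $line;
                    last;    # one hit per line is enough here
                }
            }
        }
        close $fh;
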
Re: Optimizing string searches
by moritz (Cardinal) on Sep 05, 2008 at 16:24 UTC
    If the strings you are looking for are constants and start similarly, try perl 5.10.0; it optimizes the heck out of constant alternations.

    But one of the Unix (or GNU) grep tools is probably faster.

Re: Optimizing string searches
by Illuminatus (Curate) on Sep 05, 2008 at 17:53 UTC
    I concur on grep. Most versions have an option to accept Perl REs, so you wouldn't even have to do any mods. However, your vague description seems to indicate GBs' worth of text to search, and you don't mention how often it has to run. If grep, for some reason, is not an answer, I would first baseline 'slow': write a small program that just reads in all the lines of all the files you need to process. You obviously aren't going to get any faster than that using Perl.
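
    A minimal baseline sketch along those lines, assuming the log files are passed on the command line:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Baseline: just read every line of every file and count them.
        # No real search can beat this, so it sets the floor for 'slow'.
        my $lines = 0;
        for my $file (@ARGV) {
            open my $fh, '<', $file or die "Cannot open $file: $!";
            $lines++ while <$fh>;
            close $fh;
        }
        print "$lines lines read\n";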

    I assume these files are actually being created on many different machines. Can you add a small program to each that processes each log file as it is created (i.e. spread the pain)? On Linux, just 'tail -f' the log file and pipe it to your parser.
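
    The receiving end of that pipe could be as small as this sketch (the file name and strings are hypothetical):

        #!/usr/bin/perl
        # Invoked as:  tail -f /var/log/app.log | perl watch.pl
        use strict;
        use warnings;

        my $pattern = qr/ERROR|TIMEOUT|FATAL|OOM/;    # hypothetical strings
        while (my $line = <STDIN>) {
            print $line if $line =~ $pattern;
        }
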

Re: Optimizing string searches
by Anonymous Monk on Sep 05, 2008 at 16:06 UTC
    "grep -l"?
Re: Optimizing string searches
by johndageek (Hermit) on Sep 05, 2008 at 18:50 UTC
    Rather vague description, but Perl may be better than the OS's grep.

    If the files reside on multiple machines, run the search on the separate machines if possible.

    If the files reside on a single machine, process them locally.

    Do not open files across the network.

    Suggested outline (a minimal Perl sketch of it follows below):
    create a list of log files
    loop: open the next log file
        loop: read the next record from the current file
            regex string1 (if match, write output)
            regex string2 (if match ...)
            regex string3 (if match ...)
            regex string4 (if match ...)
        next record
    next log file

    Assumes: you will only parse the log files for these 4 strings, and there will be no reason to search the same log files again for other strings.
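
    A minimal Perl sketch of that outline, with a hypothetical glob for the log list and placeholder strings:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Hypothetical inputs -- replace with your real log list and search strings.
        my @logfiles = glob('/var/log/myapp/*.log');
        my @strings  = ('string1', 'string2', 'string3', 'string4');

        open my $out, '>', 'report.txt' or die "Cannot open report.txt: $!";

        for my $log (@logfiles) {                 # loop: open log files
            open my $fh, '<', $log or die "Cannot open $log: $!";
            while (my $line = <$fh>) {            # loop: read current file
                for my $s (@strings) {
                    if ($line =~ /\Q$s\E/) {      # \Q...\E treats the string literally
                        print {$out} "$log: $line";
                        last;                     # move on to the next record
                    }
                }
            }
            close $fh;
        }
        close $out;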

    Enjoy!
    Dageek

      regex string1 (if match, write output) regex string2 (if match ...) regex string3 (if match ...) regex string4 (if match ...)

      It's usually faster to build one regex with four alternations and match that instead of matching four single regexes against a string.
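
      For example, a sketch with four hypothetical strings combined into one compiled regex:

          use strict;
          use warnings;

          # quotemeta protects any regex metacharacters in the strings.
          my @strings  = ('string1', 'string2', 'string3', 'string4');
          my $combined = join '|', map { quotemeta } @strings;
          my $re       = qr/$combined/;    # one compiled regex, one match per line

          while (my $line = <>) {
              print $line if $line =~ $re;
          }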

        Thanks Moritz!

        Enjoy!
        Dageek

      Thanks for all the suggestions.
      Indeed, my question was a bit vague. Since the search strings may change at any given time, hardcoding the regex was not an option.
      Instead, I generate a Perl file at runtime that contains a regex built on the fly from the search strings.
      This boosted the performance, and I'm fairly happy with the results.
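
      For comparison, a sketch of building such a regex in-process at run time rather than writing a generated file; the strings file name here is hypothetical:

          use strict;
          use warnings;

          # Read whatever search strings are currently in effect.
          open my $fh, '<', 'search_strings.txt' or die "Cannot open search_strings.txt: $!";
          chomp(my @strings = <$fh>);
          close $fh;

          my $alt = join '|', map { quotemeta } @strings;
          my $re  = qr/$alt/;    # built at run time, compiled once

          while (my $line = <>) {
              print $line if $line =~ $re;
          }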

      Thanks again :)
Re: Optimizing string searches
by holli (Abbot) on Sep 05, 2008 at 21:24 UTC
    Go and buy a faster hard disk.


    holli, /regexed monk/