Use a sliding window, and if you're looking for constant strings, use index() instead of a regex.
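A minimal sketch of the sliding-window plus index() idea, assuming a made-up file name, search string, and buffer size:

use strict;
use warnings;

my $needle  = 'ERROR 1234';         # constant search string (placeholder)
my $bufsize = 1024 * 1024;          # read 1 MB at a time (placeholder size)
my $overlap = length($needle) - 1;  # tail kept so a match split across two reads is still seen

open my $fh, '<:raw', 'big.log' or die "Cannot open big.log: $!";
my ($window, $chunk, $matches) = ('', '', 0);
while (read($fh, $chunk, $bufsize)) {
    $window .= $chunk;
    my $pos = 0;
    while (1) {
        my $hit = index($window, $needle, $pos);   # plain substring search, no regex engine
        last if $hit < 0;
        $matches++;
        $pos = $hit + 1;
    }
    # keep only the last few bytes; a full match cannot fit inside this tail,
    # so nothing is counted twice on the next pass
    $window = substr($window, -$overlap) if length($window) > $overlap;
}
close $fh;
print "found $matches matches\n";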
I concur on grep. Most greps have an option to accept Perl regular expressions, so you wouldn't even have to modify your patterns. However, your rather vague description seems to indicate gigabytes of text to search, and you don't mention how often it has to run. If grep, for some reason, is not an answer, I would first establish a baseline for 'slow': write a small program that just reads all the lines of all the files you need to process. You obviously aren't going to get any faster than that using Perl.
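Such a baseline could be as small as this (the glob pattern standing in for "all the files" is just a placeholder):

use strict;
use warnings;

my $lines = 0;
for my $file (glob 'logs/*.log') {   # placeholder for however you list the files
    open my $fh, '<', $file or die "Cannot open $file: $!";
    $lines++ while <$fh>;            # read every line, do nothing else
    close $fh;
}
print "read $lines lines\n";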
I assume these files are actually being created on many different machines. Can you add a small program to each machine that processes each log file as it is created (i.e. spread the pain)? On Linux, just 'tail -f' the log file and pipe it to your parser.
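A sketch of the receiving end, assuming placeholder names for the log file and the search strings:

# run as:  tail -f /var/log/app.log | perl parser.pl
use strict;
use warnings;

while (my $line = <STDIN>) {
    print $line if $line =~ /string1|string2|string3|string4/;
}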
Rather vague description, but Perl may well be better than the OS grep here.
If the files reside on multiple machines, run the search on the separate machines if possible.
If the files reside on a single machine, process them locally.
Do not open files across the network.
Suggestion (sketched in Perl below; the glob pattern is a placeholder, and output goes to STDOUT):
use strict;
use warnings;

my @logfiles = glob 'logs/*.log';              # create a list of log files
for my $logfile (@logfiles) {                  # loop: open log files
    open my $fh, '<', $logfile
        or do { warn "Cannot open $logfile: $!"; next };
    while (my $line = <$fh>) {                 # loop: read current file
        print $line if $line =~ /string1/;     # regex string1 (if match, write output)
        print $line if $line =~ /string2/;     # regex string2 (if match ...)
        print $line if $line =~ /string3/;     # regex string3 (if match ...)
        print $line if $line =~ /string4/;     # regex string4 (if match ...)
    }                                          # next record
    close $fh;
}                                              # next log file
Assumes: you will only parse the log files for these four strings, and there will be no reason to search the same log files again for other strings.
Thanks for all the suggestions. Indeed, my question was a bit vague. Since the search strings may change at any given time, hardcoding the regexes was not an option. Instead, I generate a runtime Perl file containing the regexes on the fly from the search strings. This boosted the performance, and I'm fairly happy with the results.
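For illustration, a sketch of building the combined pattern at run time with qr// (placeholder strings; the actual solution writes out a generated Perl file instead, but the effect is similar):

use strict;
use warnings;

# Placeholder search strings; the real ones arrive at run time.
my @strings = ('string1', 'string2', 'string3', 'string4');

my $alt   = join '|', map { quotemeta } @strings;   # escape metacharacters, then combine
my $regex = qr/$alt/;                               # compiled once, before the read loop

while (my $line = <>) {                             # reads the files named on the command line
    print $line if $line =~ $regex;
}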
Thanks again :)
Go and buy a faster hard disk.