tcf03 has asked for the wisdom of the Perl Monks concerning the following question:

I guess this question somewhat relates to a question I had yesterday ==> node 446709 <==
I have several large files (about 20 of them), each between 100 and 220 MB in size. Each line of these files begins with the following fields: (WkDay Month Day Time Year LogInfo)

Thu Apr 7 03:00:38:81 2005 rest of line is logging info

I am currently searching the files like this:
for $_ (sort @voctable) {
    open (TMPFILE, "$workdir$_") or die("unable to open $_: $!\n"),br;
    print "[<font color=\"#ff0000\" size=\"+1\">FILE=$_</font>]\n",br;
    print hr;
    while (<TMPFILE>) {
        print ("<b>$1</b> $2\n",br)
            if ( m/(.*$MONTH.*$DAY.*$HOUR:$MINUTE:\d\d:\d\d\s+20\d\d)(.*)/g )
    }
    close (TMPFILE);
}
There is one of these loops for each type of file, and there are 5 different types of files. The point of all of this is to grab all lines with the same date and time and put them onto one web page, so our application people can have more of an overall view of all logs from the same time period. Each type of file can have four or five logs in its history, and iterating over all of them can currently take up to 20 minutes. Is there a faster way to do this?

Thanks in advance
Ted

UPDATE
Thanks! Changing the regex did speed things up quite a bit. The regex I am trying now, which seems to be a bit speedier, is:
print ("<b>$1</b> $2\n",br) if ( m/(^....$MONTH..$DAY.$HOUR:$MINUTE:..:...20..)(.*)/ )

Re: Fast file searching
by BrowserUk (Patriarch) on Apr 12, 2005 at 13:35 UTC

    Your regex is way too loose and time consuming.

    First, why are you using the /g option?

    Second, if the timestamp is always at the start of the line and in a consistent format, a regex that encodes that information, and doesn't force the regex engine to check things unnecessarily, will probably speed things up. Something like:

    m/^(... $MONTH $DAY $HOUR:$MINUTE:..:.. 20..)/

    May help.
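
To illustrate the suggestion above, here is a minimal sketch of an anchored pattern that is compiled once with qr// before the read loop. The search values, the match_line helper, and the second sample line are hypothetical, and the pattern assumes every line follows the layout of the sample line in the question.

```perl
use strict;
use warnings;

# Hypothetical search values; the original script gets these from elsewhere.
my ($MONTH, $DAY, $HOUR, $MINUTE) = ('Apr', 7, '03', '00');

# Compile once, outside the read loop. The pattern is anchored at ^ and
# spells out the assumed layout: WkDay Month Day HH:MM:SS:hh Year.
# \Q...\E quotes the interpolated values so they are matched literally.
my $stamp = qr/^(\w{3} \Q$MONTH\E +\Q$DAY\E \Q$HOUR\E:\Q$MINUTE\E:\d\d:\d\d \d{4})(.*)/;

# Returns (timestamp, rest of line) on a match, or an empty list.
sub match_line {
    my ($line) = @_;
    return $line =~ $stamp ? ($1, $2) : ();
}

my @sample = (
    "Thu Apr 7 03:00:38:81 2005 rest of line is logging info\n",
    "Thu Apr 7 04:12:01:10 2005 some other entry\n",
);

for my $line (@sample) {
    my ($ts, $rest) = match_line($line) or next;
    print "<b>$ts</b>$rest\n";
}
```

Because the pattern starts with ^ and contains no leading .*, a line with a different timestamp can be rejected after inspecting only its first few characters.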


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?

      Or if your timestamps are "fixed" enough you may be able to use some combination of unpack, substr, and split instead of a regex.

        Recent enhancements to the pack/unpack format, and the fact that the format has to be interpreted on every call whereas a regex is, at some level, 'compiled' the first time it is used, mean that pack/unpack are often slower these days.

        If you use split you are using the regex engine anyway and if you need to use multiple calls to substr, the regex engine will nearly always win.
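
Claims like these are easiest to settle by measuring on your own data. The following is a rough sketch using the core Benchmark module to compare a single substr check against an anchored regex. The helper names and search values are hypothetical, and the substr version assumes the timestamp always starts at the same offset with an unpadded day field, which may not hold for real logs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "Thu Apr 7 03:00:38:81 2005 rest of line is logging info";

# Hypothetical search values.
my ($MONTH, $DAY, $HOUR, $MINUTE) = ('Apr', '7', '03', '00');

# Anchored regex, compiled once.
my $re = qr/^... \Q$MONTH\E \Q$DAY\E \Q$HOUR\E:\Q$MINUTE\E:/;

# Fixed-offset comparison: skip the 4-character "WkDay " field, then
# compare the next bytes against the expected prefix.
my $prefix = "$MONTH $DAY $HOUR:$MINUTE:";

sub by_substr { return substr($_[0], 4, length $prefix) eq $prefix }
sub by_regex  { return $_[0] =~ $re }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { by_substr($line) },
    regex  => sub { by_regex($line) },
});
```

The numbers depend on the Perl version and on how many substr calls the real check needs, so treat the result as a measurement of this one case only.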


Re: Fast file searching
by dragonchild (Archbishop) on Apr 12, 2005 at 13:53 UTC
    If these files are static (you have a logfile for each day, etc), then you can use an offline process to prepare an HTML version of each logfile. Then, just serve the static HTML.
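
A minimal sketch of that offline conversion, assuming the timestamp layout from the question. The log_to_html and escape_html helpers are hypothetical names, and the demonstration uses in-memory filehandles; a real job would open the actual log and output files and run on whatever schedule suits the log rotation.

```perl
use strict;
use warnings;

# Minimal HTML escaping for log text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Read log lines from $in and write one HTML line per log line to $out.
# A leading timestamp is bolded, as in the original loop's output.
sub log_to_html {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        if (my ($stamp, $rest) =
                $line =~ /^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d:\d\d \d{4})(.*)/) {
            print {$out} '<b>', escape_html($stamp), '</b>',
                         escape_html($rest), "<br>\n";
        }
        else {
            print {$out} escape_html($line), "<br>\n";
        }
    }
    return;
}

# Demonstration with in-memory filehandles instead of real files.
my $log  = "Thu Apr 7 03:00:38:81 2005 rest of line <is> logging info\n";
my $html = '';
open my $in,  '<', \$log  or die "cannot read sample: $!";
open my $out, '>', \$html or die "cannot write buffer: $!";
log_to_html($in, $out);
close $out;
close $in;
print $html;
```

Serving pre-rendered pages does not by itself answer a query for one date and time, so this only helps if the pages are split or indexed in a way that matches how people look things up.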