in reply to I need speed

Here are some tips if you don't want to use external tools. The key is to be lazy and defer as much work as possible. One simple thing you can do is split off only the filename first, and split the remaining fields only when the filename matches:

    my ($filename, $rest) = split /,/, $_, 2;
    if (index($filename, $to_find) > -1) {
        my ($cms, $path, $size, $day, $time) = split /,/, $rest;
        # ... build the table row as before ...
    }

Going quite a lot further:

    open(F, "+< $infile") or die "Cannot open $infile: $!";
    while (sysread F, $_, 32768) {
        # Caveat: perlfunc warns against mixing sysread with buffered reads like <F>.
        $_ .= <F>;                      # complete the line the block read cut off
        next unless /\Q$to_find\E/;     # quickskip
        for (grep /^[^,]*?\Q$to_find\E/, split /\n/, $_) {
            ($filename, $cms, $path, $size, $day, $time) = split /,/;
            $href = "file:\\\\netd\\data".$path."\\$filename";
            $href =~ s/\s/%20/g;
            $table .= "<TR><TD><A HREF=\"$href\">$path\\$filename</A><TD>$size<TD>$day $time</TR>";
        }
    }
    close(F);

This greatly reduces the number of IO operations and restricts the heavy splittage to known matches.
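
As a minimal sketch of the chunked-read idea, here is a self-contained version that swaps in buffered read for sysread, since perlfunc cautions against mixing sysread with buffered reads such as <F>. The sub name find_matches, the lexical filehandle, the 4096-byte chunk size and the in-memory sample data are all assumptions for illustration; none of them come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line whose first field contains $to_find.
sub find_matches {
    my ($fh, $to_find, $chunk_size) = @_;
    my @hits;
    while (read($fh, my $buf, $chunk_size)) {
        # Finish the line that the fixed-size read cut off, if any.
        my $tail = <$fh>;
        $buf .= $tail if defined $tail;

        # Quick skip: ignore chunks that cannot contain a match at all.
        next unless index($buf, $to_find) > -1;

        # Only now pay for splitting the chunk into lines.
        push @hits, grep { /^[^,]*?\Q$to_find\E/ } split /\n/, $buf;
    }
    return @hits;
}

# Sample data held in memory so the sketch runs without a real index file.
my $data = join '', map { "file$_.txt,CMS,\\dir$_,100,1/1/2002,1:00 PM\n" } 1 .. 500;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @hits = find_matches($fh, 'file42.txt', 4096);
close $fh;

print "$_\n" for @hits;
```

Each chunk always ends on a line boundary, so no record is ever split across two iterations; the per-line work (grep and split) only happens for chunks that pass the cheap index check.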

I believe the following is an extra win, but I haven't benchmarked it.

    open(F, "+< $infile") or die "Cannot open $infile: $!";
    while (sysread F, $_, 32768) {
        # Caveat: perlfunc warns against mixing sysread with buffered reads like <F>.
        $_ .= <F>;                      # complete the line the block read cut off
        next unless /\Q$to_find\E/;     # quickskip
        while (/^([^\n,]*?\Q$to_find\E[^\n]*)/gm) {
            ($filename, $cms, $path, $size, $day, $time) = split /,/, $1;
            $href = "file:\\\\netd\\data".$path."\\$filename";
            $href =~ s/\s/%20/g;
            $table .= "<TR><TD><A HREF=\"$href\">$path\\$filename</A><TD>$size<TD>$day $time</TR>";
        }
    }
    close(F);

Beyond these tricks, you quickly get to the point of pretty much hand-rolling your own database engine.

Makeshifts last the longest.

Re: Re: I need speed
by Galen (Beadle) on Jun 21, 2002 at 20:37 UTC
    Thanks for your reply - I had thought about limiting the split, but I didn't believe it would enhance performance much. It does appear to help though. The simple reduced split you noted seems to improve performance by about 20%. However, the other snippets you provided don't work. I'm trying to figure out why. They return no matches for files I know exist.
      Odd. Admittedly, I hadn't tested them when I posted, apart from the regex in the last snippet, but I have now and they seem to work. *shrug* I can't spot anything obviously wrong either.

      Makeshifts last the longest.

        Here is a line from the text file:

        "color.html","NDC","\Reports\NDC Reports",10137,5/29/2002,9:43:42 PM

        If you were to look for the file "color.html", your regex would turn up nothing. I'll play with it a little more this week.
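
One way to narrow this down is to run the pattern against that line on its own. The sketch below assumes $to_find holds the bare name color.html (the thread doesn't show how the variable is actually populated) and reuses the pattern from the last snippet in the parent post. It also strips the surrounding double quotes from each field, which a plain split leaves in place; that step is an addition for illustration and does not handle commas inside quoted fields.

```perl
use strict;
use warnings;

# The sample line quoted above; single quotes keep the backslashes as-is.
my $line    = '"color.html","NDC","\Reports\NDC Reports",10137,5/29/2002,9:43:42 PM';
my $to_find = 'color.html';    # assumption: the search term carries no quotes

# Same pattern as the last snippet in the parent post.
my @matches = $line =~ /^([^\n,]*?\Q$to_find\E[^\n]*)/gm;
print scalar(@matches), " match(es)\n";

# A plain split keeps the double quotes, so strip them from each field
# before building paths or links.
my ($filename, $cms, $path, $size, $day, $time) =
    map { my $f = $_; $f =~ s/^"|"$//g; $f } split /,/, $line;
print "$filename | $path\n";
```

Taken in isolation, the pattern does match this line, because the lazy [^,]*? absorbs the leading quote. If the full script still finds nothing, the cause may lie elsewhere, for example in the contents of $to_find or in how the chunk is read.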