profiling help

permanentE has asked for the wisdom of the Perl Monks concerning the following question:

I need help profiling this code; it needs to be quicker.
I need to capture the word that follows the first occurrence of the string "hostname" in each of several thousand files. At first I tried a system call to grep, thinking that since grep is a C binary it would be quickest, but apparently the overhead of spawning a process per file makes it slower. This is what I have:

@cache = `cat filelist`;

foreach $path (@cache) {
        open F, $path;
        while (<F>) {
                if (/hostname ([\-\w]+)/) {
                        $hostname = $1;
                        last;
                }
        }
        print "$hostname\n";
}

Here is the Devel::SmallProf output:

            ================ SmallProf version 0.9 ================
                                Profile of prof                        Page 1
       =================================================================
    count wall tm  cpu time line
        0 0.000000 0.000000     1:#!/opt/CSCOpx/bin/perl
        0 0.000000 0.000000     2:
        1 0.000004 0.000000     3:$RUNDIR = "/var/adm/links/newrun";
        0 0.000000 0.000000     4:
        0 0.000000 0.000000     5:
        1 0.013885 0.010000     6:@cache = `cat $RUNDIR/filelist`;
        0 0.000000 0.000000     7:
    11600 0.059747 0.050000     8:foreach $path (@cache) {
    11599 0.906185 0.080000     9: open F, $path;
   284014 3.008497 1.210000    10: while (<F>) {
   283652 3.195402 0.840000    11:  if (/hostname ([-\w]+)/) {
    11599 0.101677 0.060000    12:   $hostname = $1;
    11599 0.094969 0.040000    13:   last;
        0 0.000000 0.000000    14:  }
        0 0.000000 0.000000    15: }
    11599 0.127013 0.050000    16: print $hostname." ";
        0 0.000000 0.000000    17:}


Replies are listed 'Best First'.
Re: profiling help by BrowserUk (Patriarch) on Apr 15, 2003 at 23:18 UTC
If your files are all of a reasonable size (up to a few tens of megabytes, for example), then rather than reading each file line by line, you could probably improve performance by slurping the whole file (see perlvar:$/).

If some or all of your files are too big to read into memory in one go, you could try the sliding buffer technique I posted at Re: speed up one-line "sort|uniq -c" perl code. It reads the file in large chunks of a specifiable size and ensures that each new search starts from a newline, so that matches split across reads are not missed. This seems to give a fairly substantial performance benefit over reading line by line, at the cost of a little extra complexity.

Examine what is said, not who speaks. 1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong. 2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible. 3) Any sufficiently advanced technology is indistinguishable from magic. Arthur C. Clarke.
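For readers who want to try the chunked approach without the linked post at hand, here is a minimal sketch of the general idea. It is not BrowserUk's code: the helper name `first_hostname_chunked` and the 64 KB default chunk size are assumptions made for illustration. Each read is appended to whatever partial line was held back from the previous read, and only complete lines are searched, so a match that straddles two reads is still found.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the word after the first
# "hostname" in $path, reading the file in large chunks instead of line by line.
# Returns undef if the file cannot be opened or contains no match.
sub first_hostname_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;           # assumed default; tune for your data

    open(my $fh, '<', $path) or return;

    my $tail = '';                       # partial last line carried between reads
    my $buf;
    while (read($fh, $buf, $chunk_size)) {
        $buf = $tail . $buf;

        # Hold back everything after the last newline for the next pass.
        my $nl = rindex($buf, "\n");
        if ($nl < 0) {                   # no complete line yet; keep accumulating
            $tail = $buf;
            next;
        }
        $tail = substr($buf, $nl + 1);
        substr($buf, $nl + 1) = '';

        return $1 if $buf =~ /hostname ([-\w]+)/;
    }

    # The file may not end in a newline; search what is left over.
    return $1 if $tail =~ /hostname ([-\w]+)/;
    return;
}
```

Because the function returns as soon as the first match is found, files whose hostname line appears early cost a single read. Whether this beats line-by-line reading on your data is something only the profiler can tell you.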
Re: profiling help by l2kashe (Deacon) on Apr 16, 2003 at 00:42 UTC
A micro-optimization that may save you time is to anchor your regex, which may or may not be possible. I was thinking of something along the lines of:

    # .. snip ..
    if (/^hostname ([-\w]+) /)
    # or
    if (/^(?:\s+|)hostname ([-\w]+) /)

or, even better, if it's on a line all by itself, you can anchor at both ends of the string.

I know it's not why you asked, but you aren't testing whether or not your open was successful.

    # old line
    open F, $path;
    # new line IMHO
    open(F, "$path") || die "Cant open $path: $!\n";
    # or even (comma, not &&, so the message prints before next runs)
    open(F, "$path") || ( print("Cant open $path: $!\n"), next );

    # another small trim would be to do away with the interim
    # array and filename..
    $filelist = "/var/adm/links/newrun/filelist";
    for ( `cat $filelist` ) {
        chomp;
        open(F, "$_") || die "Cant open $_: $!\n";
        while (<F>) {
            # anchor could apply here as well..
            !/hostname/ ? next : chomp;
            # work on hostname lines here
            last;
        }
        close(F);
        print $hostname. " ";
    }

Just some rambling thoughts on the code..

MMMMM... Chocolaty Perl Goodness.....
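To make the effect of anchoring concrete, here is a small sketch. The helper name `hostname_from_line` and the sample lines are made up for illustration, and the pattern uses `^\s*` as one way of allowing optional leading whitespace. Note that anchoring changes what matches: a line where "hostname" appears mid-line no longer counts, so it only helps if the keyword is known to start the line.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the hostname only when the keyword begins the
# line, optionally preceded by whitespace. Returns undef otherwise.
sub hostname_from_line {
    my ($line) = @_;
    return $line =~ /^\s*hostname ([-\w]+)/ ? $1 : undef;
}

# Made-up sample lines: two that should match and one that should not.
for my $line ("hostname core-rtr1\n", "  hostname edge-sw2\n", "no ip hostname lookup\n") {
    my $name = hostname_from_line($line);
    print defined $name ? "$name\n" : "(no match)\n";
}
```

With the anchor, the regex engine can give up on a non-matching line after inspecting its start instead of scanning the whole line for the literal. How much that saves in practice depends on line length and on optimizations the engine already applies.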
Re: profiling help by traveler (Parson) on Apr 15, 2003 at 23:21 UTC
If the files are not large, you could test your code by putting `undef $/;` as line 4. This causes each read to return the entire file at once. It may or may not help depending on the data. HTH, --traveler
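A minimal sketch of the slurp approach follows. The helper name `first_hostname_slurp` is an assumption for illustration, not code from the thread. It uses `local $/` inside the function rather than a global `undef $/`, because `$/` also controls how backticks split their output in list context: undefining it globally before the `cat` line would leave `@cache` holding the whole file list as one element.

```perl
use strict;
use warnings;

# Hypothetical helper: read the whole file in one go and return the word after
# the first "hostname". Returns undef if the file cannot be opened or has no match.
sub first_hostname_slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;

    local $/;                    # slurp mode; the old value is restored on return
    my $content = <$fh>;
    close($fh);

    return unless defined $content;
    return $content =~ /hostname ([-\w]+)/ ? $1 : undef;
}
```

In the original loop this would be called once per path, after a `chomp` of the path. Since the regex stops at the first match, the result is the same as the line-by-line version for files that contain a hostname line; the trade-off is that the whole file is read even when the match sits near the top.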