permanentE has asked for the wisdom of the Perl Monks concerning the following question:

I need help profiling this code; it needs to be faster.
I need to capture the word that follows the first occurrence of the string "hostname" in each of several thousand files. At first I tried a system call to grep: since grep is a compiled C binary, I thought it would be quickest, but apparently the system call overhead makes it slower. This is what I have:
    @cache = `cat filelist`;
    foreach $path (@cache) {
        open F, $path;
        while (<F>) {
            if (/hostname ([\-\w]+)/) {
                $hostname = $1;
                last;
            }
        }
        print "$hostname\n";
    }
Here is the Devel::SmallProf output:

    ================ SmallProf version 0.9 ================
                      Profile of prof                   Page 1
    =================================================================
        count   wall tm  cpu time  line
            0  0.000000  0.000000     1:#!/opt/CSCOpx/bin/perl
            0  0.000000  0.000000     2:
            1  0.000004  0.000000     3:$RUNDIR = "/var/adm/links/newrun";
            0  0.000000  0.000000     4:
            0  0.000000  0.000000     5:
            1  0.013885  0.010000     6:@cache = `cat $RUNDIR/filelist`;
            0  0.000000  0.000000     7:
        11600  0.059747  0.050000     8:foreach $path (@cache) {
        11599  0.906185  0.080000     9:  open F, $path;
       284014  3.008497  1.210000    10:  while (<F>) {
       283652  3.195402  0.840000    11:    if (/hostname ([-\w]+)/) {
        11599  0.101677  0.060000    12:      $hostname = $1;
        11599  0.094969  0.040000    13:      last;
            0  0.000000  0.000000    14:    }
            0  0.000000  0.000000    15:  }
        11599  0.127013  0.050000    16:  print $hostname." ";
            0  0.000000  0.000000    17:}

Re: profiling help
by BrowserUk (Patriarch) on Apr 15, 2003 at 23:18 UTC

    If your files are all of a reasonable size (up to a few tens of megabytes, say), then rather than reading each file line by line, you could probably improve performance by slurping the whole file (see perlvar: $/).
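A minimal sketch of the slurp approach, assuming the files fit comfortably in memory. first_hostname_slurp is a hypothetical name; $/ is localized to undef only inside a do block, so any other reads in the program keep their normal line-by-line behaviour.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the word after the first "hostname" in $path,
# or undef if there is none (or the file can't be opened).
sub first_hostname_slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or do { warn "Can't open $path: $!\n"; return };
    # Slurp: with $/ undefined, one read returns the whole file.
    my $text = do { local $/; <$fh> };
    close $fh;
    return unless defined $text;
    return $text =~ /hostname ([-\w]+)/ ? $1 : undef;
}

# Demo on a throwaway file (contents made up for illustration).
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "version 12.1\nhostname edge-rtr1\ninterface eth0\nhostname other\n";
close $tmp;

my $found = first_hostname_slurp($tmpname);
print defined $found ? "$found\n" : "(none)\n";
```

Because the regex returns the leftmost match, searching the whole file at once still yields the first occurrence, just as the line-by-line loop with last does.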

    If some or all of your files are too big to read into memory in one go, you could try the sliding-buffer technique I posted at Re: speed up one-line "sort|uniq -c" perl code. It reads the file in large chunks of a specifiable size and makes sure each new search starts from a newline, so matches that get split across reads are not missed. It seems to give a fairly substantial performance benefit over reading line by line, at the cost of a little extra complexity.
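This is not the code from that node, only a minimal sketch of the idea as described here: read large chunks with read(), search only complete lines, and carry the trailing partial line over to the next read. The name first_hostname_chunked and the 1 MiB default chunk size are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the word after the first "hostname" in $path,
# reading fixed-size chunks instead of lines. Only complete lines are
# searched; the trailing partial line of each chunk is carried into the
# next one, so a match is never split across two reads.
sub first_hostname_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;    # 1 MiB default, an arbitrary choice
    open(my $fh, '<', $path) or do { warn "Can't open $path: $!\n"; return };
    my $carry = '';
    while (1) {
        my $got = read($fh, my $buf, $chunk_size);
        die "Can't read $path: $!\n" unless defined $got;
        my $at_eof = ($got == 0);
        my $data   = $carry . $buf;
        unless ($at_eof) {
            my $nl = rindex($data, "\n");
            if ($nl < 0) {                          # no complete line yet
                $carry = $data;
                next;
            }
            $carry = substr($data, $nl + 1);        # partial last line
            $data  = substr($data, 0, $nl + 1);     # complete lines only
        }
        if ($data =~ /hostname ([-\w]+)/) {
            close $fh;
            return $1;
        }
        last if $at_eof;    # leftover (unterminated) last line was searched
    }
    close $fh;
    return;
}

# Demo: filler lines followed by two hostname lines (made-up contents).
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "filler line $_\n" for 1 .. 50;
print {$tmp} "hostname core-sw2\nhostname later-one\n";
close $tmp;

my $h = first_hostname_chunked($tmpname, 64);    # small chunks exercise the carry
print defined $h ? "$h\n" : "(none)\n";
```

In practice the chunk size would be much larger than 64 bytes; the tiny value in the demo is only there to force lines to straddle chunk boundaries.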


Re: profiling help
by l2kashe (Deacon) on Apr 16, 2003 at 00:42 UTC
    A micro-optimization that may save you time is to anchor your regex, which may or may not be possible with your data. I was thinking of something along the lines of

        if (/^hostname ([-\w]+)/) { ... }
        # or, if the line may start with whitespace:
        if (/^\s*hostname ([-\w]+)/) { ... }

    or, even better, if the hostname statement is always on a line by itself, you can anchor it at both ends of the string (for example /^hostname ([-\w]+)$/).
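Whether anchoring is actually faster is worth measuring, since Perl's regex engine already optimizes the search for a literal substring such as "hostname". What anchoring certainly does is change which lines match. This sketch runs the question's unanchored pattern and the anchored variants over a few made-up lines:

```perl
use strict;
use warnings;

# Made-up sample lines: one plain hostname line, one indented, and one
# comment that merely mentions a hostname.
my @lines = (
    "hostname edge-rtr1\n",
    "  hostname indented-one\n",
    "! old hostname retired-box was here\n",
);

my %patterns = (
    unanchored => qr/hostname ([-\w]+)/,       # as in the original question
    start      => qr/^hostname ([-\w]+)/,      # must start the line
    start_ws   => qr/^\s*hostname ([-\w]+)/,   # allows leading whitespace
    both_ends  => qr/^hostname ([-\w]+)$/,     # $ also matches before a trailing "\n"
);

# For each pattern, collect the words it captures from @lines.
my %found;
for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    $found{$name} = [ map { /$re/ ? $1 : () } @lines ];
    print "$name: @{ $found{$name} }\n";
}
```

If lines like the third one can occur in your files before the real hostname line, the unanchored pattern would pick up the wrong word, so anchoring can be a correctness fix as well as a possible speedup.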

    I know it's not what you asked, but you aren't checking whether your open succeeded:

        # old line
        open F, $path;

        # new line IMHO
        open(F, $path) || die "Can't open $path: $!\n";

        # or, to warn and move on to the next file instead of dying
        open(F, $path) or do { warn "Can't open $path: $!\n"; next };

        # another small trim would be to do away with the interim
        # @cache array..
        my $filelist = "/var/adm/links/newrun/filelist";
        for my $path (`cat $filelist`) {
            chomp $path;
            open(F, $path) || die "Can't open $path: $!\n";
            my $hostname = '';
            while (<F>) {
                # anchor could apply here as well..
                next unless /hostname ([-\w]+)/;
                $hostname = $1;    # work on hostname lines here
                last;
            }
            close(F);
            print "$hostname ";
        }
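A sketch of the same loop in a more modern style, assuming Perl 5.8 or later (lexical filehandles, three-argument open, and an in-memory filehandle in the demo). It also reads the file list with open instead of backticks and cat, which is my substitution rather than part of the reply; report_hostnames and its output-handle parameter are hypothetical names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: for each file named in $filelist, print the word that
# follows the first "hostname" (or "(none)") to the filehandle $out.
sub report_hostnames {
    my ($filelist, $out) = @_;
    open(my $list, '<', $filelist) or die "Can't open $filelist: $!\n";
    while (my $path = <$list>) {
        chomp $path;
        next unless length $path;    # skip blank lines in the list
        open(my $fh, '<', $path) or do { warn "Can't open $path: $!\n"; next };
        my $hostname = '(none)';
        while (my $line = <$fh>) {
            if ($line =~ /hostname ([-\w]+)/) { $hostname = $1; last }
        }
        close $fh;
        print {$out} "$hostname\n";
    }
    close $list;
    return;
}

# Demo: two made-up config files and a list file naming them.
my @configs;
for my $text ("hostname alpha-1\n", "no name here\n") {
    my ($cfh, $cname) = tempfile(UNLINK => 1);
    print {$cfh} $text;
    close $cfh;
    push @configs, $cname;
}
my ($lfh, $listname) = tempfile(UNLINK => 1);
print {$lfh} "$_\n" for @configs;
close $lfh;

# Collect the report in an in-memory filehandle, then show it.
open(my $out, '>', \my $report) or die "Can't open in-memory handle: $!\n";
report_hostnames($listname, $out);
close $out;
print $report;
```

Using a named lexical variable for each line also avoids the inner while (<F>) silently overwriting the outer loop's $_.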
    Just some rambling thoughts on the code..

Re: profiling help
by traveler (Parson) on Apr 15, 2003 at 23:21 UTC
    If the files are not large, you could test your code by putting undef $/; at line 4. This makes each read return the entire file at once. It may or may not help, depending on the data.
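One thing to watch, going by perlop's description of backticks: in list context they split their output into lines using $/, just as readline does. In the profiled script, line 6 (@cache = `cat $RUNDIR/filelist`;) runs after line 4, so with undef $/ at line 4 @cache would most likely hold a single string containing every file name, and the open on line 9 would then fail. The sketch below shows the same effect with readline on an in-memory filehandle (a stand-in so it runs without a shell; the file names are made up).

```perl
use strict;
use warnings;

# Stand-in for the output of `cat filelist` (file names are made up).
my $filelist = "a.cfg\nb.cfg\nc.cfg\n";

# Default $/ ("\n"): a list-context read yields one element per line.
open(my $fh, '<', \$filelist) or die "Can't open in-memory handle: $!\n";
my @per_line = <$fh>;
close $fh;

# With $/ undefined, the same list-context read yields ONE element
# holding everything.
my @slurped = do {
    local $/;
    open(my $fh2, '<', \$filelist) or die "Can't open in-memory handle: $!\n";
    <$fh2>;
};

printf "per line: %d elements, slurped: %d element(s)\n",
    scalar(@per_line), scalar(@slurped);
```

So the undef $/ would need to go after line 6, or be localized around each file read, for the file list to be split as before.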

    HTH, --traveler