Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, The snippet of code below is taking some time in doing what it does (computes min, max and means of several arrays). Is there a way to speed up the process? Thanks, Stacy.

    foreach $checked (@avail) {
        chomp($checked);
        if ( $checked =~ /\.gz$/ ) {
            open(LOG, "/usr/local/gnu/bin/gunzip -c < $checked |")
                or die "$checked open failed: $!\n";
        } else {
            open(LOG, "<$checked") or die "$checked open failed: $!\n";
        }

        # reset data structures
        $MaxWSpd = $AvgWSpd = $WindDir = $OutTemp = 0;
        @maxwspd = ();
        @avgwspd = ();
        @outtemp = ();
        @winddir = ();

        while ($line = <LOG>) {
            %months = qw{JAN 1 FEB 2  MAR 3  APR 4  MAY 5  JUN 6
                         JUL 7 AUG 8  SEP 9  OCT 10 NOV 11 DEC 12};
            next if $line =~ /^\D/;     # Skip header
            next if $line =~ /9999/;    # Skip bad records
            ($Date,$MaxWSpd,$AvgWSpd,$WindDir,$OutTemp) =
                (split(" ",$line))[0,2,3,4,8];
            next if $MaxWSpd < 0;
            next if $AvgWSpd < 0;
            next if $WindDir < 0;
            next if $OutTemp < 0;
            ($day,$month,$year) = split('-',$Date);
            $month = sprintf "%02d", $months{$month};
            if ($WindDir > 360) { $WindDir = $WindDir % 360 }
            push(@maxwspd,$MaxWSpd);
            push(@avgwspd,$AvgWSpd);
            push(@outtemp,$OutTemp);
            push(@winddir,$WindDir);
        } # end of while

        $MeanAvg    = sprintf("%4.1f", mean(\@avgwspd));
        $MeanWDir   = sprintf("%4.1f", mean(\@winddir));
        $MaxWSpd    = sprintf("%4.1f", max(\@maxwspd));
        $MinWSpd    = sprintf("%4.1f", min(\@maxwspd));
        $MaxOutTemp = sprintf("%4.1f", max(\@outtemp));
        $MinOutTemp = sprintf("%4.1f", min(\@outtemp));
        push(@stats, "$year/$month/$day: $MinWSpd:$MaxWSpd $MeanAvg $MeanWDir $MinOutTemp:$MaxOutTemp");
        close(LOG);
    } # end of foreach

    foreach (@stats) { print $_, "\n"; }

    #--------------------
    sub mean {
        my ($arrayref) = @_;
        my $result;
        foreach (@$arrayref) { $result += $_; }
        return $result / @$arrayref;
    }
    #--------------------
    sub max {
        my ($arrayref) = @_;
        $max = -10;
        foreach (@$arrayref) {
            if ($_ > $max) { $max = $_ }
        }
        return $max;
    }
    #--------------------
    sub min {
        my ($arrayref) = @_;
        $min = 50;
        foreach (@$arrayref) {
            if ($_ < $min) { $min = $_ }
        }
        return $min;
    }
    #--------------------

Re: Inefficient code?
by Masem (Monsignor) on Apr 22, 2001 at 15:18 UTC

    Pretty much. You are running through the data several times: once to read everything in, then once more per statistic -- mean, max, and min. While each pass is O(n), as n grows, so will the time to execute this program.

    You should be able to build up the necessary maths for mean, max, and min at the same time you read in the data. You'll need some variables to do this (I'd suggest a hash for each data type you are reading in). And, if you do this right, you need not even store the data, just process the numbers for the current line and drop them, helping you to save memory.
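    For example, here's a minimal sketch of that single-pass idea (the column layout and variable names are simplified stand-ins for the original format, and the sample data is made up):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Running accumulators -- nothing is stored per record.
        my ( $n, $sum_avg, $sum_dir ) = ( 0, 0, 0 );
        my ( $min_w, $max_w, $min_t, $max_t );

        while ( my $line = <DATA> ) {    # stands in for the LOG handle
            my ( $max_wspd, $avg_wspd, $wind_dir, $out_temp ) = split ' ', $line;

            $n++;
            $sum_avg += $avg_wspd;
            $sum_dir += $wind_dir;
            $max_w = $max_wspd if !defined $max_w or $max_wspd > $max_w;
            $min_w = $max_wspd if !defined $min_w or $max_wspd < $min_w;
            $max_t = $out_temp if !defined $max_t or $out_temp > $max_t;
            $min_t = $out_temp if !defined $min_t or $out_temp < $min_t;
        }

        printf "%4.1f:%4.1f %4.1f %4.1f %4.1f:%4.1f\n",
            $min_w, $max_w, $sum_avg / $n, $sum_dir / $n, $min_t, $max_t
            if $n;

        __DATA__
        10.2 5.1 270 15.3
        12.8 6.0 180 14.1
        8.4 4.2 90 16.7

    Initializing the min/max trackers to undef (rather than magic values like -10 or 50) also avoids wrong answers when a reading falls outside the guessed range.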


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
(ar0n: use List::Util) Re: Inefficient code?
by ar0n (Priest) on Apr 22, 2001 at 18:34 UTC
    The snippet of code below is taking some time in doing what it does (computes min, max and means of several arrays). Is there a way to speed up the process?
    List::Util is a module written in XS that does the same as your min and max subroutines. Because it's XS, it's definitely faster than what you have now.

    There's no mean function, but that can be worked around by simply doing:
    my $mean = sum(@list) / @list;
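    Applied to the posted code, the hand-rolled subs could be replaced along these lines (a sketch; note that List::Util's functions take a list rather than an array reference, so the \@array arguments become @array, and it assumes the arrays are filled exactly as in the original loop):

        use List::Util qw(min max sum);

        # Same sprintf formatting as the original; replaces the
        # mean(), max() and min() subroutines entirely.
        $MeanAvg    = sprintf "%4.1f", sum(@avgwspd) / @avgwspd;
        $MeanWDir   = sprintf "%4.1f", sum(@winddir) / @winddir;
        $MaxWSpd    = sprintf "%4.1f", max(@maxwspd);
        $MinWSpd    = sprintf "%4.1f", min(@maxwspd);
        $MaxOutTemp = sprintf "%4.1f", max(@outtemp);
        $MinOutTemp = sprintf "%4.1f", min(@outtemp);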

    Update: I just looked into Statistics::Descriptive, and it seems a little too much for what you want. If all you want is the minimum, maximum and the mean, I'd use List::Util.

    [ ar0n ]

Re: Inefficient code?
by Albannach (Monsignor) on Apr 22, 2001 at 16:45 UTC
    I think you will find that Statistics::Descriptive will be quite handy for your application, and implementing it to collect the data as you read it as Masem suggests is very simple.
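    For instance, here's a minimal sketch of that streaming approach (the ::Sparse variant keeps only running totals, so the individual data points are never stored; the values below are made up for illustration):

        use strict;
        use warnings;
        use Statistics::Descriptive;

        my $wspd = Statistics::Descriptive::Sparse->new();

        # In the real script there would be one add_data call per
        # parsed record, inside the read loop.
        $wspd->add_data(10.2);
        $wspd->add_data(12.8);
        $wspd->add_data(8.4);

        printf "min %4.1f  max %4.1f  mean %4.1f\n",
            $wspd->min, $wspd->max, $wspd->mean;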

    Update: ar0n has a good point - I tend to want more, uh, descriptive statistics I guess ;-)

    --
    I'd like to be able to assign to an luser

Re: Inefficient code?
by RhetTbull (Curate) on Apr 23, 2001 at 05:22 UTC
    I don't know if it's faster than opening a pipe from gunzip like you do, but any time I need to read .gz files I use Compress::Zlib. Since it's an actual module, it may be faster than opening the pipe, especially when called repeatedly, because each pipe probably involves spawning a new process. Compress::Zlib is easy to use and lets you read directly from the gzipped files. Here's a simple gzcat example taken directly from the Compress::Zlib docs:

    use strict ;
    use warnings ;
    use Compress::Zlib ;

    die "Usage: gzcat file...\n" unless @ARGV ;

    my $file ;
    foreach $file (@ARGV) {
        my $buffer ;
        my $gz = gzopen($file, "rb")
            or die "Cannot open $file: $gzerrno\n" ;

        print $buffer while $gz->gzread($buffer) > 0 ;

        die "Error reading from $file: $gzerrno" . ($gzerrno+0) . "\n"
            if $gzerrno != Z_STREAM_END ;

        $gz->gzclose() ;
    }
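    To plug this into the posted script, the open logic could look roughly like the following sketch; process_line is a hypothetical stand-in for the body of the original while loop:

        use strict;
        use warnings;
        use Compress::Zlib;

        # Hypothetical stand-in for whatever the original loop does
        # with each line.
        sub process_line { my ($line) = @_; print $line }

        foreach my $checked (@ARGV) {
            if ( $checked =~ /\.gz$/ ) {
                # Read the gzipped log in-process; no gunzip child is spawned.
                my $gz = gzopen( $checked, "rb" )
                    or die "$checked open failed: $gzerrno\n";
                my $line;
                process_line($line) while $gz->gzreadline($line) > 0;
                $gz->gzclose();
            }
            else {
                open( my $log, "<", $checked )
                    or die "$checked open failed: $!\n";
                while ( my $line = <$log> ) {
                    process_line($line);
                }
                close($log);
            }
        }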
      Thanks all for your suggestions. By using Benchmark and Compress::Zlib on 10 files (about 47K each), I found the following:
      pre Compress::Zlib, the code took: 17 wallclock secs (11.76 usr  2.52 sys +  0.25 cusr  0.25 csys = 14.78 CPU)
      with Compress::Zlib, the code took: 15 wallclock secs ( 9.71 usr  2.60 sys +  0.02 cusr  0.06 csys = 12.39 CPU)

      Not much of a difference, but with more files I found savings of up to 10 seconds. Thanks once again! Regards, Stacy.
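      For reference, timings in that format come from the Benchmark module's timestr; the measurement was presumably done roughly like this (a sketch, not the exact harness used):

        use Benchmark;

        my $t0 = Benchmark->new;
        # ... process the ten log files ...
        my $t1 = Benchmark->new;
        print "the code took:", timestr( timediff( $t1, $t0 ) ), "\n";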