Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Performance revision needed

by Anonymous Monk
on Jan 02, 2003 at 16:07 UTC ( [id://223783]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have created a script that manipulates text files around 80 meg in size (1.5 million lines approximately), and the following block of code seems to be a bottleneck. The current code takes around 19 minutes to execute on a sun box with a single 450 mhz processor (64 bit)

while (<SSLOG>) { chomp (($date,$atime,$etime,$mth,$port,$attrID,$value)=split(/;/,$ +_)); # Convert the log time to UTC time ($HH, $MM, $SS)=split(/:/,$etime); ($year, $month, $day) = $date =~ m/^(.{4})(.{2})(.*)/; $month -= 1; $Time = timelocal($SS, $MM, $HH, $day, $month, $year); # Collect non-zero data if ($Conf[7] =~ /NO/i) { if ($value != /^0/) { # Build a hash of MTypeHandles with their unique ports $Hash{$mth}{$port}++; my @entry = (); $attrID =~ s/\s*//g; $port =~ s/\s*//g; if ($port =~ /\-/) { $port =~ s/\-//; } $key = $attrID; $val = $DS_Info{$key}; $val = join("", $val, $port); @entry = ($Time, $mth, $val, $value); # Populate the data array push @Data, [@entry]; } } # Collect zero data elsif($Conf[7] =~ /YES/i) { # Build a hash of MTypeHandles with their unique ports $Hash{$mth}{$port}++; my @entry = (); $attrID =~ s/\s*//g; $port =~ s/\s*//g; if ($port =~ /\-/) { $port =~ s/\-//; } $key = $attrID; $val = $DS_Info{$key}; $val = join("", $val, $port); @entry = ($Time, $mth, $val, $value); # Populate the data array push @Data, [@entry]; } }
Any suggestions for improving performance?

Thanks

Replies are listed 'Best First'.
Re: Performance revision needed
by pfaut (Priest) on Jan 02, 2003 at 16:32 UTC
    if ($Conf[7] =~ /NO/i) { ... elsif($Conf[7] =~ /YES/i) {

    @Conf appears to be setup outside of this loop. Evaluate this once and save the result instead of parsing a regex every time through the loop.

    $attrID =~ s/\s*//g; $port =~ s/\s*//g;

    I think replacing the '*' with '+' might help here.

    $val = join("", $val, $port);

    How about $val .= $port?

    push @Data, [@entry];

    Since @entry is a lexical variable, you should be able to use \@entry instead of [@entry] which needs to make a copy of the list.

    Unless I'm missing something, both blocks of code do the same thing. It might be better from a maintenance standpoint to check if either condition is true and only have the block once.

    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
Re: Performance revision needed
by gjb (Vicar) on Jan 02, 2003 at 16:37 UTC

    Just a few things that come to mind:

    • if ($port =~ /\-/) {...} can be replaced by $port =~ s/\-//g;
    • you can simply push @Data, [$Time, $mth, $val, $value];
    • It may be faster to do ($year, $month, $day) = (substr($date, 0, 4), substr($date, 4, 2), substr($date, 6)); to split the date
    • if $Conf[7] seems to be set outside the while, so you could go for a simpler if statement rather than do the same regex each 1.5 million times through the loop or even forego the conditional statement in the loop completely
    • you might have a look at the various Date and Time modules on CPAN that might do a faster job on date/time conversions

    Hope this helps, -gjb-

      great ideas...
      you wouldn't happen to have a suggestion for some revised conditional statements would you...I've been staring at this code too long....

      Thanks again

Re: Performance revision needed
by talexb (Chancellor) on Jan 02, 2003 at 16:47 UTC

    As I understand ir, using the 'i' modifier in Perl slows down the regexp code. You might benchmark using /YES/ || /yes/ instead.

    You appear to be using the s operator to remove any spaces. My guess is that you can replace that with tr/ //d instead, and get a speed improvement.

    Finally, it looks like the two blocks of code are identical. Can you instead check for (NO and $value not zero) or YES? to do that block of code?

    --t. alex
    Life is short: get busy!
Re: Performance revision needed
by rir (Vicar) on Jan 02, 2003 at 17:41 UTC
    I would not call the following optimization. You don't really give enough info to optimize your code. The following just eliminates some wasted cycles and stylistically improves the code. This should improve performance also.

    This is untested off the cuff code

    #!/usr/bin/perl use warnings; use strict; # move declarations out of loop # my ( $date, $atime, $etime, $mth, $port, $attrID, $value ); my ( $HH, $MM, $SS ); my ( $year, $month, $day ); my @entry ; while (<SSLOG>) { # chomp one item not seven chomp; ($date, $atime, $etime, $mth, $port, $attrID, $value) = split /;/, $_ ; # combine conditions into one # could be optimized by knowing the data # and short circuiting # if ( ($Conf[7]=~ /NO/i) && ($value!= /^0/) or ( $Conf[7] =~ /YES/i) ) { # Move time computation inside conditional # # Convert the log time to UTC time ( $HH, $MM, $SS ) = split ( /:/, $etime ); ( $year, $month, $day ) = $date =~ m/^(.{4})(.{2})(.*)/; $month -= 1; $Time= timelocal($SS, $MM, $HH, $day, $month, $year); # Build a hash of MTypeHandles with their unique ports $Hash{$mth}{$port}++; $attrID =~ s/\s*//g; $port =~ s/\s*//g; # Remove unnecessary condition # # if ( $port =~ /\-/ ) { # $port =~ s/\-//; # } $port =~ s/\-//; #remove unnecessary variable # #$key = $attrID; #$val = $DS_Info{$key}; # $val = $DS_Info{$attrID}; #Remove unnecessary join. #$val = join ( "", $val, $port ); # $val .= $port; # Remove unnecessary variable # @entry = ( $Time, $mth, $val, $value ); # # This is a big array you are making # What do you do with it? # Is there any other choice # # Populate the data array # push @Data, [@entry]; # push @Data, [ $Time, $mth, $val, $value ]; } }
      Thank you rir,

      Your suggestions and a combination of the others have improved performance greatly.

      Thanks again rir, and others who who posted suggestions

Re: Performance revision needed
by Thelonius (Priest) on Jan 02, 2003 at 18:45 UTC
    Many of these suggestions are good, but they are all micro-optimizations. I suspect that your problem is that you are using a lot of memory. Try commenting out the lines that save the data, specifically,
    $Hash{$mth}{$port}++; # and especially: push @Data, [@entry];
    Then run it and see how fast it is. If it is fast, then you should look into getting rid of @Data and maybe $Hash. It may be that $Hash isn't that large--it's hard to tell from your code. (Also, don't you want to edit $port before you increment $Hash{$mth}{$port}?)

    The easiest thing to do is probably write ($Time, $mth, $val, $value) out to a new file and process that in a second pass just as you now process @Data. Or, possibly, you can do it in one pass and not save the data at all.

Re: Performance revision needed
by Nitrox (Chaplain) on Jan 02, 2003 at 16:19 UTC
    Update:
    Thanks to IlyaM for pointing out my mis-choice of modules and explaining how Devel::SmallProf is a better choice than Devel::DProf.

    -Nitrox

Re: Performance revision needed
by poj (Abbot) on Jan 02, 2003 at 19:33 UTC
    Taking a guess at what your records looked like
    20020202;10:10:10;12:12:12;12;234;attid ;1000
    I created a 500k line file and ran the code suggested by rir on my winXP 1.1GHz 256Mb laptop and it took 57 seconds. I realise that because all the records are the same this is not a very scientific test. Without the timelocal call it was 26 seconds.
    However, 19 minutes does seems a long time. Maybe memory or disk IO is having an significant effect.
    poj

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://223783]
Approved by davis
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (11)
As of 2024-04-19 16:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found