xdbd063 has asked for the wisdom of the Perl Monks concerning the following question:

With the help of the Perl Monks' expertise, I've been able to finish a project I started back in February. In a nutshell, it changes the permissions on several hundred files in different directories, does a pattern search for two lines separated by text that is discarded, puts the two lines in a hash, then searches the files, one at a time in a loop, for a match to the Key. When a match is found, the Value is linked in, using s///.

I am fairly confident that it works; however, every time I try to test it, it runs for several hours and then I get the error message "Exceeded CPU Limit". I get no error messages from the Perl code; the CPU error is from the system.

I know I have asked a lot, but if someone could look at what I've done and see if there is a more efficient way to do any part of the project, I would appreciate it. It takes hours just to change the permissions on the files, and I don't know why.

use strict;
use warnings;
use diagnostics;
use File::Copy;
use FileHandle;
use IO::File;
#-----------------------------------------------------------------------
# Variables, Arrays and Hash declarations:
my ($manualdir_param) = @ARGV;
my $working_dir = $manualdir_param;
$working_dir =~ s/manualdir=//i;
my $data_area = "/tmp";
my $html_dir  = "$data_area/$working_dir";
my @FigureArray;
my @endingArray;
my $FigFile;
my @FileArray;
my @endArray;
my $DirFile;
my %images;
my $line;
my $img;
my $fig;
#-----------------------------------------------------------------------
# Loop to locate HTML files, change permissions, and make working temporary copies
opendir( HTMLSTORIES, "$html_dir" ) || die "HTML dirs do not exist: $1";
@FigureArray = grep { /^(09)(\w{1,5})(00)$/ } readdir(HTMLSTORIES);
foreach $FigFile (@FigureArray) {
    opendir( HTMSTORY, "$html_dir/$FigFile" ) || die "Files do not exist: $1";
    @FileArray = grep { /a.htm$/ } readdir(HTMSTORY);
    foreach $DirFile (@FileArray) {
#        copy ("$html_dir/$FigFile/$DirFile", "$html_dir/$FigFile/$DirFile.bak") or die "Can not make backup copy of file: $1";
        chmod 0600, "$html_dir/$FigFile/$DirFile";
        foreach $DirFile (@FileArray) {
            open( PATTERN, "<", "$html_dir/$FigFile/$DirFile" )
                or die "Unable to open pattern file: $1";
            while (<PATTERN>) {
                chomp;
                if ( $line = /(\<IMG\sSRC\=.*?\>)/ ... /(Figure\s*\d+)/ ) {
                    $img = $1;
                    $fig = $3;
                }
                if ( $img && $fig )    # if we find both IMG AND Figure lines, add them to images hash.
                {
                    $images{$fig} = $img;
                }
                foreach ( sort keys %images ) {
#                    print "$_: $images{$_}\n";
                }
                # find matching Figure lines in text file and do linking.
                while ( ( $fig, $img ) = each(%images) ) {
                    grep { $fig } "$html_dir/$FigFile/$DirFile";
                    s/$fig/"<a href = $img>$fig<\/a>"/g;
                    %images = ();
                }
            }
            close(PATTERN);
            closedir HTMSTORY;
            closedir HTMLSTORIES;
            # Loop to do cleanup
            opendir( ENDSTORIES, "$html_dir" ) || die "HTML dirs do not exist: $1";
            @endingArray = grep { /^(09)(\w{1,5})(00)$/ } readdir(ENDSTORIES);
            foreach $FigFile (@endingArray) {
                opendir( ENDSTORY, "$html_dir/$FigFile" ) || die "Files do not exist: $1";
                @endArray = grep { /a.htm.bak$/ } readdir(ENDSTORY);
                foreach $DirFile (@endArray) {
#                    unlink ("$html_dir/$FigFile/$DirFile.bak") or die "Can not remove backup copy of file: $1";
                    chmod 0644, "$html_dir/$FigFile/$DirFile";
                }
            }
            close ENDSTORY;
            close PATTERN;
            closedir HTMSTORY;
            closedir HTMLSTORIES;
        }
    }
}


Re: Exceeding CPU Limit
by jasonk (Parson) on Mar 24, 2006 at 15:35 UTC

    If you cleaned up your indenting a little to make this more readable, you would find some interesting things going on that you probably don't expect. For example, this chunk of code:

    while (( $fig, $img) = each(%images)) {
        grep{$fig} "$html_dir/$FigFile/$DirFile";
        s/$fig/"<a href = $img>$fig<\/a>"/g;
        %images = ();
    }
    is getting run once for every line in every file you analyze. I suspect you only wanted to run it once per file, rather than once per line, which is probably causing it to take a great deal more time than you would have expected.
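
    A rough, untested sketch of that restructuring, reusing the directory variables from your post (the .new output file and capturing only the SRC value are my assumptions, not necessarily what you want): build the hash while reading the file once, then do the linking in one extra pass per file rather than per line.

    # First pass: remember which image goes with which "Figure n" caption.
    open my $in, '<', "$html_dir/$FigFile/$DirFile"
        or die "Can't open $html_dir/$FigFile/$DirFile: $!";
    my ( %images, $img );
    while ( my $line = <$in> ) {
        $img = $1 if $line =~ /<IMG\s+SRC\s*=\s*"?([^">\s]+)"?/i;
        $images{$1} = $img if defined $img and $line =~ /(Figure\s*\d+)/;
    }
    close $in;

    # Second pass: add the links, once per file instead of once per input line.
    open $in,     '<', "$html_dir/$FigFile/$DirFile"     or die "Can't reopen: $!";
    open my $out, '>', "$html_dir/$FigFile/$DirFile.new" or die "Can't write: $!";
    while ( my $line = <$in> ) {
        for my $fig ( keys %images ) {
            $line =~ s/\Q$fig\E/<a href="$images{$fig}">$fig<\/a>/g;
        }
        print {$out} $line;
    }
    close $in;
    close $out;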


    We're not surrounded, we're in a target-rich environment!
Re: Exceeding CPU Limit
by lima1 (Curate) on Mar 24, 2006 at 15:35 UTC
    It is really hard to read your code: way too many global variables, duplicated code, bad formatting, and it doesn't even compile here.

    Can you refactor your code a little bit? Maybe this function helps you.

    use strict;
    use warnings;
    use English qw( -no_match_vars );
    use Data::Dumper;

    sub get_files_in_directory {
        my ( $dir, $filter ) = @_;
        opendir my $DIR, $dir or die "Can't open $dir: $OS_ERROR";
        my @files = grep { /$filter/ } readdir($DIR);
        closedir $DIR;
        return @files;
    }

    # print every file in /tmp starting with 'N'
    print Dumper( get_files_in_directory( '/tmp', qr/^N/ ) );
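
    As a sketch (assuming the same directory layout and filename patterns as in your script), the nested opendir/readdir/grep blocks could then shrink to something like:

    my @fig_dirs = get_files_in_directory( $html_dir, qr/^09\w{1,5}00$/ );
    for my $fig_dir (@fig_dirs) {
        my @html_files = get_files_in_directory( "$html_dir/$fig_dir", qr/a\.htm$/ );
        for my $file (@html_files) {
            chmod 0600, "$html_dir/$fig_dir/$file"
                or warn "chmod failed on $html_dir/$fig_dir/$file: $OS_ERROR";
            # ... process the file here ...
        }
    }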
    perltidy is a great tool if you are too lazy to format your code nicely.
      Or you can also use File::Find::Rule or similar:
      my @files = File::Find::Rule->file()->name( qr/^N/ )->maxdepth(1)->in( '/tmp/' );
      Note that this will return full paths.
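      As a sketch (the 0600 mode and $html_dir are taken from the original script, not part of File::Find::Rule itself), those full paths can be used directly:

      use File::Find::Rule;
      my @html_files = File::Find::Rule->file()->name( qr/a\.htm$/ )->in( $html_dir );
      chmod 0600, $_ for @html_files;    # full paths, so nothing needs to be prepended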
Re: Exceeding CPU Limit
by davido (Cardinal) on Mar 24, 2006 at 17:09 UTC

    If I'm not mistaken, CPU quotas are created by system administrators so that no single user account can consume more than a certain number of hours of CPU time. Usually the limits are set high enough that an interactive session will never bump into them, but low enough that a rogue script will be terminated before it consumes so much CPU time that all other users suffer.

    It appears that the script you're running is consuming more CPU time than your system administrator deems prudent for the smooth running of the system, and your process is being sent a termination signal along with a message explaining that you've exceeded your CPU limit (your quota).
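
    If you want to see what that limit actually is from inside Perl, and if the CPAN module BSD::Resource happens to be installed, a minimal sketch looks like this (it only reads the limit; whether you may raise it is up to your administrator):

    use BSD::Resource;

    # getrlimit returns the soft and hard limits, in seconds of CPU time.
    my ( $soft, $hard ) = getrlimit(RLIMIT_CPU);
    print $soft == RLIM_INFINITY
        ? "No CPU-time limit set\n"
        : "CPU-time limit: $soft seconds (hard limit: $hard)\n";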

    Your first course of action is to re-examine your algorithms to see if there is any way you can reduce nested loops and other time-expensive code.


    Dave

Re: Exceeding CPU Limit
by bowei_99 (Friar) on Mar 24, 2006 at 23:15 UTC
    Adding subroutines like the one lima1 suggests would help isolate the problem. You could test each subroutine independently on a relatively small subset of files, so if it's slow, you can at least isolate the slowness to a certain chunk of code. Right now, you have too much to look at at once. I try to keep my subroutines to a screenful (about 35 lines for me).

    Avoiding global variables, i.e. declaring a variable within a subroutine with my, also helps keep the problem isolated. You only pass in and out of the subroutine what you have to. For instance, you could move your while statement to a subroutine, like

    while (<PATTERN>) {
        # do stuff here
    }
    You could pass in the filehandle, saved as a lexical variable instead of the bareword PATTERN, and make it more readable. To do that, you would replace your current statement:

    open (PATTERN, "<", "$html_dir/$FigFile/$DirFile") or die "Unable to open pattern file: $1";
    with:
    open my $PATTERN, "<", "$html_dir/$FigFile/$DirFile" or die "Unable to open pattern file: $!";
    and later, pass the open filehandle into a subroutine, like:
    DoSomethingWithFile($PATTERN);
    where
    sub DoSomethingWithFile {
        my $FH = shift @_;
        while (<$FH>) {
            # do stuff here
        }
    }
    (You might notice that in my open statement above, I replaced $1 with $!, which I think is what you intended. $1 is the variable that holds the first capture from a regular expression, while $! holds the error from a failed system or library call. As it says in perlvar, a mnemonic for $! is 'What just went bang?'.)

    One thing that would really help with the formatting is lining up each opening curly brace with its matching closing brace (or indenting them consistently). For instance, the closing curly braces at the end of your code all sitting in the same horizontal position make it harder to read and to figure out the flow of the program.

    -- Burvil