Re: system "gzip $full_name" is taking more time

Someone has already mentioned checking that the file is not already a gzip file. I see you have a test for that, but your regular expression does not test whether the file name ends in '.gz', only whether it contains '.gz' anywhere.

I suggest:

next if /\.gz$/;
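A quick way to see the difference between the two patterns is a small sketch like this one. The file names and the helper names `contains_gz` and `ends_in_gz` are made up for illustration; the two patterns are the ones discussed above.

```perl
use strict;
use warnings;

# Original test: true if ".gz" appears anywhere in the name.
sub contains_gz { my ($name) = @_; return $name =~ /\.gz/ ? 1 : 0 }

# Suggested test: true only if the name ends in ".gz".
sub ends_in_gz { my ($name) = @_; return $name =~ /\.gz$/ ? 1 : 0 }

for my $name ('app.log.gz', 'app.gz.log', 'data.gzip', 'plain.txt') {
    printf "%-12s contains: %d  ends with: %d\n",
        $name, contains_gz($name), ends_in_gz($name);
}
```

Note that `$` also matches just before a trailing newline, so if names could ever end in "\n" (for example, lines read from a file without chomp), `/\.gz\z/` is the stricter anchor.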

Your test of the file age might be more readable if you use '-M' (see perldoc perlfunc).
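For example, something along these lines, assuming the 93-day and 3-day thresholds from the original script (`age_class` is a hypothetical helper name). `-M` returns the age in fractional days relative to `$^T`, the time the script started, so there is no `$now` variable or division by 86400 to maintain.

```perl
use strict;
use warnings;

# -M gives the file's age in (fractional) days, measured from $^T, the time
# the script started, so no $now variable or division by 86400 is needed.
# Hypothetical helper: maps one path to the action the original script takes.
sub age_class {
    my ($path) = @_;
    my $age = -M $path;
    return 'missing' unless defined $age;   # file vanished or stat failed
    return 'delete'  if $age > 93;
    return 'gzip'    if $age > 3;
    return 'keep';
}
```

In the main loop this would be called as `my $action = age_class($full_name);`.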

As others have suggested, take some timing measurements. The fact that the directory has 90,000 files might slow down directory-related operations, especially if it is network-mounted. unlink and gzip could both be affected by this, since they perform directory updates.
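One rough measurement, assuming Time::HiRes is available (it ships with Perl), is to time a bare directory listing with no gzip or unlink work at all. `time_listing` is a hypothetical name; run it against one of the large directories.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time a bare directory listing (no gzip, no unlink) to see how much of the
# run is spent just reading a directory with tens of thousands of entries.
# Returns (number of entries, elapsed seconds).
sub time_listing {
    my ($dir) = @_;
    my $start = time;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    my $count = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return ($count, time - $start);
}

# Usage: perl time_listing.pl /path/to/big/dir
if (@ARGV) {
    my ($count, $secs) = time_listing($ARGV[0]);
    printf "%s: %d entries in %.3f s\n", $ARGV[0], $count, $secs;
}
```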

If you comment out the gzip portion, how long does the unlinking take on these large directories?
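With the gzip call commented out, the unlink pass could be timed on its own with something like this sketch (again using Time::HiRes; `time_unlinks` is a made-up helper that takes the list of files your script would delete).

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# With the gzip step commented out, time only the unlinking.
# Hypothetical helper: takes the paths your script would delete and
# returns (number actually removed, elapsed seconds).
sub time_unlinks {
    my @paths   = @_;
    my $removed = 0;
    my $start   = time;
    for my $path (@paths) {
        if (unlink $path) { $removed++ }
        else              { warn "Could not unlink $path: $!\n" }
    }
    return ($removed, time - $start);
}
```

Dividing the elapsed time by the number removed gives a per-file cost you can compare between a small local directory and one of the 90,000-file ones.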

Measure how long gzip takes with a typical file. Also, while your script is running, check the processes: is there a single long-running gzip? You should be able to come up with some rough order-of-magnitude figures (I assume you have examined typical directories and have an idea of the number and sizes of the files).
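For the gzip side, here is a sketch that times one compression of a typical file. It uses the list form of `system`, which runs gzip directly without a shell, and checks `$?` so that a failing gzip is not silently counted as a fast one. `time_gzip` is a hypothetical name, and this assumes gzip is on your PATH.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Compress one file with the external gzip and return the elapsed seconds.
# The list form of system() runs gzip directly, with no shell involved.
sub time_gzip {
    my ($path) = @_;
    my $start = time;
    system('gzip', $path);
    my $elapsed = time - $start;
    die "Could not run gzip: $!\n" if $? == -1;
    die "gzip killed by signal ", ($? & 127), "\n" if $? & 127;
    die "gzip exited with status ", ($? >> 8), " for $path\n" if $? != 0;
    return $elapsed;
}

# Usage: perl time_gzip.pl /path/to/typical/file
if (@ARGV) {
    printf "%s: %.3f s\n", $ARGV[0], time_gzip($ARGV[0]);
}
```

Multiplying that figure by the number of files that qualify for compression gives the rough order of magnitude mentioned above.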

Also consider whether these directories are overdue for cleanup. If so, the first run of your script may take a lot longer than future runs.


Re^2: system "gzip $full_name" is taking more time by bulk88 (Priest) on Dec 08, 2013 at 04:11 UTC
To continue on wazat's post: use substr and eq instead of regexes for such ridiculously simple patterns. It will be faster.

my $file_time = (stat($full_name))[9];
my $diff = $now - $file_time;
$diff = $diff / 86400;
my $read = localtime($file_time);

Combine statements 1, 2, and 3. Something like:

my $file_time;
my $diff = ($now - ($file_time = (stat($full_name))[9])) / 86400;
my $read = localtime($file_time);

Fewer assignments/reads and fewer pp_nextstate ops.

$diff = $diff / 86400;
my $read = localtime($file_time);
if ( $diff > 93 ) {
    print FILE "$full_name : $diff : $read\n";
    unlink "$full_name";
}
elsif ( $diff > 3 ) {
    next if (/\.gz/);

Optimize out the division: multiply 93 and 3 by 86400 and compare against those larger numbers instead of dividing. And use substr/eq instead of the .gz regex.

unlink "$full_name";

Quoting a lone variable like that looks bizarre, like the work of someone who has never done Perl before. Don't do that.

my $read = localtime($file_time);

Don't do that. Don't print the converted time to the console, just the Unix time. If someone really wants to read the log, they can do the conversion themselves.

I don't know whether you can do it with stat() or not, but get the -f test, the -d test, and the stat on $full_name into exactly ONE syscall, save the results to lexicals, then process the results. Don't make redundant I/O calls.

I would guess that if you used NYTProf (you should have used that BEFORE coming to PerlMonks), you would find your script is either I/O bound on the disk/filesystem or CPU bound in the gzip compression algorithm.
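Putting several of these suggestions together, a minimal sketch might look like the following. It makes one `stat` call and then uses Perl's special `_` filehandle, which reuses the result of the last stat, for the file-type test, so no extra syscall is made. It compares ages in seconds against precomputed thresholds and uses `substr`/`eq` for the suffix. The `classify` helper and its return values are assumptions for illustration, not the original poster's code; like the original, it still deletes old .gz files.

```perl
use strict;
use warnings;

# Thresholds precomputed in seconds, so each file's age is compared
# directly instead of being divided by 86400 first.
use constant DAY          => 86_400;
use constant DELETE_AFTER => 93 * DAY;
use constant GZIP_AFTER   => 3 * DAY;

# Hypothetical helper: returns 'skip', 'delete', 'gzip', or 'keep'.
sub classify {
    my ($full_name, $now) = @_;

    # Exactly one stat() syscall; "-f _" reuses the cached result.
    my @st = stat $full_name or return 'skip';
    return 'skip' unless -f _;

    my $age = $now - $st[9];    # seconds since last modification
    return 'delete' if $age > DELETE_AFTER;

    if ($age > GZIP_AFTER) {
        # Fixed suffix check with substr/eq instead of a regex.
        return 'keep'
            if length($full_name) >= 3 && substr($full_name, -3) eq '.gz';
        return 'gzip';
    }
    return 'keep';
}
```

In the main loop you would capture `my $now = time;` once before the loop and call `classify($full_name, $now)` for each entry.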
Re^3: system "gzip $full_name" is taking more time by parv (Parson) on Dec 09, 2013 at 08:20 UTC
Others may argue for benchmarking the various points above.

... don't print the converted time to console, just the unix time. If someone really wants to read the log they can do the conversion themselves.
-- bulk88

Not converting the time would be good enough if the log will never be used again. Otherwise, it places a high cost on whoever reads it, who has to do the time conversions just to find the relevant entries. And have a damn functioning log.
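One possible middle ground between the two positions (not something either poster proposed) is to log the raw epoch seconds and also a sortable ISO 8601 local time, using `POSIX::strftime`. `log_line` is a hypothetical helper; the field layout mirrors the original `print FILE` line.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Keep the raw epoch seconds (cheap, unambiguous, easy to compare) and add a
# sortable ISO 8601 local time so a human can find entries without converting.
# Hypothetical helper; the layout mirrors the original "name : diff : time" line.
sub log_line {
    my ($full_name, $age_days, $file_time) = @_;
    my $readable = strftime('%Y-%m-%dT%H:%M:%S', localtime $file_time);
    return sprintf "%s : %.1f : %d (%s)\n",
        $full_name, $age_days, $file_time, $readable;
}
```

Something like `print FILE log_line($full_name, $diff, $file_time);` would replace the original print, at the cost of one strftime call per logged file.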