in reply to Re: Efficient deletion of files / shell interaction
in thread Efficient deletion of files / shell interaction

Thanks, unlink works fine. I'll benchmark both approaches for my own edification later. However, I'm not sure how to make the best use of seek and readline, given that the log files are of varying lengths (~100kB - 10MB) and I'm always after just the last line. It seems a bit too much work to open, read, and close several hundred files.

  • Comment on Re^2: Efficient deletion of files / shell interaction

Replies are listed 'Best First'.
Re^3: Efficient deletion of files / shell interaction
by Your Mother (Archbishop) on Jul 19, 2009 at 17:17 UTC

    In addition to Corion's advice, there is File::ReadBackwards. I don't know if it would be faster than seek, but it would probably be easier to code up since it's line-oriented rather than block-size-oriented.
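
    A minimal sketch of that approach (the file name and its contents are made up for illustration; File::ReadBackwards is a CPAN module, not core Perl):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::ReadBackwards;

    # Create a small sample log so the snippet runs stand-alone.
    my $file = "example.log";    # hypothetical file name
    open my $out, '>', $file or die "Can't write $file: $!\n";
    print $out "lots of output\n" for 1 .. 50;
    print $out "Normal termination\n";
    close $out;

    # readline() on a File::ReadBackwards object returns lines
    # starting from the END of the file, so the first call is the
    # last line -- no need to read the whole file.
    my $bw = File::ReadBackwards->new( $file )
        or die "Can't read $file: $!\n";
    my $last_line = $bw->readline;    # trailing newline included
    chomp $last_line;
    print "$last_line\n";    # -> Normal termination
    ```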

Re^3: Efficient deletion of files / shell interaction
by Corion (Patriarch) on Jul 19, 2009 at 16:49 UTC

    seek allows you to seek to (just before) the end of the file and to read the last (say) 1024 bytes of the file, and then look there for what you're searching for. Which is about what tail does, too.
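
    A rough sketch of that, using only core Perl (the file name and contents are invented for the example; the snippet writes its own sample log so it runs stand-alone):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Create a sample log so the snippet is self-contained.
    my $file = "example.log";    # hypothetical file name
    open my $out, '>', $file or die "Can't write $file: $!\n";
    print $out "step $_ ok\n" for 1 .. 100;
    print $out "Normal termination\n";
    close $out;

    # Read only the last 1024 bytes instead of the whole file.
    my $chunk = 1024;
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    my $size   = -s $fh;
    my $offset = $size < $chunk ? -$size : -$chunk;    # don't seek past the start
    seek $fh, $offset, 2 or die "seek failed: $!\n";   # whence 2 == SEEK_END
    read $fh, my $tail, $chunk;
    close $fh;

    # The last line is whatever follows the final newline in the tail.
    my ($last_line) = $tail =~ /([^\n]*)\n?\z/;
    print "$last_line\n";    # -> Normal termination
    ```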

Re^3: Efficient deletion of files / shell interaction
by graff (Chancellor) on Jul 19, 2009 at 20:30 UTC
    The trick about using seek and read is that in order to seek to a position from the end of a file, you have to specify a negative number for the offset amount. For example, if the log files for successful runs always have the phrase "Normal termination\n" as the very last thing in each file, that's just 19 bytes you need to read from the end -- but let's pad that a bit, just to be safe:
     #!/usr/bin/perl
     use strict;

     my $prefix  = "Kick";
     my $Restart = "Restart.data";

     open( RESTART, $Restart ) or die "Unable to read $Restart: $!\n";
     my $AlreadyDone = <RESTART>;
     my ( $jobs_run ) = ( $AlreadyDone =~ /(\d+)/ );

     for ( my $j = 0; $j <= $jobs_run; $j++ ) {
         my $job_title = sprintf( "%s%04d", $prefix, $j );
         if ( open( my $fh, "<", "$job_title.log" )) {
             seek( $fh, -24, 2 );    # whence 2 == SEEK_END: 24 bytes before EOF
             read( $fh, my $job_end, 24 );
             unlink "$job_title.data" unless ( $job_end =~ /Normal/ );
         }
         else {
             warn "Unable to read $job_title.log: $!\n";
         }
     }

    Some miscellaneous notes:

    • I didn't see any clear rationale for using the "-s" option on the shebang line. If you have a reason for that in your "production" script, it's fine, but it seemed unnecessary here.
    • Your method of getting a numeric value from the "Restart.data" file was strange. I think a regex match for the numeric value is better/safer.
    • When you decide to report an error message regarding a failed open() call, including "$!" in the message can be very helpful.
    • Overall, I think the overhead of opening and closing data files in perl (given that you are only reading the last couple dozen bytes from each file) will be less than the OS overhead of creating and tearing down hundreds of subshells to run "tail", just as perl's unlink function is almost certainly more efficient than a series of subshells invoking the unix "rm" command.
    • Notice how easy it was to add use strict.

    (updated to add a missing close-paren -- forgot to do "perl -cw" before hitting "create")