in reply to Re: Efficient deletion of files / shell interaction
in thread Efficient deletion of files / shell interaction

Thanks, unlink works fine. I'll benchmark both approaches for my own edification later. However, I'm not sure how to make the best use of seek and readline, given that the log files are of varying lengths (~100kB - 10MB) and I'm always after just the last line. It seems a bit too much work to open, read, and close several hundred files.

  • Comment on Re^2: Efficient deletion of files / shell interaction

Replies are listed 'Best First'.
Re^3: Efficient deletion of files / shell interaction
by Your Mother (Archbishop) on Jul 19, 2009 at 17:17 UTC

    In addition to Corion's advice, there is File::ReadBackwards. I don't know if it would be faster than seek, but it would probably be easier to code up since it's line-oriented rather than block-size-oriented.
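
    A minimal sketch of that approach (the file name and its contents are made up for illustration; File::ReadBackwards is a CPAN module, not core Perl):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::ReadBackwards;

    # Create a small sample log so the snippet runs stand-alone.
    my $file = "example.log";    # hypothetical file name
    open my $out, '>', $file or die "Can't write $file: $!\n";
    print $out "lots of output\n" for 1 .. 50;
    print $out "Normal termination\n";
    close $out;

    # readline() on a File::ReadBackwards object returns lines
    # starting from the END of the file, so the first call is the
    # last line -- no need to read the whole file.
    my $bw = File::ReadBackwards->new( $file )
        or die "Can't read $file: $!\n";
    my $last_line = $bw->readline;    # trailing newline included
    chomp $last_line;
    print "$last_line\n";    # -> Normal termination
    ```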

Re^3: Efficient deletion of files / shell interaction
by Corion (Patriarch) on Jul 19, 2009 at 16:49 UTC

    seek allows you to seek to (just before) the end of the file and to read the last (say) 1024 bytes of the file, and then look there for what you're searching for. Which is about what tail does, too.
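
    A rough sketch of that, using only core Perl (the file name and contents are invented for the example; the snippet writes its own sample log so it runs stand-alone):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Create a sample log so the snippet is self-contained.
    my $file = "example.log";    # hypothetical file name
    open my $out, '>', $file or die "Can't write $file: $!\n";
    print $out "step $_ ok\n" for 1 .. 100;
    print $out "Normal termination\n";
    close $out;

    # Read only the last 1024 bytes instead of the whole file.
    my $chunk = 1024;
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    my $size   = -s $fh;
    my $offset = $size < $chunk ? -$size : -$chunk;    # don't seek past the start
    seek $fh, $offset, 2 or die "seek failed: $!\n";   # whence 2 == SEEK_END
    read $fh, my $tail, $chunk;
    close $fh;

    # The last line is whatever follows the final newline in the tail.
    my ($last_line) = $tail =~ /([^\n]*)\n?\z/;
    print "$last_line\n";    # -> Normal termination
    ```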

Re^3: Efficient deletion of files / shell interaction
by graff (Chancellor) on Jul 19, 2009 at 20:30 UTC
    The trick about using seek and read is that in order to seek to a position from the end of a file, you have to specify a negative number for the offset amount. For example, if the log files for successful runs always have the phrase "Normal termination\n" as the very last thing in each file, that's just 19 bytes you need to read from the end -- but let's pad that a bit, just to be safe:
     #!/usr/bin/perl
     use strict;

     my $prefix  = "Kick";
     my $Restart = "Restart.data";

     open( RESTART, $Restart ) or die "Unable to read $Restart: $!\n";
     my $AlreadyDone = <RESTART>;
     my ( $jobs_run ) = ( $AlreadyDone =~ /(\d+)/ );

     for ( my $j = 0; $j <= $jobs_run; $j++ ) {
         my $job_title = sprintf( "%s%04d", $prefix, $j );
         if ( open( my $fh, "<", "$job_title.log" )) {
             seek( $fh, -24, 2 );    # whence 2 == SEEK_END: 24 bytes before EOF
             read( $fh, my $job_end, 24 );
             unlink "$job_title.data" unless ( $job_end =~ /Normal/ );
         }
         else {
             warn "Unable to read $job_title.log: $!\n";
         }
     }

    Some miscellaneous notes:

    • I didn't see any clear rationale for using the "-s" option on the shebang line. If you have a reason for that in your "production" script, it's fine, but it seemed unnecessary here.
    • Your method of getting a numeric value from the "Restart.data" file was strange. I think a regex match for the numeric value is better/safer.
    • When you decide to report an error message regarding a failed open() call, including "$!" in the message can be very helpful.
    • Overall, I think the overhead of opening and closing data files in perl (given that you are only reading the last couple dozen bytes from each file) will be less than the OS overhead of creating and tearing down hundreds of subshells to run "tail", just as perl's unlink function is almost certainly more efficient than a series of subshells invoking the unix "rm" command.
    • Notice how easy it was to add use strict.

    (updated to add a missing close-paren -- forgot to do "perl -cw" before hitting "create")