in reply to Fast Recall
Is this a long-running process or something run on a scheduler?
If it's a long-running process and the state fits in memory, I'd keep a failure hash keyed on job ID. Whenever I saw a success I'd check that hash to see whether an earlier failure needed clearing. Every 100 lines or so I'd check which failed jobs are due to be cleared. You don't really say which parts of this need to be output, but I'd guess the most important is which job IDs get cleared after eight hours of repeated failure.
If your process gets run repeatedly by a scheduler like cron, then you need to store your data keyed on job ID outside your program. That store could be SQL (a remote RDBMS, or something like DBD::SQLite or DBD::CSV if you really can't have a database server on the box itself), an XML store, Storable, a directory of files with one file per failed job, or gdbm, ndbm, or Berkeley DB. A sketch of the Storable approach follows.
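Here's a rough, untested sketch of the scheduler variant using Storable. The state file name is made up, $file is assumed to be an already-open handle on the log, and it tracks the first failure time per job rather than hour buckets, so the eight-hour cutoff is checked directly:

    use Storable qw( retrieve store );

    my $state_file = 'failed_jobs.stor';    # hypothetical location for saved state

    # Load the failure hash left behind by the previous run, if any.
    my $failed = -e $state_file ? retrieve( $state_file ) : {};

    while ( <$file> ) {
        if ( /(job-id-format) failed/ ) {
            # Remember when we first saw this job fail.
            $failed->{ $1 } = time unless exists $failed->{ $1 };
            report_temporary_job_failure( $1 );
        }
        elsif ( /(job-id-format) succeeded/ ) {
            if ( exists $failed->{ $1 } ) {
                delete $failed->{ $1 };
                report_previously_failed_job_success( $1 );
            }
        }
    }

    # Anything still failing after eight hours is a final failure.
    for my $job ( keys %$failed ) {
        if ( time - $failed->{ $job } >= 8 * 60 * 60 ) {
            report_final_job_failure( $job );
            delete $failed->{ $job };
        }
    }

    # Persist the remaining failures for the next scheduled run.
    store( $failed, $state_file );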
For a long-running process, I'd probably approach it in a way similar to this and see how it scales (untested):
    my @hours;   # job IDs bucketed by the hour of their first failure
    my %failed;  # job IDs currently in a failed state

    while ( <$file> ) {
        my $hour = ( localtime )[2];

        if ( /(job-id-format) failed/ ) {
            if ( ! exists $failed{ $1 } ) {
                push @{ $hours[ $hour ] }, $1;
                $failed{ $1 } = 1;
            }
            report_temporary_job_failure( $1 );
        }
        elsif ( /(job-id-format) succeeded/ ) {
            delete $failed{ $1 } if exists $failed{ $1 };
            report_previously_failed_job_success( $1 );
        }

        if ( 0 == $. % 100 ) {
            # Anything that first failed eight hours ago is now a final failure.
            my $purge = ( $hour - 8 >= 0 ) ? $hour - 8 : $hour + 24 - 8;
            while ( my $job = shift @{ $hours[ $purge ] } ) {
                next unless exists $failed{ $job };    # skip jobs that succeeded since
                report_final_job_failure( $job );
                delete $failed{ $job };
            }
        }
    }
I'm pretty sure you can get six lines a second or far more from that. I've put PC-class machines through data gathering tasks on logs that grow multiple orders of magnitude faster than that.