Is this a long-running process or something run on a scheduler?

If it's a long-running process and the stores fit in memory, I'd have a failure hash keyed on job number. I'd check for existence in that hash whenever I got a success to see if I needed to clear it. Every 100 lines or so I'd check to see which failed jobs need to be cleared. You don't really say which parts of this need to be output, but I'd guess the most important would be which job IDs are being cleared after eight hours of repeated failure.

If your process gets run repeatedly by a scheduler like cron, then you need to store your data keyed on job ID outside your program. This can be done using SQL, either by connecting to a remote RDBMS or, if you really can't have a database server on the box itself, by using something like DBD::SQLite or DBD::CSV. It can be done using an XML store. It can be done using Storable. It could be done using a directory full of files, one for each failed job. It could be done using a DBM file via gdbm, ndbm, or Berkeley DB.
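
To make the scheduler case concrete, here is a minimal sketch (untested against your logs) that keeps the failure table in a Storable file between runs. Storable ships with Perl; everything else is my assumption for illustration: the function names, the idea of storing the epoch time of each job's first failure, and the eight-hour constant. The //= operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Assumption: a job is given up on after eight hours of failure.
my $MAX_AGE = 8 * 60 * 60;

# Load the failure table (job ID => epoch time of first failure),
# or start with an empty one on the first run.
sub load_failures {
    my ($path) = @_;
    return retrieve($path) if -e $path;
    return {};
}

# Apply one log event to the table.
sub record_event {
    my ( $failed, $job, $status, $now ) = @_;
    if ( $status eq 'failed' ) {
        $failed->{$job} //= $now;    # keep the time of the first failure
    }
    elsif ( $status eq 'succeeded' ) {
        delete $failed->{$job};
    }
    return;
}

# Remove and return the job IDs that have been failing for $MAX_AGE or longer.
sub expire_failures {
    my ( $failed, $now ) = @_;
    my @expired = grep { $now - $failed->{$_} >= $MAX_AGE } sort keys %$failed;
    delete @{$failed}{@expired};
    return @expired;
}

# Write the table back out for the next run.
sub save_failures {
    my ( $failed, $path ) = @_;
    store( $failed, $path );
    return;
}
```

Each cron run would then call load_failures, feed the new log lines through record_event, report whatever expire_failures returns, and finish with save_failures. If two runs could ever overlap, Storable's lock_store and lock_retrieve would be the safer pair to use.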

For a long-running process, I'd probably approach it in a way similar to this and see how it scales (untested):

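my ( %failed, @hours );    # %failed: job ID => hour of first failure
while ( <$file> ) {
    my $hour = ( localtime )[ 2 ];

    if ( /(job-id-format) failed/ ) {
        my $job = $1;
        if ( ! exists $failed{ $job } ) {
            # First failure: file the job under the current hour.
            push @{ $hours[ $hour ] }, $job;
            $failed{ $job } = $hour;
        }
        report_temporary_job_failure( $job );
    } elsif ( /(job-id-format) succeeded/ ) {
        my $job = $1;
        # Only jobs that are currently failing need to be cleared.
        report_previously_failed_job_success( $job )
            if defined delete $failed{ $job };
    }
    if ( 0 == $. % 100 ) {
        # Every 100 lines, give up on jobs that first failed eight hours ago.
        my $purge = ( $hour - 8 ) % 24;
        while ( defined( my $job = shift @{ $hours[ $purge ] } ) ) {
            # Skip stale entries for jobs that have since succeeded
            # (or failed again in a later hour).
            next unless exists $failed{ $job } && $failed{ $job } == $purge;
            report_final_job_failure( $job );
            delete $failed{ $job };
        }
    }
}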

I'm pretty sure you can get six lines a second or far more from that. I've put PC-class machines through data gathering tasks on logs that grow multiple orders of magnitude faster than that.

In reply to Re: Fast Recall by mr_mischief
in thread Fast Recall by sans-clue