in reply to Interlaced log parser

Thanks for the replies, guys. Unfortunately, timestamps change over the course of a single transaction while the log is being written, so I don't think I can rely on timestamp + thread ID as a unique identifier. Another problem is that a transaction may start at the end of one log file and carry over into the next one.

Re^2: Interlaced log parser
by SuicideJunkie (Vicar) on Sep 04, 2009 at 15:17 UTC

    If you can identify the end of a transaction, and each thread only works on one transaction at a time, you won't have a problem.

    Use the thread ID as the hash key, and delete that entry once the transaction is complete and you have done whatever you need to do with the data. That way, you have a nice clean place to start building up info about the thread's next transaction.

    The key things:

    • Whatever scheme you use to identify a transaction when doing multiple passes of the file, use that as your top-level hash key.
    • Whatever local variables you previously used to store info about a transaction, reuse those names as your second-level hash keys.
    • Just keep reading log files in order and don't worry; the inner code should not care which file the lines are coming from, just that it has a stream of lines to analyse.

    foreach my $file (@files) {    # process the log files in order
        open my $FH, '<', $file or die "$file open says $!";
        while (<$FH>) {
            ponder($_);    # parse the line; sets $threadID, $stuff, $transactionEnded
            $transactionHash{$threadID}{oldVariableName} = $stuff;
            # once the transaction is complete, report it and forget it
            delete $transactionHash{$threadID} if $transactionEnded;
        }
    }
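
    For concreteness, here is one shape ponder() could take. This is only a sketch: the line format ("<timestamp> [<threadID>] <message>") and the COMMIT/ROLLBACK end-of-transaction markers are invented for illustration, so swap in whatever your logs actually contain. It sets the globals the loop above reads:

    # Assumed line shape: "<timestamp> [<threadID>] <message>" (hypothetical)
    sub ponder {
        my ($line) = @_;
        return unless $line =~ /\[(\w+)\]\s+(.*)/;    # grab thread ID and message
        $threadID = $1;
        $stuff    = $2;
        # assume a thread ends its transaction by logging COMMIT or ROLLBACK
        $transactionEnded = $stuff =~ /^(?:COMMIT|ROLLBACK)\b/ ? 1 : 0;
    }

    Since it is the end marker, not the file boundary, that closes out a transaction, one that starts at the tail of one file and finishes at the start of the next is handled for free.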