tzen:

I've done similar things, and I find it best to make a single pass through the file. To do so, here's the approach I take:

First, I use a hash to contain all current transactions. (I'm assuming that each thread is handling only one task at a time, so within a thread you're not getting multiple transactions intermingled.) In this case, I'd use the thread ID as the key.
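To make the data structure concrete, here is a minimal sketch of what such a hash might look like while the file is being read. The thread IDs, timestamps, and field values are made up for illustration; only the idea of one record per thread ID comes from the description above.

```perl
use strict;
use warnings;

# Hypothetical layout: thread ID => hash ref of the fields collected so far.
my %TxnQ;

# Two threads each start a transaction (all values are invented).
$TxnQ{1234} = { timestamp => '2009-01-01 12:00:00.000', type => 'Request' };
$TxnQ{5678} = { timestamp => '2009-01-01 12:00:00.100', type => 'Request' };

# A later line for thread 1234 only touches that thread's record,
# so interleaved lines from thread 5678 never get mixed in.
$TxnQ{1234}{'User-Name'} = 'alice';

# Finishing a transaction removes just that thread's record.
my $done = delete $TxnQ{1234};

print join(', ', map { "$_=$done->{$_}" } sort keys %$done), "\n";
print "still open: ", join(', ', sort keys %TxnQ), "\n";
```

Because `delete` returns the removed value, the finished record can be handed to whatever does the reporting while the hash keeps only the transactions that are still open.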

Next, read each line. You're going to find that the line is one of:

  1. The start of a new transaction. Just emit the current transaction for the thread (if any) and start collecting the data for the new transaction.
  2. The end of a transaction (some transactions will have recognizable ends). Emit the transaction and delete the data from the hash.
  3. Additional data for the transaction. Add the appropriate information to the hash.
  4. An explicitly ignored line (comments, blank lines, information you're not collecting, etc.)
  5. An unrecognized line, in which case you would print a warning (or take some similar action) if you care.
  6. An unexpected line, i.e., you recognize it but it's unexpected at this time. (Such as a transaction end before you get a transaction start for the thread.)
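Cases 5 and 6 are easy to blur together, so here is a small sketch that keeps them apart. It assumes a deliberately simplified, hypothetical line format (`(thread) START`, `(thread) DATA key=value`, `(thread) END`) rather than the real log format, and `handle_line` is a name invented for this example.

```perl
use strict;
use warnings;

my %TxnQ;        # thread ID => record for the open transaction
my @warnings;    # collected in a list so the result is easy to inspect

# Hypothetical simplified format: "(thread) START|DATA key=value|END"
sub handle_line {
    my ($line, $lineno) = @_;

    if ($line =~ /^\((\d+)\) START$/) {
        # Simplified: a real parser would emit any previous record first.
        $TxnQ{$1} = { type => 'Request' };
    }
    elsif ($line =~ /^\((\d+)\) DATA (\w+)=(.*)$/) {
        my ($tid, $key, $val) = ($1, $2, $3);
        if (exists $TxnQ{$tid}) { $TxnQ{$tid}{$key} = $val }
        else { push @warnings, "line $lineno: data for thread $tid with no open transaction" }
    }
    elsif ($line =~ /^\((\d+)\) END$/) {
        my $tid = $1;
        if (exists $TxnQ{$tid}) { delete $TxnQ{$tid} }
        else { push @warnings, "line $lineno: end for thread $tid with no open transaction" }
    }
    else {
        push @warnings, "line $lineno: unrecognized line";
    }
}

my @lines = ('(1) START', '(2) END', '(1) DATA user=alice', '(1) END', 'garbage');
handle_line($lines[$_], $_ + 1) for 0 .. $#lines;

print "$_\n" for @warnings;
```

The `exists` check before adding data also avoids autovivifying a record for a thread that never started a transaction, which would otherwise show up later as a spurious incomplete transaction.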

A bit of code to illustrate:

    use strict;
    use warnings;

    my %TxnQ;    # open transactions, keyed by thread ID

    while (<DATA>) {

        ##### TRANSACTION HEADERS #####
        # HEADER: emit previous transaction (if any), start new one
        if ( m/(.{10}\s.{12})\s\((\d+)\)Authentication Request/ ) {
            my ($timestamp, $tid) = ($1, $2);
            # Emit previous transaction, if any
            complete_transaction($TxnQ{$tid}) if exists $TxnQ{$tid};
            # Delete previous data by replacing with new data
            $TxnQ{$tid} = { timestamp => $timestamp, type => 'Request' };
        }
        # ...etc...

        ##### INTERMEDIATE LINES #####
        elsif ( m/.{10}\s.{12}\s\((\d+)\)Acct-Session-Id : String Value = (.*$)/ ) {
            # Just add the additional data to the thread's transaction record
            $TxnQ{$1}{'Acct-Session-Id'} = $2;
        }
        # ...etc...

        ##### TRANSACTION TERMINATORS #####
        elsif ( m/.{10}\s.{12}\s\((\d+)\)User-Name : String Value = (.*$)/ ) {
            my ($tid, $user) = ($1, $2);
            # Add the final data item(s) (if req'd)
            $TxnQ{$tid}{'User-Name'} = $user;
            # Process the transaction
            complete_transaction($TxnQ{$tid});
            # Delete the data
            delete $TxnQ{$tid};
        }
        # ...etc...

        ##### LINES WE DON'T CARE ABOUT #####
        elsif ( m/frammistat/ || m/^\s*$/ || m/^#/ ) {
            # DO NOTHING: we're explicitly ignoring these lines
        }
        else {
            print "LINE $.: Unrecognized line. Complete text:\n$_";
        }
    }

    # Complete remaining transactions (hopefully complete transactions
    # that don't have explicit transaction terminator lines)
    for (keys %TxnQ) {
        complete_transaction($TxnQ{$_});
    }

    sub complete_transaction {
        my $hr = shift;
        if (!defined $hr->{type}) {
            print "Incomplete transaction found!\n";
        }
        elsif ($hr->{type} eq 'Request')  { complete_request($hr) }
        elsif ($hr->{type} eq 'Response') { complete_response($hr) }
        # ...etc...
        else {
            print "ERROR: Unexpected transaction type: $hr->{type}!\n";
        }
    }

Obviously, you'd need to add error handling and such as you see fit. Standard disclaimers apply: Untested code, use at your own risk, if it breaks you can keep all the pieces, etc. ad nauseam.

...roboticus

Update: And if I had read the entire thread, I would've noticed that ig had already given an example of how to do this. Ah, well, it happens when you don't get enough sleep. I also added the <readmore> tags, as the post was a bit longish.


In reply to Re: Interlaced log parser by roboticus
in thread Interlaced log parser by tzen
