It looks like you'll need to parse the log file into log events first, before dealing with the XML content; e.g. something like:
    my @logentries = ();
    my $toss = 1;    # true until the first entry header has been seen
    my $monthRegex = join("|", qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/);
    while (<LOGDATA>) {
        if (/^($monthRegex)\s+\d{1,2}\s+(\d{2}:){3}/) {
            push(@logentries, $_);
            $toss = 0;
            # unless you want to ignore some of 'em,
            # in which case: $toss++;
        }
        else {
            # continuation line: append it to the most recent entry
            $logentries[$#logentries] .= $_ unless $toss;
        }
    }
    foreach (@logentries) {
        # decide what to do with each entry,
        # handling XML content with a suitable module when necessary
    }
No doubt someone knows of a module for recognizing dates in log files, but I think it's a simple-enough issue that coding this part from scratch is just as easy.
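For anyone who wants to try the grouping logic without a real log, here is a minimal self-contained sketch. The sample data is made up, and its "hh:mm:ss:" timestamp layout is only an assumption chosen so the lines match the regex above; it reads from an in-memory filehandle instead of LOGDATA.

```perl
use strict;
use warnings;

# Hypothetical sample data; the timestamp layout is an assumption
# picked to match the regex, not taken from the original log.
my $sample = <<'END';
stray line before any entry
Jan  5 10:23:45: request received
<msg>
  <id>1</id>
</msg>
Feb 12 08:00:01: heartbeat
END

my $monthRegex = join("|", qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/);

# Open the string as a file so the loop looks like the real thing
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";

my @logentries;
my $toss = 1;    # discard lines until the first entry header shows up
while (my $line = <$fh>) {
    if ($line =~ /^(?:$monthRegex)\s+\d{1,2}\s+(?:\d{2}:){3}/) {
        push @logentries, $line;    # header line starts a new entry
        $toss = 0;
    }
    elsif (!$toss) {
        $logentries[-1] .= $line;   # continuation line joins the last entry
    }
}
close $fh;

printf "%d entries\n", scalar @logentries;
printf "first entry spans %d lines\n", ($logentries[0] =~ tr/\n//);
```

The stray first line is dropped because $toss is still true when it is read, and the XML lines end up attached to the entry whose header precedes them.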

If the log is really big and you don't want an array eating up that much memory, this alternative while loop would work:

    my $entry = "";
    my $result;
    while (<LOGDATA>) {
        if (/^($monthRegex)\s+\d{1,2}\s+(\d{2}:){3}/) {
            # a new header means the previous entry is complete
            $result = handleEntry($entry) if $entry;
            $entry = $_;
            # unless you want to ignore some of 'em,
            # in which case: $entry = "";
        }
        else {
            $entry .= $_ if $entry;
        }
    }
    # don't forget the last entry in the file
    $result = handleEntry($entry) if $entry;
    # and replace the "foreach" loop in the previous version
    # with "sub handleEntry { ... }"
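As a rough idea of what handleEntry might look like, here is a runnable sketch of the one-entry-at-a-time version. The body of handleEntry, the hash keys it returns, and the sample data are all hypothetical; the regex grab of the XML is deliberately crude and only stands in for handing the text to a real XML module.

```perl
use strict;
use warnings;

my $monthRegex = join("|", qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/);

# Hypothetical handler: splits an entry into its header line and
# whatever looks like XML on the following lines.
sub handleEntry {
    my ($entry) = @_;
    my ($header, $rest) = split /\n/, $entry, 2;
    $rest = '' unless defined $rest;
    # Crude grab from the first "<" to the last ">"; a real handler
    # would pass this text to an XML parser instead.
    my ($xml) = $rest =~ /(<.*>)/s;
    return { header => $header, xml => $xml };
}

# Hypothetical sample data in an assumed "Mon dd hh:mm:ss:" layout
my $sample = <<'END';
stray line before any entry
Jan  5 10:23:45: request received
<msg>
  <id>1</id>
</msg>
Feb 12 08:00:01: heartbeat
END

open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";

my @results;
my $entry = "";
while (my $line = <$fh>) {
    if ($line =~ /^(?:$monthRegex)\s+\d{1,2}\s+(?:\d{2}:){3}/) {
        push @results, handleEntry($entry) if $entry;   # flush previous entry
        $entry = $line;
    }
    elsif ($entry) {
        $entry .= $line;
    }
}
push @results, handleEntry($entry) if $entry;           # flush the last entry
close $fh;

for my $r (@results) {
    printf "%s => %s\n", $r->{header}, defined $r->{xml} ? "has XML" : "no XML";
}
```

Only one entry is held in memory at a time here; the @results array is just for the demo and could be dropped if each entry is fully dealt with inside handleEntry.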

update: fixed some commentary about setting $entry = "" inside the while loop
update: added the $toss and fixed commentary in the first example, and fixed the second example (again!) so that extra lines from a "tossed" entry get ignored properly.


In reply to Re: Why is this program producing blank lines when parsing a file by graff
in thread Why is this program producing blank lines when parsing a file by vonman
