The hardest part of your parsing problem is recognizing where one session ends. By your specification, a line of stars may be the end of the current session, or it may be the beginning of the next one because the current session was never ended properly. So any algorithm that doesn't interpret a star-line in the context of the line that follows it is doomed to fail.
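To make that ambiguity concrete, here is a minimal hand-rolled sketch of the one-line lookahead in plain Perl. It is not the grammar approach, just an illustration. Everything in it is an assumption about your data: the helper name `parse_sessions` is made up, a star-line is taken to be any line consisting only of `*`, sessions are bracketed by "Server Started" and "Server Closed" lines, and the record layout (label, start line, data lines, close line) is only one plausible shape.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of any module): split log
# text into session records by peeking at the line after each star-line.
sub parse_sessions {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my @sessions;
    my $current;    # session being collected, or undef between sessions

    # Store the session collected so far under the given label.
    my $finish = sub {
        my ($label) = @_;
        my @record = ($label, $current->{started}, $current->{data});
        push @record, $current->{closed} if $label eq 'complete:';
        push @sessions, \@record;
        undef $current;
    };

    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        if ($line =~ /^\*+\s*$/) {
            next unless $current;    # opening star-line, nothing to close
            my $next = $i < $#lines ? $lines[$i + 1] : '';
            # Assumption: a star-line directly followed by "Server Started"
            # opens the next session, so the current one never got its
            # own closing star-line.
            my $opens_next = $next =~ /^Server Started/;
            my $complete   = defined $current->{closed} && !$opens_next;
            $finish->($complete ? 'complete:' : 'incomplete:');
        }
        elsif ($line =~ /^Server Started/) {
            $finish->('incomplete:') if $current;    # no star-line in between
            $current = { started => $line, data => [] };
        }
        elsif ($current && $line =~ /^Server Closed/) {
            $current->{closed} = $line;
        }
        elsif ($current) {
            push @{ $current->{data} }, $line;
        }
    }
    $finish->('incomplete:') if $current;    # file ended mid-session
    return \@sessions;
}

# Usage: the first session is cut off by the star-line that opens the second.
my $log = join "\n",
    '*****', 'Server Started Monday', 'data1', 'data2',
    '*****', 'Server Started Tuesday', 'data3', 'Server Closed Thursday',
    '*****';
my $sessions = parse_sessions($log);
printf "%s %s (%d data lines)\n", $_->[0], $_->[1], scalar @{ $_->[2] }
    for @$sessions;
```

Note how much state bookkeeping even this small version needs. That is exactly the kind of thing a grammar lets you state declaratively.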

This smells like a perfect job for Parse::RecDescent. The grammar will look something like (warning: UNTESTED):

    file: report(s?) /\Z/ { return $item[1] }

    report: complete_report | incomplete_report

    complete_report:
        star_line server_started data_line(s?) server_closed star_line
        { return ["complete:", @item[2,3,4]] }

    incomplete_report:
        star_line server_started data_line(s?) server_closed(?)
        { return ["incomplete:", @item[2,3]] }

    star_line: "*****\n"

    server_started: "Server Started" /.*\n/ { "@item[1, 2]" }

    server_closed: "Server Closed" /.*\n/ { "@item[1, 2]" }

    data_line: ...!(star_line | server_started | server_closed) /.*\n/

The result will be an array ref like:
    [
      ["complete:", "Server Started Monday", ["data1", "data2", "data3"], "Server Closed Thursday"],
      ["complete:", "Server Started Tuesday", ["data1", "data2", "data3"], "Server Closed Thursday"],
      ["complete:", "Server Started Wednesday", ["data1", "data2", "data3"], "Server Closed Thursday"],
      ["incomplete:", "Server Started Monday", ["data1", "data2", "data3"]],
      ["complete:", "Server Started Monday", ["data1", "data2", "data3"], "Server Closed Thursday"],
    ]

Hopefully, you can read up enough on Parse::RecDescent to figure out how to use this grammar and invoke it. If I get time, I'll write this up completely and repost it. Actually, it looks like a nice potential future Linux Magazine article. Thanks for the idea!

-- Randal L. Schwartz, Perl hacker


In reply to Re: Multiple session log extraction from a single file problem by merlyn
in thread Multiple session log extraction from a single file problem by dmtelf
