vonman has asked for the wisdom of the Perl Monks concerning the following question:

I need to write a quick program to parse through a logfile that has both log entries with dates and XML embedded in the file. The following snippet is an example of the type of data we see in the file.

    Feb 30 10:55:23: [B:S:0:12345abcd:Information] Information
    Feb 30 10:55:23: Before converting xml into DOM in the XMLRespToEvent method
    Feb 30 10:55:23: The xmlString before parsing is : <?xml version="1.0" encoding="ISO-8859-1" ?>
    <outputGetSubscriptionInfo ixc="">
    <svcAsgmInfo svcEffDt="2002-04-29" svcExprDt="9999-12-31" salesChnlId="CC">
    <primarySvc svcNm="F W-CTP" svcId="abcdefghij:" svcTyp="PK" svcDesc="Basic service &amp;MWI" srchCategory="PK" extnClassNm="COM_SVC">
    <depositAmt/>
    <charge chgTypeCd="R" chgStDt="2001-08-01">
    <chgAmt amntDue="0.00"/>
    </charge>

I have written the following snippet to read the file:

    while (<LOGDATA>) {
        $logline=$_;
        print ("The whole line is ---> ",$logline);
    }

Whenever the program hits a line with the XML it just prints a blank line. Any ideas? Also, after writing a general parsing engine I will need to write another program to parse through the XML. Best recommendations? Thanks in advance!!!!! Rich

Replies are listed 'Best First'.
Re: Why is this program producing blank lines when parsing a file
by cLive ;-) (Prior) on May 20, 2002 at 21:43 UTC
    I'm gonna make a huge assumption here - you're outputting to a browser - if so, amend to:

        while (<LOGDATA>) {
            $logline=$_;
            $logline =~ s/</&lt;/g;
            $logline =~ s/>/&gt;/g;
            print ("The whole line is ---> ",$logline);
        }
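
    The substitutions above cover `<` and `>`. A minimal sketch of a slightly fuller version follows, assuming the output goes into an HTML page: it also escapes `&`, so that entities already present in the XML (such as `&amp;` in the sample data) are displayed verbatim rather than interpreted by the browser. The sub name `escape_html` is made up for illustration and does not come from any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): make a log line safe to print
# into an HTML page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first, or the entities added below get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Sample line modelled on the log excerpt in the question.
my $line = qq{<primarySvc svcDesc="Basic service &amp;MWI">\n};
print "The whole line is ---> ", escape_html($line);
```

    The CPAN module HTML::Entities offers `encode_entities` for the same job, if hand-rolled substitutions are not wanted.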

    As for recommendations, why not search CPAN for XML parsing modules, eh?

    .02

    cLive ;-)

    --
    seek(JOB,$$LA,0);

      You're my hero!!!! Yup, I am outputting directly to HTML. Couldn't understand why this simple prog wouldn't work. Now it makes complete sense. Thanks again. PS. If anyone has any experience and recommendations with respect to the available CPAN modules please let me know.
Re: Why is this program producing blank lines when parsing a file
by graff (Chancellor) on May 20, 2002 at 22:50 UTC
    It looks like you'll need to parse the log file into log events first, before dealing with the XML content; e.g. something like:

        my @logentries = ();
        my $toss = 1;
        my $monthRegex = join("|",qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/);
        while (<LOGDATA>) {
            # a new entry starts with e.g. "Feb 30 10:55:23:"
            if (/^($monthRegex)\s+\d{1,2}\s+(\d{2}:){3}/) {
                push(@logentries,$_);
                $toss = 0;  # unless you want to ignore some of'em
                            # in which case: $toss++;
            }
            else {
                # continuation line (e.g. XML): append to the current entry
                $logentries[$#logentries] .= $_ unless $toss;
            }
        }
        foreach (@logentries) {
            # decide what to do with each entry,
            # handling XML content with a suitable module when necessary
        }

    No doubt someone knows of a module for recognizing dates in log files, but I think it's a simple-enough issue that coding this part from scratch is just as easy.

    If the log is really big and you don't want an array eating up that much memory, this alternative while loop would work:

        my $entry = "";
        my $result;
        while (<LOGDATA>) {
            if (/^($monthRegex)\s+\d{1,2}\s+(\d{2}:){3}/) {
                # a new entry begins, so process the one collected so far
                $result = &handleEntry( $entry ) if $entry;
                $entry = $_;  # unless you want to ignore some of'em
                              # in which case: $entry = "";
            }
            else {
                $entry .= $_ if $entry;
            }
        }
        $result = &handleEntry( $entry ) if $entry;  # don't forget the last entry
        # and replace the "foreach" loop in the previous version
        # with "sub handleEntry { ... }"
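
    What goes inside `handleEntry` depends on the log, but here is one possible sketch, assuming (from the sample in the question) that an entry starts with a timestamp like "Feb 30 10:55:23:" and that any XML payload begins at an `<?xml` declaration. The hash keys `stamp`, `message` and `xml` are names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical handleEntry: split one multi-line log entry into its
# timestamp, its plain-text message, and any trailing XML payload.
# The "<?xml" marker and the field names are assumptions based on the sample log.
sub handleEntry {
    my ($entry) = @_;

    # Timestamp such as "Feb 30 10:55:23", then a colon, then the rest of the entry.
    my ($stamp, $rest) = $entry =~ /^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}):\s*(.*)\z/s
        or return;   # not a log entry: return nothing

    # Everything from the XML declaration onward is treated as the payload.
    my $xml = '';
    if ($rest =~ s/(<\?xml\b.*)\z//s) {
        $xml = $1;
    }
    $rest =~ s/\s+\z//;   # trim trailing whitespace from the message part

    return { stamp => $stamp, message => $rest, xml => $xml };
}

my $entry = qq{Feb 30 10:55:23: The xmlString before parsing is : <?xml version="1.0" ?>\n<charge>\n</charge>\n};
my $parsed = handleEntry($entry);
print "$parsed->{stamp} | $parsed->{message}\n";
print $parsed->{xml};
```

    The `xml` string can then be handed to whichever XML module ends up being chosen, while entries without a payload simply come back with an empty `xml` field.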

    update: fixed some commentary about setting $entry = "" inside the while loop
    update: added the $toss and fixed commentary in the first example, and fixed the second example (again!) so that extra lines from a "tossed" entry get ignored properly.