
XML log files

by dingus (Friar)
on Dec 05, 2002 at 15:50 UTC ( [id://217788] : perlquestion )

dingus has asked for the wisdom of the Perl Monks concerning the following question:

For reasons along the lines of "it seems like a good idea" and "why not", I'd like to change the format of a logfile I write to XML. But there's a minor problemette.

Normally when I update a logfile I do

open (LOGFILE, ">>$logfn");
print LOGFILE $eventinfo;
close (LOGFILE);
which just appends whatever was in $eventinfo to the end of the file. I'd like to do the same thing with the new XML based one which would mean the logfile would look like:
<event time='1234' type='this'>
  <detail>blah</detail><detail>blahblah</detail>
</event>
<event time='1236' type='this'>
  <detail>blah</detail><detail>blahblah</detail>
</event>
<event time='2234' type='that'>
  <detail>weeble</detail><detail>blahblah</detail>
</event>
Unfortunately if I try to use XML::Simple (or any other XML reader come to think of it) to process the logfile as is it barfs because there are no <rootelement></rootelement> tags surrounding the event list.

Clearly the starting <rootelement> tag is easy to insert in the logfile. My problem is the final </rootelement> tag, because what I need to do is overwrite it with $eventinfo.'</rootelement>', and when you open using append you can't do this.

It seems to me an alternative would be to provide the XML::Simple reader with fake <rootelement> tags, but there doesn't seem to be a way to do that without reading the whole file in anyway, which could be messy if it gets large.

Anyone got any suggestions other than using open(LOGFILE, "+<$logfn") and some horrible "seek"ing?


Enter any 47-digit prime number to continue.

Replies are listed 'Best First'.
Re: XML log files
by mirod (Canon) on Dec 05, 2002 at 16:13 UTC

    Entities to the rescue!

    You can just create a wrapper that will include just the root element and a call to an entity referencing the log file, which itself has no root tag:

    log.xml is:

    <?xml version="1.0"?>
    <!DOCTYPE log [ <!ENTITY data SYSTEM ""> ]>
    <log>&data;</log>

    and the log data file (the one the SYSTEM entity points at) is:

    <event time='1234' type='this'>
      <detail>blah</detail><detail>blahblah</detail>
    </event>
    <event time='1236' type='this'>
      <detail>blah</detail><detail>blahblah</detail>
    </event>
    <event time='2234' type='that'>
      <detail>weeble</detail><detail>blahblah</detail>
    </event>

    XML processors should have no problem with this (tested with perl -MXML::Simple -MData::Denter -e'print Denter XMLin( "log.xml");'). You just output your log data to the file the entity refers to, and use log.xml when you want to do XML processing on it.
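     For the writing side, nothing changes from the plain append approach; only the data file is touched. A minimal sketch (the name log.data is purely illustrative; use whatever the SYSTEM entity in log.xml points at):

```perl
use strict;
use warnings;

# Hypothetical data-file name for this demo; substitute the file
# that the SYSTEM entity in log.xml refers to.
my $logfn = 'log.data';
unlink $logfn;    # start fresh for the demo only

# Append one rootless <event> element per call -- no closing root
# tag ever needs rewriting.
sub log_event {
    my ($time, $type, @details) = @_;
    open my $fh, '>>', $logfn or die "can't append to $logfn: $!";
    print {$fh} "<event time='$time' type='$type'> ",
                (map { "<detail>$_</detail>" } @details),
                " </event>\n";
    close $fh;
}

log_event(1234, 'this', 'blah', 'blahblah');
log_event(2234, 'that', 'weeble');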

      This is a very neat trick, I like it a lot.

      There might be one catch though: according to the XML specs, a non-validating parser may, but doesn't have to, include the external entity (i.e. the file the URI is referring to).

      If I interpret the specs correctly, this means that this feature is implementation dependent.

      Just my 2 cents, -gjb-

        Indeed XML::Parser will do it (and thus all modules based on it, such as XML::Simple, XML::Twig, XML::DOM, XML::XPath...), I suspect XML::LibXML will do it, along with modules based on it (you can base most of the SAX modules on it) but I don't think XML::SAX::PurePerl will.

      Entities to the rescue!

      I knew there would be a nice XML way to do this. Merci Beaucoup, Muchos Gracias, Vielen Dank, Spacebo, Arigatou, Kiitos, Tusen Takk, Obrigado and Sanctuary match!

      In fact for temporary hacking use at least it is possible to omit log.xml - the following code works:

      use XML::Simple;
      use Data::Dumper;

      $logfn = '/path/to/';
      print Dumper (XMLin(<<EOENT ));
      <?xml version="1.0"?>
      <!DOCTYPE log [ <!ENTITY data SYSTEM "$logfn"> ]>
      <log>&data;</log>
      EOENT


        That is "Tusen Tack", not "Takk".


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.

        You actually forgot xièxie, m'goi sai, khop-khun krap, terima kasih, salamat and Danggschee? What an ungrateful person... ;-)

      That is a cool trick. In real life of course XML::Simple (or any of the DOM modules) would probably be unsuitable for processing the log files since they all create a tree representing the whole contents of the file in memory. XML::Twig would be a fine choice or alternatively a SAX approach would work too.

      On the other hand, as merlyn pointed out, YAML might be a better fit. While the extra 'fluff' of XML can compress well, you will have to uncompress the whole file to process it (i.e. without root elements, you couldn't parse the XML from an unzip stream).
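      For completeness, a streaming read along those lines might look like this. It's only a sketch: the file names are illustrative, and the block writes its own sample data and entity wrapper first so it is self-contained.

```perl
use strict;
use warnings;
use XML::Twig;

# Demo setup: a rootless event log plus the entity wrapper from
# mirod's reply (file names here are illustrative, not canonical).
open my $data, '>', 'log.data' or die $!;
print {$data} "<event time='1234' type='this'><detail>blah</detail></event>\n",
              "<event time='2234' type='that'><detail>weeble</detail></event>\n";
close $data;

open my $wrap, '>', 'log.xml' or die $!;
print {$wrap} qq{<?xml version="1.0"?>\n},
              qq{<!DOCTYPE log [ <!ENTITY data SYSTEM "log.data"> ]>\n},
              qq{<log>&data;</log>\n};
close $wrap;

# The streaming part: the handler fires once per <event>, and purge()
# discards what has been seen, so memory stays flat however large the
# log grows.
my $count = 0;
XML::Twig->new(
    twig_handlers => {
        event => sub {
            my ($t, $event) = @_;
            printf "%s event at %s\n", $event->att('type'), $event->att('time');
            $count++;
            $t->purge;
        },
    },
)->parsefile('log.xml');
```

      The same handler-plus-purge pattern works unchanged on the fake-root variant, since XML::Twig only ever holds the current twig in memory.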

•Re: XML log files
by merlyn (Sage) on Dec 05, 2002 at 16:02 UTC
    Two suggestions:
    • Use a data format that can be extended, like YAML.
    • If you insist on using XML, then parse it by wrapping a fake root around the contents of the file:
      use XML::Simple;
      my $content = do { local (*ARGV, $/); @ARGV = "mylogfile"; <> };
      my $parsed_logfile = XMLin("<fake>$content</fake>", %other_options);

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      • Use a data format that can be extended, like YAML.
      Wow, I never knew about that one - that's probably the solution. As I see it, the advantage of an XML solution is that the structure is well understood, which means it's easier for the logfile to be self-documenting and thus more maintainable in the event of my falling under a bus.
      • If you insist on using XML, then parse it by wrapping a fake root around the contents of the file...
      That was something I wanted to avoid. It's OK for a small logfile that can be handled with XML::Simple, but if (when?) the logfile grows then I'd like to migrate to XML::Twig or similar, and then I can't do that.


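      Since YAML documents are self-delimiting, the append-only logging pattern carries over directly; each event becomes its own "---" document, so there is no closing root tag to rewrite. A sketch, assuming the YAML module is installed (the file name events.yml is illustrative):

```perl
use strict;
use warnings;
use YAML qw(Dump Load);

my $logfn = 'events.yml';    # illustrative name
unlink $logfn;               # start fresh for the demo only

# Appending: each Dump() emits a complete "---" document, so plain
# append-mode writes never need to touch earlier content.
for my $event ( { time => 1234, type => 'this', detail => ['blah', 'blahblah'] },
                { time => 2234, type => 'that', detail => ['weeble'] } ) {
    open my $fh, '>>', $logfn or die "can't append to $logfn: $!";
    print {$fh} Dump($event);
    close $fh;
}

# Reading back: Load in list context returns one hashref per document.
open my $in, '<', $logfn or die $!;
my @events = Load( do { local $/; <$in> } );
close $in;
print scalar(@events), " events, last type: $events[-1]{type}\n";
```

      Unlike the XML variants, no wrapper file or fake root is needed at read time, though as noted above the whole file is still read into memory here.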
Re: XML log files
by pg (Canon) on Dec 06, 2002 at 04:48 UTC
    Wait a minute: I know that +< and seek can make things quite complex, and too many seeks may lower performance, but we have to look at this case by case. If we can write a program using seek and +< that is neat and has no performance issue, then why not?
    use strict;
    use warnings;
    use Fcntl qw(SEEK_CUR SEEK_END);    # import constants

    use constant CLOSING_TAG => "</events>";    # define the closing tag as a constant

    my $event1 = "<event time=\"time1\" type=\"type1\"><detail>detail1</detail></event>";

    open(LOGFILE, "+<", "log.xml");
    seek(LOGFILE, -length(CLOSING_TAG), SEEK_END);
    my $cur_char;
    # you may have newlines, blanks etc. after the closing tag;
    # that's why we do the following, to make the program robust
    do {
        sysread(LOGFILE, $cur_char, 1);
        seek(LOGFILE, -2, SEEK_CUR) if ($cur_char ne "<");
    } until ($cur_char eq "<");
    seek(LOGFILE, -1, SEEK_CUR);
    print LOGFILE $event1 . "\n";
    # you can do as many prints as you want before close;
    # no more seeking, we did it once and for all
    print LOGFILE CLOSING_TAG;
    close(LOGFILE);