ibanix has asked for the wisdom of the Perl Monks concerning the following question:

This question is written at the prodding of the CB (hi Zaxo!).

I've been tasked with developing something that will consolidate logfiles. Our environment is a diverse group of Windows 2000 (and one NT4) servers, of which only one, my application server, runs ActivePerl.

We have around two dozen IIS webservers, each with up to 8 different sites, hence up to 8 daily logfiles. In addition, we have several SMTP servers, and some other strange logfiles (ColdFusion, for one). These files do not all have a consistent naming scheme, location on server, or relationship between name/location and function. I need to take all these daily files, optionally compress them, process, rename, and move them to a single server, where they will eventually be moved to tape. My life is made somewhat easier by having the application server also be the logging server. I will be doing a pull from each server to the logging server.

Now, on to the real question. Originally, I thought I would design a file format with one server/location/function per line, à la:
    # my ugly flatfile
    server1 D:\logfiles\w3svc3 application1
    server1 D:\logfiles\w3svc4 application2
    server2 E:\logfiles\w3svc3 application1
Then I could read each line with ($server, $location, $app) = split; and be done. There are some other bits of information I will probably want to include.
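A minimal sketch of that split-based approach, assuming no field ever contains whitespace (a Windows path with a space in it would break it). The sub name parse_flatfile is hypothetical; the data is the sample above, read from an in-memory string so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader for the whitespace-separated flatfile:
# one "server location application" entry per line.
# Blank lines and lines starting with '#' are skipped.
# Assumes no field contains whitespace.
sub parse_flatfile {
    my ($fh) = @_;
    my @entries;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        # split ' ' is awk-style: trims leading space, splits on runs of whitespace
        my ($server, $location, $app) = split ' ', $line;
        push @entries, { server => $server, location => $location, app => $app };
    }
    return @entries;
}

# Single-quoted heredoc, so the backslashes in the paths stay literal.
my $data = <<'END';
# my ugly flatfile
server1 D:\logfiles\w3svc3 application1
server1 D:\logfiles\w3svc4 application2
server2 E:\logfiles\w3svc3 application1
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @entries = parse_flatfile($fh);
close $fh;

for my $e (@entries) {
    print "$e->{server}\t$e->{location}\t$e->{app}\n";
}
```

Adding a "compress" flag this way would mean a fourth column, and every line of the file (and the list assignment) would have to change to match.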

But I'm wondering: should I put the file in XML format and use XML::Parser or the like? What are the advantages and disadvantages? I'm an XML n00b and unfamiliar with its proper applications.

Will an XML format help me if I want to add a new bit of data, like "compress" or "strip images if HTTP logfile"?

Thanks be to you, Oh Hallowed Masters of all that is Perl-fu...


<-> In general, we find that those who disparage a given operating system, language, or philosophy have never had to use it in practice. <->

Re: To use XML or not to use XML
by Abstraction (Friar) on Nov 26, 2002 at 02:51 UTC
    Since you are new to XML, I would suggest taking a look at this article (Perl XML Quickstart: The Perl XML Interfaces), paying special attention to the XML::Simple section.

    The first paragraph of the XML::Simple section states:

    Originally created to simplify the task of reading and writing config files in an XML format, XML::Simple translates data between XML documents and native Perl data structures with no intervening abstract interface. Elements and attributes are accessed using nested references.

    Judging by your post, this looks like it would be a good fit.
Re: To use XML or not to use XML
by mirod (Canon) on Nov 26, 2002 at 08:21 UTC

    First one important point: you don't HAVE to use XML.

    Why would you use it, and is it really that much of an improvement over other formats:

    • it's a text format, meaning that you can always check what's in your config file with less and edit it with vi, but most other formats are also text, so there is little gain here,
    • there are tools for it, so you don't have to write your own parser, but there are Perl modules for a lot of other formats, see Persistence for options? for example,
    • it is a powerful standard, so you get a lot more than you realize when you start using it, like entities (your configuration could be split into several physical files that do not even have to be on the same machine) and stylesheets (so you can easily display the file in a browser)...
    • one of the big wins, I think, is that it is very easy to write code that will survive changes in the format of the config file: say you add a field, your existing code will most likely ignore it and keep on working. You can get this with other formats, but it might be a little harder to do (and I will not argue with anyone who tries to prove to me that CSV can match this ;--)
    • XML is quite flexible, so "multiple fields" could be properly tagged without you having to write a more complex parser:
      Something like this will still be easy to parse:
      <config>
        <server name="server1">
          <application name="app1">
            <params> <!-- note the 2 params -->
              <param>p1</param>
              <param>p2</param>
            </params>
          </application>
        </server>
      </config>
    • finally, it is a standard, it will make your boss feel good, and it is a safe way to get acquainted with XML, instead of having to tackle it on a bigger and riskier project.

    A couple of things that can be annoying though:

    • The content of the fields needs to be proper XML: it can't include a raw '<' or '&', for example; those have to be escaped as &lt; and &amp;. Most modules will deal with this transparently... provided you use them for creating the file!
    • encodings might become an issue if you use accented characters, which does not seem likely in your case, but you never know... maybe the German branch will want error messages in German some day or something like that...
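    If you ever generate the file yourself rather than through a module, the escaping mentioned above looks roughly like this. A minimal sketch in core Perl; xml_escape is a hypothetical helper name, not a function from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that cannot appear
# literally in XML text or attribute values with predefined entities.
# '&' is handled first so the '&' inside the other entities is not
# escaped a second time.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

my $raw     = q{strip images if <HTTP> & "compress"};
my $escaped = xml_escape($raw);
print qq{<param>$escaped</param>\n};
# prints: <param>strip images if &lt;HTTP&gt; &amp; &quot;compress&quot;</param>
```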

    Finally, as others have suggested, XML::Simple is probably the ideal module for you. Write your config file, load it using XMLin, dump it (or use the debugger) to figure out what's in the data structure, and you should be OK. Read the docs though, because, to quote them, "in fact as each release of XML::Simple adds more options, the module's claim to the name 'Simple' becomes more tenuous".
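    A sketch of that XMLin-then-dump workflow on the config from the list above, assuming XML::Simple is installed from CPAN. By default XMLin folds on the name, key and id attributes and only builds arrays for repeated elements, so the resulting shape changes between one server and two; setting ForceArray and KeyAttr explicitly, as below, is one way to keep it stable.

```perl
use strict;
use warnings;
use XML::Simple;     # CPAN module, assumed installed
use Data::Dumper;

# The example config from above, inlined as a string for the sketch.
my $xml = <<'END';
<config>
  <server name="server1">
    <application name="app1">
      <params>
        <param>p1</param>
        <param>p2</param>
      </params>
    </application>
  </server>
</config>
END

# Servers and applications are folded into hashes keyed by their
# "name" attribute, and <param> is always an array, whether the file
# lists one of each or twenty.
my $cfg = XMLin(
    $xml,
    ForceArray => [qw(server application param)],
    KeyAttr    => { server => 'name', application => 'name' },
);

# Step 1: dump the structure to see what XMLin actually built.
$Data::Dumper::Sortkeys = 1;
print Dumper($cfg);

# Step 2: walk it.
for my $server (sort keys %{ $cfg->{server} }) {
    my $apps = $cfg->{server}{$server}{application};
    for my $app (sort keys %$apps) {
        my @params = @{ $apps->{$app}{params}{param} };
        print "$server/$app: @params\n";
    }
}
```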

Re: To use XML or not to use XML
by peschkaj (Pilgrim) on Nov 26, 2002 at 03:25 UTC

    IMHO, adding XML certainly won't hurt you, and it will most likely make your life easier, especially in the event of adding a new bit of data like "compress".

    For example, you have a piece of markup that looks like this (I'm making this up as I go along):

    <server type="1">
      <log loc="D:\logs\server.log" app="CF_MX" />
      <features>
        <compress enabled="true" />
        <images strip-on-log="false" />
      </features>
      <start-time>[ARBITRARY TIME STAMP]</start-time>
    </server>

    With split() and flat files, you would have to change the application if you wanted to add new functionality. However, with XML you can just add a new tag. Your code can/should be written to ignore tags it doesn't understand. You can gain a great deal of flexibility this way.
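    To illustrate the "ignore tags it doesn't understand" point, here is a sketch using XML::Simple (assumed installed from CPAN) on the example above, with a made-up <retention> feature added that this code knows nothing about.

```perl
use strict;
use warnings;
use XML::Simple;     # CPAN module, assumed installed

# The example entry from above, plus a hypothetical <retention>
# feature that this version of the code has never heard of.
my $xml = <<'END';
<server type="1">
  <log loc="D:\logs\server.log" app="CF_MX" />
  <features>
    <compress enabled="true" />
    <images strip-on-log="false" />
    <retention days="30" />
  </features>
  <start-time>[ARBITRARY TIME STAMP]</start-time>
</server>
END

# KeyAttr => [] turns off folding; none of these elements need it.
my $cfg = XMLin($xml, KeyAttr => []);

# Look up only the settings this code understands, with defaults.
# <retention>, and anything else added later, is simply never read.
my $features = $cfg->{features} || {};
my $compress = ($features->{compress}{enabled}        // 'false') eq 'true';
my $strip    = ($features->{images}{'strip-on-log'}   // 'false') eq 'true';

print "log:      $cfg->{log}{loc} ($cfg->{log}{app})\n";
print "compress: ", ($compress ? 'yes' : 'no'), "\n";
print "strip:    ", ($strip    ? 'yes' : 'no'), "\n";
```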

    Say you have this setup in place on all your servers, and the log-creating engine can be maintained and kept up to date on all the boxes. BUT, on box XYZ, you cannot update the logfile parser (gremlins have begun nesting in the code). With XML logfiles, you don't need to worry as much that the parser on XYZ can't read the new and improved XML; it just won't pay any attention to the new parts. It's kinda like putting stupid things in your HTML like <p lemon="fruit">: it might look dumb, but the parser won't pay any attention.

    If you make something idiot-proof, eventually someone will make a better idiot.
    I am that better idiot.

Re: To use XML or not to use XML
by pg (Canon) on Nov 26, 2002 at 04:07 UTC
    If you decide to store your data in a flat file (i.e. not a database), then XML is the best choice. There are a couple of advantages I can see:
    • The magic of standards. XML is backed by industry standards, and the methodologies to handle and process XML data, such as SAX and DOM, are also backed by industry standards. You can expect lots of investment, in both people and capital, to be put into XML, which means XML-related technologies will evolve more quickly. You can expect your data to be easily accepted by lots of applications.
    • Opens the door to the future. It is reasonable to expect that XML-related technology will be around for a long time. No technology lasts forever, especially in the IT industry, but technologies that are still evolving yet already widely accepted, like XML, tend to last longer. (We are really looking at cost efficiency here: the same or less investment, but it lasts longer.)
    • Easier to share data in a distributed environment. You can easily use things like SOAP if your data is XML-formatted.
    • Easy to validate. There are standard ways to validate your XML data, such as DTDs, XML Schema, etc. (Less development cost.)
    • Easy to transform your data into different formats. We have things like XSLT, etc. (Again, less development cost.)
    (To directly answer one of your questions above: if you want to modify the data, then you have to choose DOM over SAX. XSLT is also a good choice.) I don't want to be one-sided, so let's also look at the disadvantages:
    • XML is an evolving technology; while this is an advantage, it is also a disadvantage. In a macro view, you can expect XML and its related technologies to be around for a long time. In a micro view, you will see new versions of things come and go quickly: today's new XML-related technologies might be made obsolete by even newer ones very quickly. However, the foundation will be pretty stable.
    • If the amount of data you deal with is huge, you can expect some performance issues because of all the time spent on encoding/decoding. However, more efficient ways of handling XML data are expected to come.
Re: To use XML or not to use XML
by Jenda (Abbot) on Nov 26, 2002 at 22:34 UTC

    I would use an INI-file-like format and one of the CPAN modules to read it.

    [srv1_FirstWeb]
    server=10.10.10.10
    path=D:\logfiles\w3svc3
    format=www
    other_option=dfhwibfoserbfiuybl
    ...
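    No particular module is named here; Config::Tiny and Config::IniFiles are two on CPAN, and Config::Tiny, for one, hands back a hash of hashes. Just to show that shape, here is a tiny hand-rolled stand-in (read_ini is a hypothetical name, not a module's API) that handles [section] headers, key=value lines and ; or # comments, and nothing else.

```perl
use strict;
use warnings;

# Hypothetical minimal INI reader returning a hash of hashes:
# $ini->{section}{key} = value. Real CPAN modules handle more cases.
sub read_ini {
    my ($fh) = @_;
    my (%ini, $section);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:[;#]|$)/;               # comment or blank line
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {           # [section] header
            $section = $1;
            $ini{$section} ||= {};
        }
        elsif ($line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) { # key=value, split at first '='
            die "key '$1' appears before any [section]\n" unless defined $section;
            $ini{$section}{$1} = $2;
        }
        else {
            warn "ignoring unparsable line: $line\n";
        }
    }
    return \%ini;
}

# Single-quoted heredoc, so the backslashes in the path stay literal.
my $text = <<'END';
; one section per site
[srv1_FirstWeb]
server=10.10.10.10
path=D:\logfiles\w3svc3
format=www
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $ini = read_ini($fh);
close $fh;

for my $site (sort keys %$ini) {
    print "$site: $ini->{$site}{server} $ini->{$site}{path} ($ini->{$site}{format})\n";
}
```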

    It's easier to write by hand than XML and more than sufficient for this task.

    And if you ever find that you need the multilevel structure of XML, you can read the INI into a HoH, write it out as XML, and later read it with XML::Simple. If you used a module that reads the INI file as a HoH, you won't have to change your code much.
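    That INI-to-XML path might look like this sketch, assuming XML::Simple is installed from CPAN: wrap the HoH under a <site> element, let XMLout "unfold" it into <site name="..."> elements via KeyAttr, and read it back with XMLin using the same KeyAttr. The <config> and <site> element names are made up for the example.

```perl
use strict;
use warnings;
use XML::Simple;     # CPAN module, assumed installed
use Data::Dumper;

# A hash of hashes as an INI reader would return it
# (section => { key => value }), using the example section above.
my %ini = (
    srv1_FirstWeb => {
        server => '10.10.10.10',
        path   => 'D:\logfiles\w3svc3',
        format => 'www',
    },
);

# XMLout turns each section into <site name="..." key="value" ... />;
# scalar values become attributes.
my $xml = XMLout(
    { site => \%ini },
    RootName => 'config',
    KeyAttr  => { site => 'name' },
);
print $xml;

# Later: read it back. ForceArray on 'site' keeps the hash-of-hashes
# shape even when the file holds only one site.
my $back = XMLin(
    $xml,
    KeyAttr    => { site => 'name' },
    ForceArray => ['site'],
);

# Compare the round-tripped data with the original.
my $same = do {
    local $Data::Dumper::Sortkeys = 1;
    Dumper($back->{site}) eq Dumper(\%ini);
};
print $same ? "round trip ok\n" : "round trip changed the data\n";
```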

    Jenda