Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I wrote a CGI that takes input in XML format. It works fine, but when the input becomes large (~25MB), it's causing timeout problems. Is there an option to pass to XMLin which causes it to spew a lot of messages on screen as it's parsing the input?

Replies are listed 'Best First'.
Re: XML::Simple noisy option?
by BrowserUk (Patriarch) on Jan 30, 2003 at 00:59 UTC

    Update: Mirod points out that this trick will only work if your version of XML::Simple is using XML::Parser as its underlying parser. If you are using SAX as the underlying parser, ignore this :). Thanks for the heads-up, Mirod++.

    XML::Simple has a parseropts option which allows you to pass XML::Parser options through to the XML::Parser instantiated by XML::Simple. One of the XML::Parser options is Handlers => { Event => \&handler, ... }.

    By using this to register your own handlers you can get your own code called--with salient information as to what is being processed--as XML::Parser processes your document.

    The problem is that doing so overrides the XML::Parser::Tree style handlers used by XML::Simple. However, these handlers conform to a standard naming convention, and it is easy to use Perl's goto &func; form of goto to invoke these standard handlers from your own once you have finished tracing the info.

    The PoC code below produces this output while successfully parsing the embedded example XML.

    c:\test>231115
    Init:
    Start: config logdir /var/log/foo/ debugfile /tmp/foo.debug
    Char:
    Char:
    Start: server name sahara osname solaris osversion 2.6
    Char:
    Char:
    Start: address
    Char: 10.0.0.101
    End: address
    Char:
    Char:
    Start: address
    Char: 10.0.1.101
    End: address
    Char:
    Char:
    End: server
    Char:
    Char:
    Start: server name gobi osname irix osversion 6.5
    Char:
    Char:
    Start: address
    Char: 10.0.0.102
    End: address
    Char:
    Char:
    End: server
    Char:
    Char:
    Start: server name kalahari osname linux osversion 2.0.34
    Char:
    Char:
    Start: address
    Char: 10.0.0.103
    End: address
    Char:
    Char:
    Start: address
    Char: 10.0.1.103
    End: address
    Char:
    Char:
    End: server
    Char:
    End: config
    Final:
    c:\test>

    The code: (Update:simplified option setting code)

    For more information on which handlers you can override, and what information is passed to them, see the XML::Parser POD (or source).
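    A minimal sketch of the approach described above (not the original PoC listing). It assumes XML::Parser is installed and that the Tree style handlers live in XML::Parser::Style::Tree, as in recent XML::Parser releases, rather than the XML::Parser::Tree package named in this post. The names trace_start and $starts are made up for illustration. Note that later XML::Simple documentation marks ParserOpts as deprecated, so treat this as a debugging aid only:

```perl
use strict;
use warnings;
use XML::Simple;
use XML::Parser::Style::Tree;   # assumption: Tree handlers live in this package

# ParserOpts is only honoured by the XML::Parser back end, so force it.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

my $starts = 0;                 # hypothetical counter of start tags seen

# Trace each start tag, then hand over to the stock Tree handler.
# goto &sub passes the current @_ through untouched, so @_ must not be shifted.
sub trace_start {
    my ($expat, $element) = @_;
    $starts++;
    print STDERR "Start: $element\n";
    goto &XML::Parser::Style::Tree::Start;
}

my $xml = '<config><server name="sahara">'
        . '<address>10.0.0.101</address></server></config>';

# Only the Start handler is replaced; the other Tree handlers stay in place.
my $data = XMLin($xml, ParserOpts => [ Handlers => { Start => \&trace_start } ]);

print STDERR "Parsed $starts start tags\n";
```

    The same wrapper pattern should apply to the End, Char, Init and Final handlers if more detail is wanted.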

    I might make a package out of this idea if anyone is interested.


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: XML::Simple noisy option?
by BazB (Priest) on Jan 29, 2003 at 23:42 UTC

    You might want to consider using the SAX events that XML::Simple (and many other XML parsers) can generate.

    Rather than process the entire XML document in memory, the parser will generate events (for example the start or end of an element) which can then be used to trigger a handler, which can:

    • process that particular element or section
    • run code unrelated to the XML processing
    • print a status message
    • anything else you fancy

    As well as keeping resource usage down, you can print the new document to the browser as you process it.

    Cheers.

    BazB

    Update: Added a few suggestions of what SAX events could be used to trigger.

    Update 2: mirod++. I've not been able to confirm XML::Simple's behaviour regarding having to slurp the whole file before producing SAX events, but I'll bow to mirod's superior knowledge in this area :-) mirod's idea of using a SAX filter is cunning.


    If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
    That way everyone learns.

      I don't think this would work. I haven't read that part of the code, but I believe you still have to slurp the entire document into memory with XMLin and process it before you start emitting SAX events using XMLout.

      What could work is using a SAX parser and setting up a SAX filter that would spit out messages before passing the data untouched to XML::Simple. Or even using a tee like XML::Filter::Tee so that both XML::Simple and a separate SAX handler receive the input. That handler can then spit out a message depending on where it is in the parsing.
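      The filter idea could be sketched roughly as follows. This assumes XML::SAX and a SAX-capable XML::Simple are installed; ProgressFilter and $seen are hypothetical names. The result is collected through XML::Simple's DataHandler option so that the sketch does not depend on what a particular SAX parser returns from parse_string:

```perl
use strict;
use warnings;

my $seen = 0;    # hypothetical counter of elements reported so far

# A pass-through SAX filter: report each start tag, then forward the
# event unchanged to the next handler in the chain (XML::Simple here).
package ProgressFilter;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $element) = @_;
    $seen++;
    print STDERR "Start: $element->{Name}\n";
    return $self->SUPER::start_element($element);
}

package main;
use XML::SAX::ParserFactory;
use XML::Simple;

my $data;
# DataHandler is called with the XML::Simple object and the finished tree.
my $simple = XML::Simple->new(
    DataHandler => sub { my ($xs, $tree) = @_; $data = $tree; return $tree; },
);
my $filter = ProgressFilter->new(Handler => $simple);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);

$parser->parse_string(
    '<config><server name="gobi"><address>10.0.0.102</address></server></config>'
);

print STDERR "Saw $seen elements\n";
```

      For a real 25MB input, parse_uri or parse_file would replace parse_string, and the filter could print only every Nth element to keep the noise down.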

      OTOH playing with signals to send an "everything is OK" message every x seconds might be much simpler ;--)
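      A rough sketch of the signal approach, using only core Perl. It assumes a Unix-like system where alarm and SIGALRM behave as usual; the busy loop merely stands in for the long XMLin call, and $interval and $beats are illustrative names. Perl normally runs signal handlers between its own operations, so a parser that spends a long time inside C code without calling back into Perl may delay the messages:

```perl
use strict;
use warnings;

my $interval = 1;    # seconds between "still alive" messages (illustrative)
my $beats    = 0;    # how many messages have been sent

# Each time the alarm fires, report progress and re-arm the timer.
local $SIG{ALRM} = sub {
    $beats++;
    print STDERR "Still parsing, everything is OK ($beats)\n";
    alarm $interval;
};

alarm $interval;

# Stand-in for the slow job; in the CGI this would be the XMLin(...) call.
my $deadline = time + 3;
1 while time < $deadline;

alarm 0;             # cancel any pending alarm once the work is done

print STDERR "Done after $beats heartbeat(s)\n";
```

      In a CGI the handler would print to the browser rather than STDERR, with output buffering turned off so the message goes out immediately.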

Re: XML::Simple noisy option?
by pg (Canon) on Jan 30, 2003 at 03:26 UTC
    I haven't read the source code of XML::Simple or its related modules, but in theory no SAX parser should slurp a whole file before it starts parsing. If it does, it is really poorly designed, and you should not consider using it at all.

    The design objective of SAX is to parse XML data as a stream, on the fly. Remember that the data source can be a TCP port; SAX would parse the XML stream while data is still coming in.

    As a design consideration, it would be terribly inefficient not to use the idle time in between TCP communications. Terrible.

    On the other hand, remember that even with all the best design considerations, XML parsing is still very slow. This is not something you can escape at this stage of XML's life cycle.

    Update:

    After receiving several messages from mirod, I realized that this is an apples-and-oranges thing.

    1. XML::Simple does not "slurp" the file in order to use a SAX parser. When I say "slurp", I mean reading in the whole file BEFORE parsing. This is my version of "slurp".
    2. XML::Simple eventually stores the WHOLE result it gets back from the parser in memory, so eventually the file is "slurped". That's his version of "slurp".

      I understood that XML::Simple did that. That's okay with me, and that's the purpose of XML::Simple, but that's not my "slurp" ;-)
    If a person finishes 9 scoops of ice cream in 1 minute, I will say he slurped it. If another person finishes 9 scoops of ice cream in 1 hour, yes, the ice cream all eventually ends up in his gut ;-) but I don't call that slurping.

    I think we downvoted each other; that's fine, and that's part of life here ;-). A couple of XP points is much less important than getting the facts straight. In this sense, we both did well.

      XML::Simple is not a SAX parser. It is a module that lets you load XML data into a Perl structure. It works on top of a parser, which can be XML::Parser or, since version 1.08_01, any SAX parser. So by (good) design it slurps a whole XML file into memory.

      Not only did you not read the code, it looks like you did not read the documentation either ;--)

        "So by (good) design it slurps a whole XML file into memory."

        Okay, now I have traced each line of the source code. I think we are lucky that the XML::Simple module didn't follow your good design and slurp the file.

        Slurping a whole file in order to use a SAX parser? It's good that the author didn't do it.

        To be frank with you, the point is really not whether XML::Simple slurps the whole file or not. That's not something you or I can change; it was just done that way. But it is very poor judgement to praise "slurp a whole file" as a good design in this context.