Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks, I am having a problem transforming huge XML files. I'm using Sablotron to apply the XSL template. Things worked excellently until I tried to parse a 42MB XML file: memory limitations, almost 30 minutes of processing, and so on. Looking around, I found many reports of problems with the XML parser, which holds the entire XML document in memory. I can't use the Twig approach (with Sablotron) to work on a smaller XML document. The problem I need to solve (and am asking for your experience with) is how to get around the memory limitations, or how to run XSLT over huge XML files with Sablotron as fast as possible. I am asking because a colleague of mine found a VB.NET solution. What? Perl didn't work better? Help, please.

Re: XSLT processing huge XMLs
by ajt (Prior) on Feb 07, 2005 at 11:44 UTC
      Is there any alternative to, or extension of, Sablotron? It needs to beat the .NET solution, which translates the XML in a few seconds, and it has to work exclusively on Win32. For small XML files Sablotron is excellent, but for huge ones... I don't know; maybe something that parses the XML in chunks, transforms them sequentially (with XSL), and appends them to the final output of the XSLT. Are there any hints for that approach? Thanks for your concern, ajt.

        XML::LibXSLT will install okay on Win32: you can build all the bits yourself, use the Cygwin version of Perl, or install a pre-compiled binary.
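
        For what it's worth, here is a minimal sketch of an XML::LibXSLT transform (the file names are placeholders for your own data and stylesheet):

            use XML::LibXML;
            use XML::LibXSLT;

            my $parser = XML::LibXML->new();
            my $xslt   = XML::LibXSLT->new();

            # Both the document and the stylesheet are parsed into DOMs.
            my $source    = $parser->parse_file('input.xml');
            my $style_doc = $parser->parse_file('style.xsl');

            my $stylesheet = $xslt->parse_stylesheet($style_doc);
            my $results    = $stylesheet->transform($source);

            print $stylesheet->output_string($results);

        Note that libxslt still builds the whole document in memory, but in my experience it is considerably faster than Sablotron on the same input.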

        While you are here you may wish to join the Monastery, which will improve your experience of the place by enabling extra features.

        Good Luck,


        --
        ajt
Re: XSLT processing huge XMLs
by dakkar (Hermit) on Feb 07, 2005 at 18:04 UTC

    You say:

    found many probs with the XML parser, which holds to memory ALL the xml

    Well, it's supposed to do that... to be able to use XSLT, you need the entire DOM (Document Object Model) in RAM, since XSLT allows random access to every part of the document.

    About memory usage: at a minimum, every element occupies the space for its name, the name and value of each attribute, and a couple of pointers (to its parent and first child, for example). This means that a tightly packed DOM can occupy a bit less space than the file it was parsed from. No implementation I know of packs it that tightly, however, usually in order to be faster. You should still see memory usage of less than twice the file size, so a 42MB document should need somewhat under 84MB of DOM.

    About the .NET solution: is it using XSLT, or munging the data directly? XSLT is not a very optimizable language, and implementations tend to be rather slow (even in C).

    -- 
            dakkar - Mobilis in mobile
    

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)

      If a given XSLT program does not require access to the whole tree, e.g. by only using information in the current node, it is possible to do streaming processing, where only the relevant part of the XML file is kept in memory and output is generated as soon as possible. I believe that the original XT did this, and that Xalan can do this for simple cases, but I have not researched this for some time.
      ....or munging the data directly? OK. I think (because of its speed) that it munges the data directly. How can we do that with Perl?
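
      A minimal sketch of that direct-munging approach, assuming the document is essentially a long, flat list of <record> elements (a hypothetical name, as are the id and name fields; substitute your own), would use XML::Twig so that only one record is ever held in memory:

          use XML::Twig;

          # Handle each <record> as soon as it is fully parsed, then
          # free it, so memory use stays flat however big the file is.
          my $twig = XML::Twig->new(
              twig_handlers => {
                  record => sub {
                      my ($t, $record) = @_;

                      # "Munge" directly: pull out the fields you need
                      # and print the transformed output immediately.
                      printf "%s,%s\n",
                          $record->field('id'),
                          $record->field('name');

                      $t->purge;    # discard everything parsed so far
                  },
              },
          );

          $twig->parsefile('huge.xml');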
Re: XSLT processing huge XMLs
by inman (Curate) on Feb 08, 2005 at 09:59 UTC
    Can you post a sample of the XML that you are working with and a description of the task that you are trying to achieve? It may be the case that you could divide the XML into smaller chunks before transforming with XSLT.
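
    For example, a rough sketch of that chunking idea, again assuming a flat list of <record> elements (a hypothetical name), would split the file with XML::Twig and then transform each chunk separately:

        use XML::Twig;

        my $chunk = 0;
        my @buffer;

        # Collect records and write every 1000 of them out as a small,
        # well-formed XML file that can be transformed on its own.
        my $twig = XML::Twig->new(
            twig_handlers => {
                record => sub {
                    my ($t, $record) = @_;
                    push @buffer, $record->sprint;
                    flush_chunk() if @buffer >= 1000;
                    $t->purge;
                },
            },
        );

        $twig->parsefile('huge.xml');
        flush_chunk() if @buffer;    # write any leftover records

        sub flush_chunk {
            my $file = sprintf 'chunk_%04d.xml', $chunk++;
            open my $fh, '>', $file or die "Cannot write $file: $!";
            print {$fh} "<records>\n", @buffer, "</records>\n";
            close $fh;
            @buffer = ();
        }

    Each chunk_NNNN.xml can then be fed through the existing Sablotron stylesheet and the results concatenated.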