PerlMonks  

writing the data structure into a XML file (with high performance)

by kunal.sharma (Initiate)
on Apr 20, 2011 at 10:00 UTC

kunal.sharma has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
I am a newbie to Perl.
I have a requirement to process data from a database and then write it to an XML file.

- The processed data will be in a data structure (hash ref, array ref, etc.)
- The size of the data is huge
- I want the best performance while writing this data to the XML file
- I do not need to read this XML file back into a data structure
- As I am creating this application from scratch, I can design the structure of my processed data with good write performance in mind
- Below is an example of the output XML file

Sample output XML file
<EMPLOYEES>
  <EMPLOYEE ID="1">
    <NAME>John Wright</NAME>
    <AGE>45</AGE>
    <DEPARTMENT>Sales</DEPARTMENT>
  </EMPLOYEE>
  <EMPLOYEE ID="2">
    <NAME>Peter Rob</NAME>
    <AGE>30</AGE>
    <DEPARTMENT>HR</DEPARTMENT>
  </EMPLOYEE>
</EMPLOYEES>

There are many XML modules for this task, and I need your expert advice on which one to use:
XML::Simple
XML::Twig
XML::Bare etc.

Replies are listed 'Best First'.
Re: writing the data structure into a XML file (with high performance)
by anonymized user 468275 (Curate) on Apr 20, 2011 at 11:16 UTC
    Getting the absolute best performance is no easy task: it requires home-grown routines that combine generic traversal of a data structure with code generation.

    If you can cope with the performance loss of using modules, then XML::Simple looks reasonable enough, although you still can't escape the formal requirement to translate your internal data structure into XML tag structure. It might be worth expressing your data structure in something like BNF as if it were a language and then writing a specific traverser that can context-sensitively call the code generation routines either in a module or in your own generation routines.
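    A home-grown writer of this kind might look like the minimal sketch below. It assumes the processed data is an array ref of hash refs with id, name, age and department keys (a layout chosen here for illustration, not taken from the question), and the helper names xml_escape and write_employees are hypothetical. Each record is printed straight to a filehandle, so the whole document never has to be held in memory.

```perl
use strict;
use warnings;

# The five characters that must be escaped in XML text and attribute values.
my %ENTITY = (
    '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
    '"' => '&quot;', "'" => '&apos;',
);

# Hypothetical helper: return a copy of $text that is safe to embed in XML.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

# Hypothetical traverser: write one <EMPLOYEE> element per record to $fh.
sub write_employees {
    my ($fh, $employees) = @_;
    print {$fh} "<EMPLOYEES>\n";
    for my $e (@$employees) {
        printf {$fh} qq{  <EMPLOYEE ID="%s">\n}, xml_escape($e->{id});
        for my $field (qw(name age department)) {
            my $tag = uc $field;
            printf {$fh} "    <%s>%s</%s>\n", $tag, xml_escape($e->{$field}), $tag;
        }
        print {$fh} "  </EMPLOYEE>\n";
    }
    print {$fh} "</EMPLOYEES>\n";
}

# Assumed sample data mirroring the XML in the question.
my $employees = [
    { id => 1, name => 'John Wright', age => 45, department => 'Sales' },
    { id => 2, name => 'Peter Rob',   age => 30, department => 'HR' },
];

write_employees(\*STDOUT, $employees);
```

    Passing a filehandle opened on a real file instead of STDOUT would write the XML file directly.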

    One world, one people

Re: writing the data structure into a XML file (with high performance)
by eff_i_g (Curate) on Apr 20, 2011 at 14:06 UTC
    Can the database return XML? Even though I love XML::Twig I assume it won't be the fastest because it's beefy; however, I could be wrong. How about some Benchmarks?

      It's not the fastest, but not (just?) because it's beefy. It's because it uses a relatively slow parser. One of the beefiest, XML::LibXML, is also one of the fastest.

      XML::Bare is a tiny bit faster than XML::LibXML, but it has so many limitations. XML::Fast is supposed to be a bit faster yet, but I expect it has the same limitations.

Re: writing the data structure into a XML file (with high performance)
by sundialsvc4 (Abbot) on Apr 20, 2011 at 14:23 UTC

    In my humble opinion, the best measure of how “fast and efficient” a particular package is, is how easily and effectively it enables you to do what you want. Never mind how “quickly” the CPU can move data around in memory, or how well the filesystem can do its job. How fast can you get your programs written? How much of the “heavy lifting” can you palm off to this-or-that Perl package?

    “Huge” is a very relative term these days. The laptop that I am using right now has four CPU cores, 6 gigabytes of RAM, and it has 1.5 terabytes of disk storage attached to it. Is that “huge?” It’s rendering a video while I’m talking, and its heart’s not skipping a beat. Is that “slow,” or “fast?” All I know is, when I sit down to program the thing, that will take a very long time. Fixing or rewriting a program takes even longer, and a lot more coffee.

    I personally use XML::Twig most often, because it is, as you say, “beefy,” and therefore I know that it will pretty much accept any file out there and do anything I need to do with it.   But that isn’t speaking against, e.g. XML::Simple in any way whatever.   What you don’t want to do is to get partway into your project, using tool “A,” only to discover that there is a problem that you have, which the tool you selected cannot easily solve.   You don’t want to find yourself writing code to “make up the shortfall.”

    So, before you make any selection, do some tests. I have actually used Test::More techniques to quickly cobble together some exercises that I can then throw at various example files, just to be sure that the tools and techniques that I’m planning to use will work when the scenario is stripped down to bare bones. It is actually a very good way to do “proof of concept” experiments.
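    As a rough illustration of that kind of exercise, here is a minimal Test::More sketch. The function under test, xml_escape, is a hypothetical stand-in for whatever tool or routine is being evaluated; the same pattern could be pointed at a module's output instead.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical routine under evaluation: escape XML special characters.
sub xml_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Each check states one expectation about the bare-bones scenario.
is( xml_escape('Sales'),      'Sales',                'plain text passes through unchanged' );
is( xml_escape('R&D'),        'R&amp;D',              'ampersand is escaped' );
is( xml_escape('a < b'),      'a &lt; b',             'less-than is escaped' );
is( xml_escape(q{say "hi"}),  'say &quot;hi&quot;',   'double quotes are escaped' );

done_testing();
```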

    “Look before you leap.”

    “Trust, but verify.”

      Bicycles are simple, cheap, efficient and environmentally friendly.

      But if your requirement is to commute 200 miles daily, and you ask what is the quickest way, such advice is entirely unhelpful.

        The fastest way to reach my railway station is by bike.

        The OP didn't tell us anything about how many miles he has to commute per day.

        Anything else is speculation.

        Cheers Rolf

Re: writing the data structure into a XML file (with high performance)
by runrig (Abbot) on Apr 20, 2011 at 17:27 UTC

    My guess is that getting the data from the database will be the bottleneck, and the time it takes to massage the data into XML will be insignificant compared to the time it takes to fetch the data from the database.

    You don't say what database you're using, but if there's some bulk export command, you might be able to export the data into a flat file quickly, then massage the flat file into XML. Or export the data into a pipe, and have the process on the other end of the pipe massage the data into XML (but I've only seen one database that had this sort of bulk export capability).
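    The "process on the other end of the pipe" could be sketched as a small line-by-line filter like the one below. It assumes the export produces one record per line with tab-separated columns in the order id, name, age, department, and that no field contains a tab or newline; that format, and the names xml_escape and rows_to_xml, are assumptions made for illustration.

```perl
use strict;
use warnings;

my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Hypothetical helper: escape XML special characters in one field.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

# Read tab-separated records from $in and write XML to $out,
# one record at a time, so memory use stays flat however large the input is.
sub rows_to_xml {
    my ($in, $out) = @_;
    print {$out} "<EMPLOYEES>\n";
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($id, $name, $age, $dept) = map { xml_escape($_) } split /\t/, $line, -1;
        print {$out} qq{  <EMPLOYEE ID="$id"><NAME>$name</NAME>},
                     qq{<AGE>$age</AGE><DEPARTMENT>$dept</DEPARTMENT></EMPLOYEE>\n};
    }
    print {$out} "</EMPLOYEES>\n";
}

# Demo on an in-memory "file". At the end of a real pipe,
# pass \*STDIN as the first argument instead.
my $sample = "1\tJohn Wright\t45\tSales\n2\tPeter Rob\t30\tHR\n";
open my $in, '<', \$sample or die "Cannot open in-memory input: $!";
rows_to_xml($in, \*STDOUT);
```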

Re: writing the data structure into a XML file (with high performance)
by spx2 (Deacon) on Apr 20, 2011 at 16:37 UTC

    If I had to do this task I'd check out all the modules to see if they have the features I need, then I'd take the ones that do have the features and benchmark them.
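    A benchmark of that sort can be set up with the core Benchmark module. The sketch below compares two hand-written string-building strategies rather than the modules named in this thread; the candidates, the synthetic data, and the sub names are all assumptions, and calls to the modules under consideration could be substituted for the sub bodies.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: 1,000 synthetic records.
my @rows = map { +{ id => $_, name => "Name $_", age => 20 + $_ % 40 } } 1 .. 1000;

# Candidate 1: append each element to one growing string.
sub by_concat {
    my $xml = "<EMPLOYEES>";
    for my $r (@rows) {
        $xml .= qq{<EMPLOYEE ID="$r->{id}"><NAME>$r->{name}</NAME><AGE>$r->{age}</AGE></EMPLOYEE>};
    }
    return $xml . "</EMPLOYEES>";
}

# Candidate 2: collect fragments in a list and join them once at the end.
sub by_join {
    my @parts = ("<EMPLOYEES>");
    push @parts, qq{<EMPLOYEE ID="$_->{id}"><NAME>$_->{name}</NAME><AGE>$_->{age}</AGE></EMPLOYEE>}
        for @rows;
    push @parts, "</EMPLOYEES>";
    return join '', @parts;
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, { concat => \&by_concat, join => \&by_join });
```

    cmpthese prints a rate table with the relative difference between candidates; the numbers depend on the machine and the Perl build, so they are only meaningful when measured on the target system with representative data.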

    If you still don't find anything suited for your task you need to be bold as brass and write yourself the needed tool.

    History shows that a stage in man's evolution was Homo Faber, when men built tools to control their environment. You can do that too. And then you can put it on CPAN.

Re: writing the data structure into a XML file (with high performance)
by sundialsvc4 (Abbot) on Apr 21, 2011 at 00:57 UTC

    If the XML data is huge (or the resulting file will be huge...) then XML::Twig is particularly designed with this capability in mind.   It can process files in “chunks,” so to speak, in order to effectively deal with XML structures that are expected to be “too big to fit nicely in-memory.”   Just FYI ...
