dHarry has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I am doing a small proof of concept to see whether XML::Twig will work for me, i.e. whether it can process large XML files in a reasonable amount of time. I use the script below to generate simple XML files for testing, varying the value of $num_departments to obtain different sizes. A value of 1 million produces an XML file of about 688 MB.
    use strict;
    use warnings;

    my $file_name       = "dharry.xml";
    my $num_departments = 1000;

    open (XML_OUT_FILE, ">$file_name") or die "Could not open $file_name\n";

    print XML_OUT_FILE "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    print XML_OUT_FILE "<Company>\n";

    # One Department per iteration, each with five Person elements
    for (my $i=0; $i<$num_departments; $i++) {
        print XML_OUT_FILE " <Department>\n";
        print XML_OUT_FILE " <Name>Bla$i</Name>\n";
        for (my $j=0; $j<5; $j++) {
            print XML_OUT_FILE " <Person id=\"$i$j\">\n";
            print XML_OUT_FILE " <First>John$i$j</First>\n";
            print XML_OUT_FILE " <Last>Doe$i$j</Last>\n";
            my $phone_ext = int(rand(10000000));
            print XML_OUT_FILE " <PhoneExt>$phone_ext</PhoneExt>\n";
            print XML_OUT_FILE " </Person>\n";
        }
        print XML_OUT_FILE " </Department>\n";
    }

    print XML_OUT_FILE "</Company>\n";
    close XML_OUT_FILE or die "Could not close $file_name\n";
    print "Done...\n";
I have run many different tests; this is one of those that fail. I attempt a smart update: only one twig is handled, selected by a specific value of the id attribute on the Person element.
    use strict;
    use warnings;
    use XML::Twig;

    # Select twig based on the value of the id attribute on the Person element
    my $twig= new XML::Twig(
        twig_handlers => { 'Person[@id="50000"]' => \&Person }
    );

    $twig->set_pretty_print ('record');   # Human readable output please
    $twig->parsefile( "dharry.xml");
    $twig->flush;

    sub Person {
        my( $twig, $person)= @_;
        my $name = $person->first_child("First");
        $name->set_text("dHarry");
        $twig->flush;
    }
Results
| $num_departments | XML file size before | XML file size after | Time taken |
|---|---|---|---|
| 1000 | 657 KB | 708 KB | seconds |
| 10_000 | 6.57 MB | 7.07 MB | 1 minute |
| 100_000 | 67.2 MB | n/a | n/a |
I was a bit surprised that the program crashed. I tried different XPath expressions and reran the test. Sometimes the 67.2 MB file was processed successfully, but bigger files could not be handled. Note that the resulting XML files get a bit bigger because of the pretty_print option. Any ideas why it isn't working? Is my code wrong?
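One possible explanation, offered as a guess rather than a confirmed diagnosis: the update script only calls flush inside the Person handler and once at the very end, so everything parsed before the matching Person, and everything parsed after it, is held in memory as a tree. A pattern often used with XML::Twig for large files is to flush after each repeating element as well. The sketch below assumes Department is a suitable unit for that; `update_first_name` and its parameters are hypothetical names, not part of the original script. Note also that with $num_departments = 1000 the generator never emits id 50000 (ids are $i followed by $j, so the largest is 9994), which means the handler never fires for the smallest file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the original post): change the First name of
# one Person and flush after every Department, so that roughly one Department
# at a time is kept in memory.
sub update_first_name {
    my ($xml_file, $person_id, $new_first, $out_fh) = @_;

    my $twig = XML::Twig->new(
        pretty_print  => 'record',
        twig_handlers => {
            # Called when the closing tag of the matching Person is parsed.
            qq{Person[\@id="$person_id"]} => sub {
                my ($t, $person) = @_;
                my $first = $person->first_child('First');
                $first->set_text($new_first) if $first;
            },
            # Called at every closing Department tag: print what has been
            # parsed so far and release it from memory.
            'Department' => sub {
                my ($t) = @_;
                $t->flush($out_fh);
            },
        },
    );

    $twig->parsefile($xml_file);
    $twig->flush($out_fh);    # print whatever is left, e.g. the closing root tag
    return;
}

# Usage: perl update.pl dharry.xml > dharry_updated.xml
update_first_name($ARGV[0], '50000', 'dHarry', \*STDOUT) if @ARGV;
```

The Person handler runs before the Department handler for the enclosing element, because handlers fire on closing tags, so the modified text is already in place when the Department is flushed.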
NB
Re: Putting XML::Twig to the test
by mirod (Canon) on Aug 18, 2008 at 12:58 UTC
by dHarry (Abbot) on Aug 18, 2008 at 13:56 UTC