donp has asked for the wisdom of the Perl Monks concerning the following question:
Hello PerlMonks,
I just found perlmonks.org while googling for a solution to a specific problem. I'm not an experienced Perl programmer and English is not my native language, so please have mercy on me...
My problem: I've got large XML files (about 10 MB each) that I'd like to split into smaller chunks and later merge back into big ones.
xml_split from the XML::Twig distribution does a good job when used with option -c (condition), but due to the structure of some of my XML files this sometimes produces thousands of small chunks, which is far too many.
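For reference, this is roughly how I call it. The element name record and the file names are only placeholders for my actual data, and if I understand the docs correctly, xml_merge reads the master file written by xml_split and prints the merged document to standard output:

    # split on each <record> element (placeholder name)
    xml_split -c record big_file.xml

    # later: merge the chunks back into one document
    # (xml_split writes a master file, big_file-00.xml, that references the parts)
    xml_merge big_file-00.xml > big_file_merged.xml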
Options -s (chunk size) or -g (group elements) seem to be the better choice, but unfortunately all text nodes are lost once xml_split has finished with my data. Only the XML tags are left over.
The reason seems to be that xml_split does not use XML::Twig when options -s or -g are active; instead it uses XML::Parser directly. In that mode, neither the text handlers nor the default handlers appear to be called, but why? All text seems to be silently skipped or dropped.
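As a workaround I've been experimenting with driving XML::Twig directly instead of going through xml_split. This is only a minimal sketch, assuming the documents consist of repeated <record> elements under a single root; the element name, the chunk size, the wrapper element and the file names are all made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $records_per_chunk = 100;   # placeholder chunk size
    my $chunk_number      = 0;
    my $record_count      = 0;
    my @buffer;

    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $elt) = @_;
                # sprint serializes the element INCLUDING its text nodes
                push @buffer, $elt->sprint;
                # free the memory of the already-handled part of the tree
                $t->purge;
                write_chunk() if ++$record_count % $records_per_chunk == 0;
            },
        },
    );
    $twig->parsefile('big_file.xml');
    write_chunk();                 # flush any remaining records

    sub write_chunk {
        return unless @buffer;
        my $file = sprintf 'chunk-%02d.xml', $chunk_number++;
        open my $fh, '>', $file or die "cannot write $file: $!";
        # <records> is a placeholder wrapper so each chunk is well-formed
        print {$fh} qq{<?xml version="1.0"?>\n<records>\n},
                    join("\n", @buffer), "\n</records>\n";
        close $fh;
        @buffer = ();
    }

This keeps the text nodes intact in my tests, but it is of course much less general than xml_split, so I'd still prefer a proper fix or an existing tool.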
I can hardly believe that no one else has stumbled upon, or even solved, this problem so far.
So: does anyone know of a solution for this, or perhaps of a different XML splitting tool/module I could use? I've been searching for quite a while but have only found xml_split.
Thanks for any help,
donp
Replies are listed 'Best First'.

Re: Tool "xml_split" from XML::Twig removes all text
by mirod (Canon) on May 30, 2008 at 17:36 UTC

Re: Tool "xml_split" from XML::Twig removes all text
by Zen (Deacon) on May 30, 2008 at 21:50 UTC

Re: Tool "xml_split" from XML::Twig removes all text
by jurple (Initiate) on Sep 08, 2010 at 10:06 UTC
  by Anonymous Monk on Apr 28, 2011 at 08:11 UTC
  by mirod (Canon) on Apr 28, 2011 at 13:17 UTC
  by toolic (Bishop) on Apr 28, 2011 at 12:42 UTC