What about using a different module?

    #!/bin/perl
    use strict;
    use warnings;
    use XML::Rules;

    my $parser = XML::Rules->new(
        start_rules => [
            file => sub {
                my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
                # remember the file name, minus the directory prefix, in the pad
                $parser->{pad}{file} = $attrs->{original};
                $parser->{pad}{file} =~ s{game/stringtable/}{}i;
                # a start rule that returns false makes the parser skip the tag,
                # so do not let the result of s{}{} be the return value
                return 1;
            },
        ],
        rules => [
            source    => 'content',
            target    => 'content',
            _default  => '',
            'trans-unit' => sub {
                my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
                print EXTR qq{"$parser->{pad}{file}","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
                return;
            },
        ],
    );

    open(EXTR, ">meep.csv") or die $!;
    $parser->parsefile("meep.xlf");

With this you can process XML files hundreds of megabytes in size, provided they have this structure, because the parser 1) stores only what it needs from the subtags of <trans-unit> and 2) forgets all the data of a <trans-unit> tag as soon as it is done with it.

When the parser reaches the opening <file> tag, the script tweaks the original attribute and remembers it in the "pad", an attribute of the parser object designated to hold script-specific data. Whenever it has parsed a complete <source> or <target> tag, it keeps just the content and makes it available in the attribute hash of the parent tag. Whenever it has parsed a complete <trans-unit> tag, it prints the remembered file name, the id attribute and the contents of the <source> and <target> subtags, and then forgets the data of that tag.
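To make that data flow concrete, here is a small sketch that runs the same rules against an inline document instead of a file. The XML fragment is only my assumption about how the file is nested (xliff > file > body > trans-unit), since the thread does not show it here, and @rows is a helper array made up for this example.

```perl
use strict;
use warnings;
use XML::Rules;

# Assumed sample input; the real meep.xlf may look different.
my $xml = <<'XML';
<xliff version="1.2">
  <file original="game/stringtable/menu.xml">
    <body>
      <trans-unit id="42">
        <source>Start</source>
        <target>Los</target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML

my @rows;    # hypothetical collector, used instead of a file handle

my $parser = XML::Rules->new(
    start_rules => [
        file => sub {
            my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
            # called at <file ...>: stash the trimmed name in the pad
            ($parser->{pad}{file} = $attrs->{original}) =~ s{game/stringtable/}{}i;
            return 1;    # true = keep processing the children of <file>
        },
    ],
    rules => [
        source   => 'content',    # hands source => "text" to the parent tag
        target   => 'content',
        _default => '',           # every other tag contributes nothing
        'trans-unit' => sub {
            my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
            # by now $attrs holds id plus the source and target contents
            push @rows, qq{"$parser->{pad}{file}","$attrs->{id}","$attrs->{source}","$attrs->{target}"};
            return;               # nothing is passed further up
        },
    ],
);

$parser->parsestring($xml);
print "$_\n" for @rows;
```

The point to notice is that the 'trans-unit' rule never looks at the <source> and <target> tags themselves. The 'content' rules have already turned them into plain entries of $attrs by the time it runs.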

Here is a version without using the pad and using a lexical filehandle:

    #!/bin/perl
    use strict;
    use warnings;
    use XML::Rules;

    my $parser = XML::Rules->new(
        start_rules => [
            file => sub {
                my ($tag_name, $attrs) = @_;
                $attrs->{original} =~ s{game/stringtable/}{}i;
                return 1;
            },
        ],
        rules => [
            'trans-unit' => sub {
                my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
                my $file = $parent_data->[-2]{original};
                print {$parser->{parameters}{FH}} qq{"$file","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
                # or
                # my $FH = $parser->{parameters}{FH};
                # print $FH qq{"$file","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
                return;
            },
            source   => 'content',
            target   => 'content',
            _default => '',
        ],
    );

    open(my $EXTR, ">meep.csv") or die $!;
    $parser->parsefile("meep.xlf", {FH => $EXTR});
    close $EXTR;
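One caveat with both versions: the values are wrapped in double quotes as they are, so a string that itself contains a double quote produces a broken CSV line, and a <trans-unit> without a <target> triggers an "uninitialized value" warning. A minimal sketch of one way around that, assuming the usual CSV convention of doubling embedded quotes; csv_field and csv_line are names I made up for this example, not part of XML::Rules.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a single CSV field.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;    # missing subtag becomes an empty field
    $value =~ s/"/""/g;                   # double any embedded quotes
    return qq{"$value"};
}

# Hypothetical helper: build one CSV line from a list of values.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

print csv_line('menu.xml', 42, 'Say "hi"', undef);
# prints: "menu.xml","42","Say ""hi""",""
```

Inside the 'trans-unit' rule you would then print csv_line($file, @{$attrs}{qw(id source target)}) instead of the hand-built qq{} string. If installing modules is an option, Text::CSV from CPAN handles this and the other corner cases for you.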


In reply to Re: sort order of imported xml data? by Jenda
in thread sort order of imported xml data? by Xenofur
