What about using a different module?

#!/bin/perl
use strict;
use warnings;

use XML::Rules;

my $parser = XML::Rules->new(
    start_rules => [
        file => sub {
            my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
            $parser->{pad}{file} = $attrs->{original};
            $parser->{pad}{file} =~ s{game/stringtable/}{}i;
        },
    ],
    rules => [
        source => 'content',
        target => 'content',
        _default => '',
        'trans-unit' => sub {
            my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
            print EXTR qq{"$parser->{pad}{file}","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
            return;
        },
    ],
);

open(EXTR, '>', 'meep.csv') or die "Cannot open meep.csv: $!";
$parser->parsefile("meep.xlf");
close EXTR;

With this approach you can handle XML files of this structure that are hundreds of megabytes large, because the parser 1) stores only what it needs from the subtags of <trans-unit> and 2) forgets all of a <trans-unit> tag's data as soon as it is done with that tag.

Here is what the script does. When the parser sees the opening <file> tag, the script tweaks the original attribute and stores it in the "pad", an attribute of the parser object designated to hold script-specific data. Whenever it finishes parsing a <source> or <target> tag, it keeps just the content and makes it available in the attribute hash of the parent tag. Whenever it finishes parsing a <trans-unit> tag, it prints the remembered file name, the id attribute and the contents of the <source> and <target> subtags, and then forgets the data of that tag.
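To make that flow concrete, here is a minimal sketch of what the 'trans-unit' rule works with once the 'content' rules for <source> and <target> have run. It does not use XML::Rules at all: the hashes are built by hand, all values (ui.xml, 42, Hello, Bonjour) are made up for illustration, and csv_line is a hypothetical helper name that just repeats the formatting from the script.

```perl
use strict;
use warnings;

# Hypothetical data standing in for what the parser would hand to the rule.
my $pad   = { file => 'ui.xml' };   # what the <file> start rule stored in the pad
my $attrs = {
    id     => '42',                 # a real attribute of <trans-unit>
    source => 'Hello',              # merged in by the 'content' rule for <source>
    target => 'Bonjour',            # merged in by the 'content' rule for <target>
};

# Same formatting as the 'trans-unit' rule in the script (no CSV escaping).
sub csv_line {
    my ($file, $attrs) = @_;
    return qq{"$file","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
}

print csv_line($pad->{file}, $attrs);
```

Like the script, this does no escaping, so a double quote inside a <source> or <target> would end up unescaped in the CSV.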

Here is a version without using the pad and using a lexical filehandle:

#!/bin/perl
use strict;
use warnings;

use XML::Rules;

my $parser = XML::Rules->new(
    start_rules => [
        file => sub {
            my ($tag_name, $attrs) = @_;
            $attrs->{original} =~ s{game/stringtable/}{}i;
            return 1;
        },
    ],
    rules => [
        'trans-unit' => sub {
            my ($tag_name, $attrs, $context, $parent_data, $parser) = @_;
            my $file = $parent_data->[-2]{original};
            print {$parser->{parameters}{FH}} qq{"$file","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
            # or
            # my $FH = $parser->{parameters}{FH};
            # print $FH qq{"$file","$attrs->{id}","$attrs->{source}","$attrs->{target}"\n};
            return;
        },
        source => 'content',
        target => 'content',
        _default => '',
    ],
);

open(my $EXTR, '>', 'meep.csv') or die "Cannot open meep.csv: $!";
$parser->parsefile( "meep.xlf", {FH => $EXTR});
close $EXTR;

Jenda
Support Denmark!
Defend the free world!

In reply to Re: sort order of imported xml data? by Jenda
in thread sort order of imported xml data? by Xenofur
