Re: XML parsing question

use strict;
use XML::Rules;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ();
my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        _default => 'content',
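        record => sub {
            # $parent->[-1] is the data hash of the enclosing tag,
            # which is where the <document> attributes are read from
            my ($tag,$attr,$context,$parent) = @_;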
            $csv->combine (
                $parent->[-1]{datetime},
                $parent->[-1]{id},
                $parent->[-1]{sourcecategory},
                $parent->[-1]{schemeversion},
                $attr->{'sentence-number'},
                $attr->{'data-class'},
                $attr->{'group'},
            ) and print $csv->string(),"\n" or die "Error building the CSV line: ".$csv->error_input()."\n";
            return;
        },
        '^document' => sub {
            my ($tag,$attr) = @_;
            $csv->combine (
                $attr->{datetime},
                $attr->{id},
                $attr->{sourcecategory},
                $attr->{schemeversion},
            ) and print $csv->string(),"\n" or die "Error building the CSV line: ".$csv->error_input()."\n";
            return 1; # true value: go on and process the children of <document>
        },
        'document' => '', # do not want to remember any data
    }
);

$parser->parse(\*DATA);

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<results>
<document id="\2006\200601\20060125\20060125_18.txt"
            datetime="2006/01/25"
            sourcecategory="News Archive"
            schemeversion="1.1">
...

The nice thing is that this works even if the XML is huge, because it never keeps the whole document in memory. At any moment it only remembers the attributes of a single <document> and the contents of one <record>.
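
To see that behaviour in isolation, here is a minimal sketch. The sample XML is made up (the real data above is truncated), it assumes each <record> sits directly inside a <document>, and $count is just an illustrative counter. Each <record> is handled as soon as its closing tag is read, and because the rule returns nothing, no record data accumulates.

```perl
use strict;
use warnings;
use XML::Rules;

my $count = 0;    # illustrative counter, not part of the original script

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        _default => 'content',
        record => sub {
            my ($tag, $attr, $context, $parent) = @_;
            $count++;
            # $parent->[-1] holds the attributes of the enclosing <document>
            print "$parent->[-1]{id}: sentence $attr->{'sentence-number'}\n";
            return;    # return nothing, so this record is not remembered
        },
        document => '',    # forget the <document> once it is closed
    },
);

# Made-up sample input, only for this sketch
$parser->parse(<<'XML');
<results>
  <document id="doc-1">
    <record sentence-number="1">first</record>
    <record sentence-number="2">second</record>
  </document>
</results>
XML

print "records seen: $count\n";
```

For a real file you would pass an open filehandle to parse() instead of the string, the same way the script above passes \*DATA.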

Jenda
Enoch was right!
Enjoy the last years of Rome.
