in reply to Ugly XML processing looking for a pure XML solution

Assuming that your input XML isn't much more complicated than the above fragment (and even if it is, this can still work), a simple XML::Parser will do the trick.

#!/usr/bin/perl -w
use strict;
use XML::Parser;

my $parser = XML::Parser->new( Style => "::localParser" );
print $parser->parsefile( $ARGV[0] );

#########################################################
######################################################### parser
#########################################################
package localParser;
use strict;

#
# called when the parser starts
#
sub Init {
    my $self = shift();
    $self->{people}   = [];
    $self->{officers} = [];
    $self->{text}     = '';
}

sub Start {
    my ($self, $element, %attr) = @_;
    $self->{text} = '';          # clear text to be sure
}

sub Char {
    my ($self, $string) = @_;
    $self->{text} .= $string;    # append string to text
}

sub End {
    my ($self, $element) = @_;

    # save string in proper category. if you have more
    # complicated data (i.e. a simple array won't do),
    # you'll have to do something more clever
    if ( $element eq 'Ch_Chair' ) {
        push( @{$self->{officers}}, $self->{text} );
    }
    elsif ( $element eq 'CommitteeList' ) {
        push( @{$self->{people}}, $self->{text} );
    }
    $self->{text} = '';
}

#
# the final output
#
sub Final {
    my $self = shift();
    my $officers = join("\n\t\t\t", map { "<person>$_</person>" } @{$self->{officers}});
    my $people   = join("\n\t\t\t", map { "<person>$_</person>" } @{$self->{people}});
    return <<__HERE__;
<doc>
	<perslist>
		<officers>
			$officers
		</officers>
			$people
	</perslist>
</doc>
__HERE__
}

1;

So essentially what I'm doing is parsing the data into some intermediate data structure and then outputting that as XML. I'm sure that there are other modules on CPAN that'll help you output valid XML based on some more complicated data structure, rather than this simple collection of two arrays :)
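As one example of such a module, here is a minimal sketch using XML::Writer from CPAN. The module choice and the sample names are my own assumptions, not part of the script above. It writes the same two arrays, and unlike the here-document it escapes characters such as & and < in the text for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Writer;    # CPAN module, assumed to be installed

# Hypothetical sample data, in the same shape as the parser's two arrays.
my %data = (
    officers => [ 'A. Chair', 'B. Vice-Chair' ],
    people   => [ 'C. Member', 'D. Member & Co' ],
);

my $xml    = '';
my $writer = XML::Writer->new(
    OUTPUT      => \$xml,    # collect the output in a string
    DATA_MODE   => 1,        # one element per line
    DATA_INDENT => 2,        # indent nested elements by two spaces
);

$writer->startTag('doc');
$writer->startTag('perslist');

$writer->startTag('officers');
$writer->dataElement( person => $_ ) for @{ $data{officers} };
$writer->endTag('officers');

$writer->dataElement( person => $_ ) for @{ $data{people} };

$writer->endTag('perslist');
$writer->endTag('doc');
$writer->end();              # checks that every tag has been closed

print $xml;
```

The writer dies if the tags are not properly nested, which a here-document cannot check.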

I'm not sure if this qualifies as a "pure xml solution" though!

Re: Re: Ugly XML processing looking for a pure XML solution
by mirod (Canon) on Dec 15, 2000 at 13:38 UTC

    Something like this could certainly work, although it would be more complex in practice: the document is actually more complex, and I have about 30 wrapping rules, so I would not be able to wait for the end of the parsing to output the officers and persons. But see how long your solution is? How much work it is for each rule? You would have to write another piece of code for each different rule, or at least for each different type of pattern. And my real transformation table has rules such as:

      stdtitle => 'stddes*, stddesmo?, reaf?, stdcoll?, titlemod?, revision?, title+'

    With a solution like yours I would have to simulate (badly) the regexp engine, while with the code as it stands I just have to add one line to the %wrap table (and an item in the @wrap array) and... voilà! I get a good chunk of regexps for free.
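    The code behind the %wrap table is not shown in this thread, so the following is only a sketch of the general idea, under the assumption that each rule is turned into a regexp and matched against a string of child tag names. The helper rule_to_regexp and the sample children are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a rule such as 'a*, b?, c+' into a regexp
# that matches a run of "<a><a><b><c>"-style tag tokens.
sub rule_to_regexp {
    my ($rule) = @_;
    my $re = '';
    for my $item ( split /\s*,\s*/, $rule ) {
        my ( $tag, $quant ) = $item =~ /^(\w+)([*+?]?)$/
            or die "cannot parse rule item '$item'";
        $re .= "(?:<\Q$tag\E>)$quant";    # one token, with its quantifier
    }
    return qr/$re/;
}

my $rule = 'stddes*, stddesmo?, reaf?, stdcoll?, titlemod?, revision?, title+';
my $re   = rule_to_regexp($rule);

# Made-up sequence of child element names, flattened to one string.
my @children = qw(stddes stddes reaf title title para);
my $tags     = join '', map { "<$_>" } @children;

# The regexp engine decides how many leading children the rule covers.
my ($matched) = $tags =~ /^($re)/;
my @wrapped   = defined $matched ? $matched =~ /<(\w+)>/g : ();

print "wrap in <stdtitle>: @wrapped\n";
```

    With the sample children above this prints "wrap in <stdtitle>: stddes stddes reaf title title", leaving para outside the wrapper. Adding a rule means adding one string, which is the point being made here.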

    So your solution qualifies as "pure XML", but fails to be a generic one, while mine is not XML (and thus dangerous), but generic, and I am still searching for my Holy Grail of a generic XML solution (which should have been the title of my first post, now that I think about it)...

    Your code uses XML::Parser in a very clean way though, writing your own style and storing parser-related data (the text, people and officers fields) with the parser. Neat!