coreolyn has asked for the wisdom of the Perl Monks concerning the following question:
Venerable Bretheren ( & Bretherenesses ),
I've been able to avoid DOM, SAX and XML in general, and it has now caught up with me. After some quick reading of the available information, it looks like XML::Twig is the right tool for what I'm attempting to do. (Unknown file lengths make for potential memory problems with XML::Simple, and XML::Twig just looks cleaner than XML::Parser.)
I need to create a script that can dynamically change the value of elements in a well formed XML file. A calling program will supply the name of the element to search for and the value to supply to it.
I had thought this should be a piece of cake, but I'm paying a bit of a price for avoiding XML. I can do this via a regex, but as this is a script that is liable to grow in its functionality, I figure I need to stop and get a clue how to do this right.
I've stolen this snippet from a node of mirod's, and it pretty much exposes my ignorance. mirod's original node replaces XML using a second XML file. I've started to rearrange variable names and eliminated the update XML file, as I'm attempting to bring in the vars from an external program:
#!/bin/perl -w
use strict;
use XML::Twig;
my( $main_file, $search, $value )= @ARGV;
# get the info we need by loading the update file
#my $t_upd= new XML::Twig();
#$t_upd->parsefile( $upd_file);
#my $upd_badge_id = $t_upd->root->next_elt( 'badge_id')->text;
#my $upd_chore = $t_upd->root->next_elt( 'jobs');
# Process the main file
my $orig = new XML::Twig(
TwigHandlers => { $search => \&search, },
PrettyPrint => 'indented',
);
$orig->parsefile( $main_file );
$orig->flush; # don't forget or the last closing tags won't be printed
sub search {
my( $orig, $search )= @_;
print "hrmmm\n";
# just replace jobs if the previous badge_id is the right one
if( $search->prev_elt( 'name' )->text eq $search ) {
print "hrmmm\n";
$orig->replace( $value );
}
$orig->flush; # print and flush memory so only one job is in there at once
}
coreolyn - exposure of ignorance is so eeew ya know?
(jeffa) Re: XML Search and Replace
by jeffa (Bishop) on Jun 11, 2002 at 16:00 UTC
I will let mirod handle the XML::Twig version, as i have
not progressed to that module yet. I recently bought
Perl & XML and am enjoying it
immensely. Here is an 'event stream'
version that uses XML::Parser
and XML::Writer to replace all <foo> elements
with the element <struggle>. Input is from the DATA
file handle and output is STDOUT:
use strict;
use XML::Parser;
use XML::Writer;
my $writer = XML::Writer->new();
my $parser = XML::Parser->new(
Handlers => {
Init => \&handle_Init,
Start => \&handle_Start,
Char => \&handle_Char,
End => \&handle_End,
Final => \&handle_Final,
}
);
# i could have also made these $parser attributes
# such as $parser->{from} and $parser->{to}
our $from = 'foo';
our $to = 'struggle';
my $data = do {local $/;<DATA>};
$parser->parse($data);
# called once at the beginning
sub handle_Init {
$writer->xmlDecl('UTF-8');
$writer->doctype('xml');
}
# called each time a start element is encountered
sub handle_Start {
my($self,$name,%atts) = @_;
$name = $to if $name eq $from;
$writer->startTag($name,%atts);
}
# called each time non-markup data is encountered
sub handle_Char {
my($self,$text) = @_;
$writer->characters($text);
}
# called each time an end element is encountered
sub handle_End {
my($self,$name) = @_;
$name = $to if $name eq $from;
$writer->endTag($name);
}
# called once at the end of the document
sub handle_Final {
$writer->end();
}
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<xml>
<foo class="life or death">
<opponent>wolf</opponent>
<opponent>ant</opponent>
</foo>
<foo class="life or death">
<opponent>pantomime goose</opponent>
<opponent>Terrance Rattigan</opponent>
</foo>
</xml>
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
This code helps a lot for understanding XML::Parser, but it also shows how I've failed to communicate what I'm attempting to do.
To illustrate via your example: I'm looking to search for the 'opponent' element and change the value of its text.
coreolyn .. me thinks I'll be buying Perl & XML shortly.
Here is another way, using XML::Simple, which changes a few of the opponents, like you wanted.
use strict;
use XML::Simple;
my @data = (<DATA>);
my $xml = XMLin((join'', @data));
foreach my $foo (@{$xml->{'foo'}})
{
foreach my $opponent (@{$foo->{'opponent'}})
{
if($opponent eq 'wolf')
{
$opponent = 'Heinz Sielmann';
}
elsif($opponent eq 'ant')
{
$opponent = 'Peter Scott';
}
}
}
print XMLout($xml, rootname => 'xml');
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<xml>
<foo class="life or death">
<opponent>wolf</opponent>
<opponent>ant</opponent>
</foo>
<foo class="life or death">
<opponent>pantomime goose</opponent>
<opponent>Terrance Rattigan</opponent>
</foo>
</xml>
I don't usually code a lot of XML, but when I need to, I find that XML::Simple (together with Data::Dumper) often lets me do simple stuff really quickly. Of course, it takes a little tongue-in-cheek for the dereferencing sometimes; see references quick reference for an excellent tutorial on this. :)
You have moved into a dark place.
It is pitch black. You are likely to be eaten by a wolf.
Had to run to lunch ... here is another version that
DWYW ;)
use strict;
use XML::Parser;
use XML::Writer;
my $writer = XML::Writer->new();
my $parser = XML::Parser->new(
Handlers => {
Init => \&handle_Init,
Start => \&handle_Start,
Char => \&handle_Char,
End => \&handle_End,
Final => \&handle_Final,
}
);
my $data = do {local $/;<DATA>};
$parser->{match} = 'opponent';
$parser->parse($data);
sub handle_Init {
$writer->xmlDecl('UTF-8');
$writer->doctype('xml');
}
sub handle_Start {
my($self,$name,%atts) = @_;
$self->{flag} = 1 if $name eq $self->{match};
$writer->startTag($name,%atts);
}
sub handle_Char {
my($self,$text) = @_;
if ($self->{flag}) {
if ($text eq 'Terrance Rattigan') {
$text = 'breakfast';
}
else {
$text =~ s/goose/Queen Elizabeth/;
}
delete $self->{flag};
}
$writer->characters($text);
}
sub handle_End {
my($self,$name) = @_;
$writer->endTag($name);
}
sub handle_Final {
$writer->end();
}
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<xml>
<struggle class="life or death">
<opponent>wolf</opponent>
<opponent>ant</opponent>
</struggle>
<struggle class="life or death">
<opponent>pantomime goose</opponent>
<opponent>Terrance Rattigan</opponent>
</struggle>
</xml>
If a start element named 'opponent' is found, we set a
flag - why not use the parser's namespace? ;) Next, each
time a non-markup character is encountered, we see if the
flag is set and if it is, do some conversions and erase
the flag.
Most of my XML munging (until recently) has been with
XML::Simple. That module builds an internal tree that
represents the document. As Dog and Pony
showed you, it
is a really easy module to work with, but as the document
you are munging gets larger, XML::Simple gets slower and
takes up more memory.
These two versions i supplied use XML::Parser to take
advantage of 'event streams', they are more economical
in speed and memory. But they are also more complicated,
as you can immediately tell by comparing my code with
Dog and Pony's.
jeffa
"Here we see a life and death struggle between jeffa
and Dog and Pony ..." ;)
Re: XML Search and Replace
by mirod (Canon) on Jun 11, 2002 at 16:42 UTC
I don't really understand what you are trying to do, but I think you have a problem with how you use the arguments passed to the handler (search). The handler receives 2 arguments: the twig ($orig in this case) and the current element ($search). So really you can't write $search->prev_elt( 'name' )->text eq $search: $search is an XML::Twig::Elt object, not a string. Also, replace is a method on an element, not on a twig, so you probably don't want to write $orig->replace( $value );.
Would this work? (I can't test it without the actual XML data) It should work provided the text of the name element before the $search element is $search, which seems a bit odd to me.
#!/bin/perl -w
use strict;
use XML::Twig;
my( $main_file, $search, $value )= @ARGV;
# get the info we need by loading the update file
#my $t_upd= new XML::Twig();
#$t_upd->parsefile( $upd_file);
#my $upd_badge_id = $t_upd->root->next_elt( 'badge_id')->text;
#my $upd_chore = $t_upd->root->next_elt( 'jobs');
# Process the main file
my $orig = new XML::Twig( TwigHandlers => { $search => \&search, },
PrettyPrint => 'indented',
);
$orig->parsefile( $main_file );
$orig->flush; # don't forget or the last closing tags won't be printed
sub search {
my( $orig, $search )= @_;
print "hrmmm\n";
my $search_tag= $search->tag;
# just replace jobs if the previous badge_id is the right one
if( $search->prev_elt( 'name' )->text eq $search_tag ) {
print "hrmmm\n";
$search->set_text( $value );
}
$orig->flush; # print and flush memory so only one job is in there at once
}
I'll test the above after I post this sample data. The problem here is that this sample data, while fine as 'an' example, is not indicative of each XML file that may need to be processed. This script is part of a family of deployment scripts that deploy applications to various servers. Each deployment might reference a different set of servers, and the XML tree may be completely different.
This had previously been done via name=value pairs in a user-supplied configuration file: a script would go through a flat (property) file, find the name and substitute the supplied value. Then someone decided it would be better if the property files being updated were in XML.
Here is a sample file:
<ImageQuery547>
<root>
<TraceNumQuery>
<Path>http://localhost:8080/image/ImageVendorServlet?</Path>
<TraceNum>trace</TraceNum>
<Date>dt</Date>
<Face>fb
</Face></TraceNumQuery>
<CheckNumQuery>
<Path>http://666.666.210.72/wetest/we.dll?</Path>
<Account>acct</Account>
<Amount>amt</Amount>
<CheckNum>sn</CheckNum>
<Date>dt</Date>
<Face>fb</Face>
<Ping>ping</Ping>
</CheckNumQuery></root>
</ImageQuery547>
Typically the values of the Paths (servers) would be changed on each deployment. This creates the problem of telling a <TraceNumQuery><Path> apart from a <CheckNumQuery><Path> and inserting the correct value into each.
In this case what I would like to do is pick up (from a flat config file) $filename, $element (this could be in the form of "TraceNumQuery::Path") and $value ("http://foo.bar:8080/baz"), and substitute $value for the value currently at "TraceNumQuery::Path" (http://localhost:8080/image/ImageVendorServlet?).
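To make that concrete, here is a minimal sketch of reading such a config entry, assuming a hypothetical format of one whitespace-separated "file element value" entry per line. The helper names (config_name_to_path, parse_config_line) and the file name query.xml are made up for illustration. The "::" separator is turned into "/" because XML::Twig handler conditions accept a slash-separated partial path such as TraceNumQuery/Path.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a config-style name such as
# "TraceNumQuery::Path" into the slash-separated form "TraceNumQuery/Path".
sub config_name_to_path {
    my ($name) = @_;
    return join '/', split /::/, $name;
}

# Hypothetical config line format (an assumption, not from the thread):
#   <file> <element> <value>, separated by whitespace.
sub parse_config_line {
    my ($line) = @_;
    chomp $line;
    # Limit of 3 keeps any whitespace inside the value intact.
    my ($file, $element, $value) = split ' ', $line, 3;
    return ($file, config_name_to_path($element), $value);
}

my @fields = parse_config_line(
    "query.xml TraceNumQuery::Path http://foo.bar:8080/baz\n"
);
print join("\n", @fields), "\n";
```

The three returned values can then be handed to whichever XML module ends up doing the actual replacement.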
coreolyn (Should've supplied this right away doh!)
So here is how I would do it: you only need to update this one tag (possibly many times in the file), so I would use twig_roots and twig_print_outside_roots here: you just go through the file, outputting it as-is unless you find the path, in which case you use a handler to change the content of the element. This will give you the minimum memory footprint (and it's pretty simple too!).
Call this as update file.xml TraceNumQuery/Path http://foo.bar:8080/baz
#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $USAGE= "$0 <file> <path_to_update> <value>";
die $USAGE unless( @ARGV == 3);
my( $file, $path, $value)= @ARGV;
# $_ is set to the current element in the handler
# you could also delete the element after printing it
# for even less memory usage
my $twig= XML::Twig->new( twig_roots => { $path => sub { $_->set_text( $value); $_->print; } },
twig_print_outside_roots => 1,
);
$twig->parsefile( $file);
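For trying the same idea without touching any files, here is a hedged variation that parses a string and collects the output in a scalar instead of printing to STDOUT. The sample XML is a cut-down, made-up version of the data posted above, and $out / $updated are illustrative names. It relies on twig_print_outside_roots accepting a filehandle and on the element's print method taking an optional filehandle.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down sample data (an assumption, loosely based on the posted file).
my $xml = <<'XML';
<root>
<TraceNumQuery>
<Path>http://localhost:8080/image/ImageVendorServlet?</Path>
<TraceNum>trace</TraceNum>
</TraceNumQuery>
<CheckNumQuery>
<Path>http://example.invalid/we.dll?</Path>
</CheckNumQuery>
</root>
XML

my ($path, $value) = ('TraceNumQuery/Path', 'http://foo.bar:8080/baz');

# In-memory filehandle: everything printed to $out ends up in $updated.
open my $out, '>', \my $updated
    or die "cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_roots => {
        # Only elements matching $path are built and passed to the handler.
        $path => sub {
            my ($t, $elt) = @_;
            $elt->set_text($value);
            $elt->print($out);
        },
    },
    # Everything outside the roots is copied through to the same handle.
    twig_print_outside_roots => $out,
);
$twig->parse($xml);
close $out;

print $updated;
```

Only the Path under TraceNumQuery should change; the Path under CheckNumQuery is passed through untouched because it does not match the path condition.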
Re: XML Search and Replace
by Desdinova (Friar) on Jun 11, 2002 at 17:26 UTC
Everything I have done with XML has been with XML::Twig. Of course, there has been very little.
Recently I faced a problem kind of like this. I had an XML file with fields that needed to be updated each day. The way I did it was to create a handler for each element that needed to be updated:
my $twig = XML::Twig->new(twig_handlers =>
{MONTH => \&upd_month,
DAY => \&upd_day,
YEAR => \&upd_year
},
PrettyPrint =>'indented',
);
$twig->parsefile(XML_FILE);
Then in the handlers i used the set_text method to replace the value to what I needed
sub upd_month{
my( $t, $post)= @_;
$post->set_text("$expire{month}");
}
sub upd_day{
my( $t, $post)= @_;
$post->set_text("$expire{day}");
}
sub upd_year{
my( $t, $post)= @_;
$post->set_text("$expire{year}");
}
I don't know if this would work well for your situation, but it fit the niche I had to fill, so I thought I would pass it along.
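Since the three handlers above differ only in the hash key, they could also be generated in a loop. This is just a sketch under assumptions: the %expire values and the EXPIRES wrapper element are placeholders, not taken from the real file.

```perl
use strict;
use warnings;
use XML::Twig;

# Placeholder values (assumed for illustration).
my %expire = (month => '06', day => '11', year => '2002');

# Build one closure per key: MONTH, DAY and YEAR each get a handler
# that sets the element text to the matching %expire value.
my %handlers = map {
    my $key = $_;
    ( uc($key) => sub {
          my ($t, $elt) = @_;
          $elt->set_text($expire{$key});
      } );
} keys %expire;

my $twig = XML::Twig->new(
    twig_handlers => \%handlers,
    pretty_print  => 'indented',
);

# Made-up sample document.
$twig->parse('<EXPIRES><MONTH>1</MONTH><DAY>1</DAY><YEAR>1970</YEAR></EXPIRES>');

my $result = $twig->sprint;
print $result;
```

Adding another field then only means adding another key to %expire.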
Desdinova
Re: XML Search and Replace
by dmitri (Priest) on Jun 12, 2002 at 21:14 UTC
Should not that be "Brethren and Sisters?"