PerlMonks  

XML Search and Replace

by coreolyn (Parson)
on Jun 11, 2002 at 15:01 UTC ( [id://173505]=perlquestion )

coreolyn has asked for the wisdom of the Perl Monks concerning the following question:

Venerable Bretheren ( & Bretherenesses ),

I've been able to avoid DOM, SAX and XML in general, and it has now caught up to me. After reading the available information it looks like XML::Twig is the right tool for what I'm attempting to do, but that was after some quick reading. (Unknown file lengths make potential memory problems for XML::Simple, and XML::Twig just looks cleaner than XML::Parser.)

I need to create a script that can dynamically change the value of elements in a well formed XML file. A calling program will supply the name of the element to search for and the value to supply to it.

I had thought this should be a piece of cake, but I'm paying a bit of a price for avoiding XML. I can do this via a regex, but as this is a script that is liable to grow in its functionality, I figure I need to stop and get a clue how to do this right.

I've stolen this snippet from a node of mirod's and it pretty much exposes my ignorance. mirod's original node replaces XML using a second XML file; I've started to rearrange variable names and eliminated the updated XML file, as I'm attempting to bring in the vars from an external program:

    #!/bin/perl -w
    use strict;
    use XML::Twig;

    my( $main_file, $search, $value )= @ARGV;

    # get the info we need by loading the update file
    #my $t_upd= new XML::Twig();
    #$t_upd->parsefile( $upd_file);
    #my $upd_badge_id = $t_upd->root->next_elt( 'badge_id')->text;
    #my $upd_chore = $t_upd->root->next_elt( 'jobs');

    # Process the main file
    my $orig = new XML::Twig( TwigHandlers => { $search => \&search, },
                              PrettyPrint  => 'indented',
                            );
    $orig->parsefile( $main_file );
    $orig->flush;  # don't forget or the last closing tags won't be printed

    sub search {
        my( $orig, $search )= @_;
        print "hrmmm\n";
        # just replace jobs if the previous badge_id is the right one
        if( $search->prev_elt( 'name' )->text eq $search ) {
            print "hrmmm\n";
            $orig->replace( $value );
        }
        $orig->flush;  # print and flush memory so only one job is in there at once
    }

coreolyn - exposure of ignorance is so eeew ya know?

Replies are listed 'Best First'.
(jeffa) Re: XML Search and Replace
by jeffa (Bishop) on Jun 11, 2002 at 16:00 UTC
    I will let mirod handle the XML::Twig version, as i have not progressed to that module yet. I recently bought Perl & XML and am enjoying it immensely. Here is an 'event stream' version that uses XML::Parser and XML::Writer to replace all <foo> elements with the element <struggle>. Input is from the DATA file handle and output is STDOUT:

        use strict;
        use XML::Parser;
        use XML::Writer;

        my $writer = XML::Writer->new();
        my $parser = XML::Parser->new(
            Handlers => {
                Init  => \&handle_Init,
                Start => \&handle_Start,
                Char  => \&handle_Char,
                End   => \&handle_End,
                Final => \&handle_Final,
            }
        );

        # i could have also made these $parser attributes
        # such as $parser->{from} and $parser->{to}
        our $from = 'foo';
        our $to   = 'struggle';

        my $data = do {local $/;<DATA>};
        $parser->parse($data);

        # called once at the beginning
        sub handle_Init {
            $writer->xmlDecl('UTF-8');
            $writer->doctype('xml');
        }

        # called each time a start element is encountered
        sub handle_Start {
            my($self,$name,%atts) = @_;
            $name = $to if $name eq $from;
            $writer->startTag($name,%atts);
        }

        # called each time non-markup data is encountered
        sub handle_Char {
            my($self,$text) = @_;
            $writer->characters($text);
        }

        # called each time an end element is encountered
        sub handle_End {
            my($self,$name) = @_;
            $name = $to if $name eq $from;
            $writer->endTag($name);
        }

        # called once at the end of the document
        sub handle_Final {
            $writer->end();
        }

        __DATA__
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE xml>
        <xml>
          <foo class="life or death">
            <opponent>wolf</opponent>
            <opponent>ant</opponent>
          </foo>
          <foo class="life or death">
            <opponent>pantomime goose</opponent>
            <opponent>Terrance Rattigan</opponent>
          </foo>
        </xml>

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      This code helps a lot for understanding XML::Parser, but it also shows how I've failed to communicate what I'm attempting to do.

      To illustrate via your example: I'm looking to search for the 'opponent' element and change the value of its text.

      coreolyn .. me thinks I'll be buying Perl & XML shortly.
        Here is another way, using XML::Simple, which changes a few of the opponents, like you wanted.

            use strict;
            use XML::Simple;

            my @data = (<DATA>);
            my $xml = XMLin((join'', @data));

            foreach my $foo (@{$xml->{'foo'}}) {
                foreach my $opponent (@{$foo->{'opponent'}}) {
                    if($opponent eq 'wolf') {
                        $opponent = 'Heinz Sielmann';
                    }
                    elsif($opponent eq 'ant') {
                        $opponent = 'Peter Scott';
                    }
                }
            }

            print XMLout($xml, rootname => 'xml');

            __DATA__
            <?xml version="1.0" encoding="UTF-8"?>
            <!DOCTYPE xml>
            <xml>
              <foo class="life or death">
                <opponent>wolf</opponent>
                <opponent>ant</opponent>
              </foo>
              <foo class="life or death">
                <opponent>pantomime goose</opponent>
                <opponent>Terrance Rattigan</opponent>
              </foo>
            </xml>

        I don't usually code a lot of XML, but when I need to, I find that XML::Simple (together with Data::Dumper) often lets me do simple stuff really quickly. Of course, it takes a little tongue-in-cheek for the dereferencing sometimes, see references quick reference for an excellent tutorial on this. :)
        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a wolf.
        Had to run to lunch ... here is another version that DWYW ;)

            use strict;
            use XML::Parser;
            use XML::Writer;

            my $writer = XML::Writer->new();
            my $parser = XML::Parser->new(
                Handlers => {
                    Init  => \&handle_Init,
                    Start => \&handle_Start,
                    Char  => \&handle_Char,
                    End   => \&handle_End,
                    Final => \&handle_Final,
                }
            );

            my $data = do {local $/;<DATA>};
            $parser->{match} = 'opponent';
            $parser->parse($data);

            sub handle_Init {
                $writer->xmlDecl('UTF-8');
                $writer->doctype('xml');
            }

            sub handle_Start {
                my($self,$name,%atts) = @_;
                $self->{flag} = 1 if $name eq $self->{match};
                $writer->startTag($name,%atts);
            }

            sub handle_Char {
                my($self,$text) = @_;
                if ($self->{flag}) {
                    if ($text eq 'Terrance Rattigan') {
                        $text = 'breakfast';
                    }
                    else {
                        $text =~ s/goose/Queen Elizabeth/;
                    }
                    delete $self->{flag};
                }
                $writer->characters($text);
            }

            sub handle_End {
                my($self,$name) = @_;
                $writer->endTag($name);
            }

            sub handle_Final {
                $writer->end();
            }

            __DATA__
            <?xml version="1.0" encoding="UTF-8"?>
            <!DOCTYPE xml>
            <xml>
              <struggle class="life or death">
                <opponent>wolf</opponent>
                <opponent>ant</opponent>
              </struggle>
              <struggle class="life or death">
                <opponent>pantomime goose</opponent>
                <opponent>Terrance Rattigan</opponent>
              </struggle>
            </xml>

        If a start element named 'opponent' is found, we set a flag - why not use the parser's namespace? ;) Next, each time a non-markup character is encountered, we see if the flag is set and if it is, do some conversions and erase the flag.
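
        One caveat with the flag approach: the XML::Parser documentation notes that a single run of text may be delivered in several Char calls, so a test such as $text eq 'Terrance Rattigan' can miss when the text arrives in pieces. Here is a minimal sketch of one way around that, which buffers the text and rewrites it at the end tag. The rewrite_text helper and its callback argument are made up for illustration, it assumes the matching elements contain only text (no child elements), and it relies on XML::Writer accepting a scalar reference for OUTPUT.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Writer;

# Hypothetical helper (not from the thread): replace the text of every
# <$match> element with whatever $new_text_for->($old_text) returns.
# Assumes matching elements hold text only, no child elements.
sub rewrite_text {
    my ($xml, $match, $new_text_for) = @_;

    my $out    = '';
    my $writer = XML::Writer->new( OUTPUT => \$out );
    my $buffer;    # defined only while we are inside a matching element

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($p, $name, %atts) = @_;
                $buffer = '' if $name eq $match;    # start collecting text
                $writer->startTag($name, %atts);
            },
            Char => sub {
                my ($p, $text) = @_;
                if (defined $buffer) { $buffer .= $text }              # may be called repeatedly
                else                 { $writer->characters($text) }
            },
            End => sub {
                my ($p, $name) = @_;
                if ($name eq $match && defined $buffer) {
                    # the whole text is known only now, so rewrite it here
                    $writer->characters( $new_text_for->($buffer) );
                    undef $buffer;
                }
                $writer->endTag($name);
            },
            Final => sub { $writer->end() },
        },
    );
    $parser->parse($xml);
    return $out;
}

my $xml = <<'XML';
<xml>
  <struggle class="life or death">
    <opponent>pantomime goose</opponent>
    <opponent>Terrance Rattigan</opponent>
  </struggle>
</xml>
XML

print rewrite_text( $xml, 'opponent', sub {
    my ($text) = @_;
    return 'breakfast' if $text eq 'Terrance Rattigan';
    $text =~ s/goose/Queen Elizabeth/;
    return $text;
} );
```

        The comparison now always sees the complete text of the element, at the cost of holding one element's text in memory at a time.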

        Most of my XML munging (until recently) has been with XML::Simple. That module builds an internal tree that represents the document. As Dog and Pony showed you, it is a really easy module to work with, but as the document you are munging gets larger, XML::Simple gets slower and takes up more memory.

        These two versions i supplied use XML::Parser to take advantage of 'event streams', they are more economical in speed and memory. But they are also more complicated, as you can immediately tell by comparing my code with Dog and Pony's.

        jeffa

        "Here we see a life and death struggle between jeffa and Dog and Pony ..." ;)
Re: XML Search and Replace
by mirod (Canon) on Jun 11, 2002 at 16:42 UTC

    I don't really understand what you are trying to do, but I think you have a problem with how you use the arguments passed to the handler (search). The handler receives 2 arguments: the twig ($orig in this case) and the current element ($search). So really you can't write $search->prev_elt( 'name' )->text eq $search: $search is an XML::Twig::Elt object, not a string. Then replace is a method on an element, not on a twig, so you probably don't want to write $orig->replace( $value );.

    Would this work? (I can't test it without the actual XML data) It should work provided the text of the name element before the $search element is $search, which seems a bit odd to me.

        #!/bin/perl -w
        use strict;
        use XML::Twig;

        my( $main_file, $search, $value )= @ARGV;

        # get the info we need by loading the update file
        #my $t_upd= new XML::Twig();
        #$t_upd->parsefile( $upd_file);
        #my $upd_badge_id = $t_upd->root->next_elt( 'badge_id')->text;
        #my $upd_chore = $t_upd->root->next_elt( 'jobs');

        # Process the main file
        my $orig = new XML::Twig( TwigHandlers => { $search => \&search, },
                                  PrettyPrint  => 'indented',
                                );
        $orig->parsefile( $main_file );
        $orig->flush;  # don't forget or the last closing tags won't be printed

        sub search {
            my( $orig, $search )= @_;
            print "hrmmm\n";
            my $search_tag= $search->tag;
            # just replace jobs if the previous badge_id is the right one
            if( $search->prev_elt( 'name' )->text eq $search_tag ) {
                print "hrmmm\n";
                $search->set_text( $value );
            }
            $orig->flush;  # print and flush memory so only one job is in there at once
        }

      I'll test the above after I post this sample data. The problem here is that this sample data, while fine as 'an' example, is not indicative of each XML file that may need to be processed. This script is part of a family of deployment scripts that deploy applications to various servers. Each deployment might reference a different set of servers and the XML tree may be completely different.

      This had previously been done via name=value pairs in a user-supplied configuration file: a script would go through a flat (property) file, find the name and substitute the supplied value. Then someone decided it would be better if the property files that were being updated would be in XML.

      Here is a sample file:

          <ImageQuery547>
            <root>
              <TraceNumQuery>
                <Path>http://localhost:8080/image/ImageVendorServlet?</Path>
                <TraceNum>trace</TraceNum>
                <Date>dt</Date>
                <Face>fb </Face></TraceNumQuery>
              <CheckNumQuery>
                <Path>http://666.666.210.72/wetest/we.dll?</Path>
                <Account>acct</Account>
                <Amount>amt</Amount>
                <CheckNum>sn</CheckNum>
                <Date>dt</Date>
                <Face>fb</Face>
                <Ping>ping</Ping>
              </CheckNumQuery></root>
          </ImageQuery547>

      Typically the values of the Paths (servers) would be changed on each deployment. This creates the problem of telling a <TraceNumQuery><Path> from a <CheckNumQuery><Path> and inserting the correct values.

      In this case what I would like to do is pick up (from a flat config file) $filename, $element (this could be in the form of "TraceNumQuery::Path") and $value ("http://foo.bar:8080/baz"), and substitute it for the value currently at "TraceNumQuery::Path" (http://localhost:8080/image/ImageVendorServlet?).

      coreolyn (Should've supplied this right away doh!)

        So here is how I would do it: you only need to update this one tag (possibly many times in the file), so I would use twig_roots and twig_print_outside_roots here: you just go through the file, outputting it as-is unless you find the path, in which case you use a handler to change the content of the element. This will give you the minimum memory footprint (and it's pretty simple too!).

        Call this as update file.xml TraceNumQuery/Path http://foo.bar:8080/baz

            #!/usr/bin/perl -w
            use strict;
            use XML::Twig;

            my $USAGE= "$0 <file> <path_to_update> <value>";
            die $USAGE unless( @ARGV == 3);

            my( $file, $path, $value)= @ARGV;

            # $_ is set to the current element in the handler
            # you could also delete the element after printing it
            # for even less memory usage
            my $twig= XML::Twig->new( twig_roots => { $path => sub { $_->set_text( $value);
                                                                     $_->print;
                                                                   }
                                                    },
                                      twig_print_outside_roots => 1,
                                    );
            $twig->parsefile( $file);
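
        Since the values are meant to come from a flat config file, the same twig_roots idea can probably be stretched to update several paths in one pass. What follows is only a sketch under assumptions: the "path=value" line format, the read_updates and update_paths helpers and the trimmed sample data are all invented for illustration, and the output is collected in a string by passing a filehandle to twig_print_outside_roots and to print.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical config format: one "path=value" pair per line, '#' for comments.
sub read_updates {
    my ($fh) = @_;
    my %value_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;              # skip comments and blank lines
        my ($path, $value) = split /=/, $line, 2;     # split on the first '=' only
        next unless defined $value;
        $value_for{$path} = $value;
    }
    return %value_for;
}

# Hypothetical helper: apply all path => value updates to an XML string
# in a single pass and return the updated document as a string.
sub update_paths {
    my ($xml, %value_for) = @_;

    my $out = '';
    open my $out_fh, '>', \$out or die "cannot open in-memory handle: $!";

    # one handler per path, each closing over its own value
    my %roots;
    for my $path (keys %value_for) {
        my $value = $value_for{$path};
        $roots{$path} = sub {
            my ($twig, $elt) = @_;
            $elt->set_text($value);
            $elt->print($out_fh);
        };
    }

    my $twig = XML::Twig->new(
        twig_roots               => \%roots,
        twig_print_outside_roots => $out_fh,    # everything else is copied through
    );
    $twig->parse($xml);
    close $out_fh;
    return $out;
}

my $config = <<'CFG';
# path=value
TraceNumQuery/Path=http://foo.bar:8080/baz
CFG

my $xml = <<'XML';
<root>
<TraceNumQuery><Path>http://localhost:8080/image/ImageVendorServlet?</Path><Date>dt</Date></TraceNumQuery>
<CheckNumQuery><Path>http://666.666.210.72/wetest/we.dll?</Path><Date>dt</Date></CheckNumQuery>
</root>
XML

open my $cfg_fh, '<', \$config or die "cannot read config: $!";
my %updates = read_updates($cfg_fh);
close $cfg_fh;

print update_paths($xml, %updates);
```

        Paths that are not listed in the config are copied through untouched, so the CheckNumQuery path above keeps its original value.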
Re: XML Search and Replace
by Desdinova (Friar) on Jun 11, 2002 at 17:26 UTC
    Everything I have done with XML has been with XML::Twig. Of course, there has been very little.
    Recently I faced a problem kind of like this. I had an XML file with fields that needed to be updated each day. The way I did it was to create a handler for each element that needed to be updated:

        my $twig = XML::Twig->new(twig_handlers => {MONTH => \&upd_month,
                                                    DAY   => \&upd_day,
                                                    YEAR  => \&upd_year
                                                   },
                                  PrettyPrint => 'indented',
                                 );
        $twig->parsefile(XML_FILE);

    Then in the handlers I used the set_text method to replace the value with what I needed:

        sub upd_month{
            my( $t, $post)= @_;
            $post->set_text("$expire{month}");
        }

        sub upd_day{
            my( $t, $post)= @_;
            $post->set_text("$expire{day}");
        }

        sub upd_year{
            my( $t, $post)= @_;
            $post->set_text("$expire{year}");
        }

    I don't know if this would work well for your situation, but it fit the niche I had to fill, so I thought I would pass it along.
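
    Since the three handlers differ only in the hash key they read, they could probably be generated from the hash itself. A rough sketch, assuming the element names are simply the upper-cased keys: the set_expiry helper, the sample values and the <expires> wrapper element are invented for illustration, and sprint is used to get the updated document back as a string.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: set the text of <MONTH>, <DAY> and <YEAR> (the
# upper-cased keys of %expire) and return the updated document.
sub set_expiry {
    my ($xml, %expire) = @_;

    # build one handler per key instead of writing three near-identical subs
    my %handlers;
    for my $key (keys %expire) {
        $handlers{ uc $key } = sub {
            my ($twig, $elt) = @_;
            $elt->set_text( $expire{$key} );
        };
    }

    my $twig = XML::Twig->new(
        twig_handlers => \%handlers,
        pretty_print  => 'indented',
    );
    $twig->parse($xml);
    return $twig->sprint;    # the whole updated document as a string
}

# assumed sample data, not from the original post
my $xml = '<expires><MONTH>01</MONTH><DAY>01</DAY><YEAR>2001</YEAR></expires>';

print set_expiry( $xml, month => '06', day => '30', year => '2002' );
```

    This builds the whole document in memory before printing it, which is fine for a small file but is not the low-memory approach discussed earlier in the thread.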
    Desdinova
Re: XML Search and Replace
by dmitri (Priest) on Jun 12, 2002 at 21:14 UTC
    Should not that be "Brethren and Sisters?"

      Actually it was an attempt to de-gender the fellowship.. it has been an issue on occasion... ok so my humor is a tad dry :/

      coreolyn

Node Type: perlquestion [id://173505]
Approved by broquaint
Front-paged by ignatz