in reply to Re^2: Search and Replace in XML
in thread Search and Replace in XML

Set up a Twig handler to find all the REF elements. Parse the XML file. In the handler routine, get the text of the EXT attribute. Set the text of the REF element to be the text of the EXT attribute. Print out the modified XML.
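
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a finished solution: the inline sample string and the handler name `replace_ref_text` are made up for the example, and it uses `parse` on a string where real code would call `parsefile` on your file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, cut down from the kind of document discussed here.
my $xml = '<S><REF ID="100">Bobby Fischer</REF> can escape US if Iceland makes '
        . '<REF ANT-ID="100" EXT="Bobby Fischer" ID="101">him</REF> citizen</S>';

# Register a handler that is called once for every REF element.
my $twig = XML::Twig->new(
    twig_handlers => { REF => \&replace_ref_text },
);

$twig->parse($xml);             # use parsefile($filename) for a file on disk
print $twig->sprint, "\n";      # print the modified XML

# Hypothetical handler name: copy the EXT attribute value into the element text.
sub replace_ref_text {
    my ( $t, $ref ) = @_;
    my $ext = $ref->att('EXT');             # undef when there is no EXT attribute
    $ref->set_text($ext) if defined $ext;   # leave REF untouched otherwise
}
```

REF elements without an EXT attribute keep their original text, because `att` returns undef for a missing attribute.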

After you have written some code, if you are still having trouble, post your code along with the expected output.

Re^4: Search and Replace in XML
by Anonymous Monk on Sep 15, 2009 at 11:07 UTC

    Thanks for your reply. I have written some code, but I have a problem replacing the text of the REF element with the value of its EXT attribute. This is the original file:

    <DOC id="AFP_ENG_20050316.0102" type="story">
    <HEADLINE>
    Bobby Fischer can escape US if Iceland makes him citizen: Japanese lawmaker by Hiroshi Hiyama ATTENTION - ADDS quotes from immigration official, details ///
    </HEADLINE>
    <DATELINE>
    TOKYO, March 16
    </DATELINE>
    <TEXT>
    <S Entail="28" s_id="0"> <REF YPE="PROPNAME">Bobby Fischer</REF>can escape US if Iceland makes <REF ANT-ID="100" EXT="Bobby Fischer" ID="101">him</REF> citizen: Japanese lawmaker by Hiroshi Hiyama ATTENTION - ADDS quotes from immigration official, details /// </S>
    <S Entail="28-31" s_id="1"> Chess legend <REF ID="102" YPE="PROPNAME">Bobby Fischer</REF>, who faces prison if <REF ANT-ID="102" EXT="Bobby Fischer" ID="103" YPE="PRON">he</REF> returns to the United States, can only avoid deportation from Japan if <REF ID="104" YPE="PROPNAME">Iceland</REF> upgrades <REF ANT-ID="104" EXT="Iceland's" ID="105">its</REF> granting of residency to full citizenship, a Japanese lawmaker said Wednesday. </S>
    </TEXT>
    </DOC>

    My expected file is shown below. Some of the REF tags do not have an EXT attribute; in those cases there is no replacement.

    <DOC id="AFP_ENG_20050316.0102" type="story">
    <HEADLINE>
    Bobby Fischer can escape US if Iceland makes him citizen: Japanese lawmaker by Hiroshi Hiyama ATTENTION - ADDS quotes from immigration official, details ///
    </HEADLINE>
    <DATELINE>
    TOKYO, March 16
    </DATELINE>
    <TEXT>
    <S Entail="28" s_id="0"> <REF ID="100" YPE="PROPNAME">Bobby Fischer</REF>can escape US if Iceland makes <REF ANT-ID="100" EXT="Bobby Fischer" ID="101">Bobby Fischer</REF> citizen: Japanese lawmaker by Hiroshi Hiyama ATTENTION - ADDS quotes from immigration official, details /// </S>
    <S Entail="28-31" s_id="1"> Chess legend <REF ID="102" YPE="PROPNAME">Bobby Fischer</REF>, who faces prison if <REF ANT-ID="102" EXT="Bobby Fischer" ID="103" YPE="PRON">Bobby Fischer</REF> returns to the United States, can only avoid deportation from Japan if <REF ID="104" >Iceland</REF> upgrades <REF ANT-ID="104" EXT="Iceland's" ID="105">Iceland's</REF> granting of residency to full citizenship, a Japanese lawmaker said Wednesday. </S>
    </TEXT>
    </DOC>

    My code is:

    #!/bin/perl -w
    use strict;
    use XML::Twig;

    my $twig=new XML::Twig(twig_roots => {'TEXT' =>1} ,twig_handlers=>{'REF' => \&REF});
    # change the address of root to TEXT element because REF is the children of TEXT
    my $field='REF';
    $twig->parsefile("AFP_ENG_20050316.0102.xml"); #Parse the file
    my $root=$twig->root;
    my $rootchild=$root->children;

    sub REF {
        my ($twig ,$field)=@_;
        my $extension = $field->text; #keep the value of REF element
        my $att=$field->att('EXT'); #keep the value of Ext ( some REF elements do not have EXT attribute)
        print "$extension\n";
        $att->set_text($field);
        $extension=$att;
    }
    $twig->print;

    I would also like to know: is there a function in Twig to remove the REF elements after this replacement but keep their text in the sentence? Thanks a lot.

      I think this gets you a little closer:
      use warnings;
      use strict;
      use XML::Twig;

      my $twig = new XML::Twig( twig_handlers => { 'REF' => \&REF } );
      $twig->parsefile("AFP_ENG_20050316.0102.xml");    #Parse the file
      $twig->print();

      sub REF {
          my ( $twig, $field ) = @_;
          my $ref_text = $field->text();
          my $ref_ext  = $field->att('EXT');
          $field->set_text($ref_ext) if $ref_ext;
      }

      You can try to use the cut method to remove elements.
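
One thing to watch when removing elements: `cut` takes the element out of the tree together with its content, so the text inside REF would go with it. If the goal is to drop only the REF tags and keep the text in the sentence, XML::Twig's `erase` method may be the better fit, since it deletes the element and leaves its children in its place. A minimal sketch, assuming a made-up inline sample string instead of the real file:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, cut down from the document in this thread.
my $xml = '<S>Chess legend <REF ID="102">Bobby Fischer</REF>, who faces prison if '
        . '<REF ANT-ID="102" EXT="Bobby Fischer" ID="103">he</REF> returns</S>';

my $twig = XML::Twig->new(
    twig_handlers => {
        REF => sub {
            my ( $t, $ref ) = @_;
            my $ext = $ref->att('EXT');
            $ref->set_text($ext) if defined $ext;   # replace the text first
            $ref->erase;                            # then drop the tags, keep the text
        },
    },
);

$twig->parse($xml);             # use parsefile($filename) for a file on disk
my $plain = $twig->sprint;      # serialized document without any REF tags
print "$plain\n";
```

Doing the replacement before `erase` matters, because once the element is erased its EXT attribute is no longer available.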

        Thanks a lot dude. It is perfect and you are more :)