in reply to Re: XML::Twig usage incomprehension
in thread XML::Twig usage incomprehension

Hello Mirod,
Let me be more accurate then ;)

Here's a shortened data sample :

<?xml version="1.0" encoding="UTF-8"?> <DSExport> <Header CharacterSet="ENGLISH" ExportingTool="Ascential DataStage E +xport" ToolVersion="4" ServerName="VILLETTE" ToolInstanceID="Portage" + Date="2006-03-29" Time="10.00.34" ServerVersion="7.5.1.A"/> <Job Identifier="adresses" DateModified="2006-03-28" TimeModified=" +17.15.15"> <Record Identifier="ROOT" DateModified="1899-12-30" TimeModified +="00.00.01" Type="JobDefn" Readonly="0"> <Property Name="Name">adresses</Property> <Property Name="Description">DFE/ job of addresses MAJ - 18/1 +1/2004</Property> <Property Name="ControlAfterSubr">1</Property> </Record> <Record Identifier="V0S0" DateModified="1899-12-30" TimeModified +="00.00.01" Type="CustomStage" Readonly="0"> <Property Name="AllowColumnMapping">0</Property> <Property Name="NextRecordID">0</Property> </Record> <Record Identifier="V0S0P1" DateModified="1899-12-30" TimeModifi +ed="00.00.01" Type="CustomOutput" Readonly="0"> <Property Name="Name">li_hf_adresses</Property> <Property Name="Partner">V0S1</Property> <Collection Name="Properties" Type="CustomProperty"> <SubRecord> <Property Name="Name">TABLE</Property> <Property Name="Value">APPS.RA_ADDRESSES_ALL</Property> </SubRecord> <SubRecord> <Property Name="Name">USERSQL</Property> <Property Name="Value" PreFormatted="1"> SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATED_BY,CREA +TION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDRESS4,CITY +,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_ALL WHERE L +AST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.INT_LOADIN +GS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </Property> </SubRecord> <SubRecord> <Property Name="Name">ARRAYSIZE</Property> <Property Name="Value">32000</Property> </SubRecord> </Collection> <Property Name="TopTextPos">478</Property> </Record> <Record Identifier="V0S19P4" DateModified="1899-12-30" TimeModif +ied="00.00.01" Type="CustomInput" Readonly="0"> <Property Name="Name">li_addresses_ins</Property> <Property Name="Partner">V0S17</Property> <Collection Name="Properties" Type="CustomProperty"> <SubRecord> <Property Name="Name">TABLE</Property> <Property Name="Value">DWADMIN.CUS_ADDRESSES</Property> </SubRecord> <SubRecord> <Property Name="Name">GENSQL</Property> <Property Name="Value">No</Property> </SubRecord> <SubRecord> <Property Name="Name">USERSQL</Property> <Property Name="Value" PreFormatted="1"> INSERT INTO DWADMIN.CUS_ADDRESSES ( ADDRESS_ID, ORIGINAL_ADDRESS_ID, ADDRESS_1,ADDRESS_2, A +DDRESS_3, ADDRESS_4,CITY,ZIP_CODE,COUNTRY,ADD_INFORMATION_1,ADD_INFOR +MATION_2,ADD_INFORMATION_3,EMAIL,CREATED_BY,CREATION_DATE,LAST_UPDATE +D_BY,LAST_UPDATE_DATE,SYSTEM_ORIGIN,SALES_TERRITORY_COUNTRY,SALES_TER +RITORY_ADMIN_REGION,SALES_TERRITORY_SECTOR,SALESREP_ID) VALUES (DWADM +IN.CUS_ADDRESS_ID_SEQ.NEXTVAL,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12, +:13,TO_DATE(:14, &apos;YYYY-MM-DD HH24:MI:SS&apos;),:15,TO_DATE(:16, +&apos;YYYY-MM-DD HH24:MI:SS&apos;),:17,:18,:19,:20,:21) </Property> </SubRecord> </Collection> </Record> </Job> </DSExport>
The result I would like to get is :
<Job Identifier="adresses"> <tables>APPS.RA_ADDRESSES_ALL</tables> <usersql>SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATED_BY,CREA +TION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDRESS4,CITY +,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_ALL WHERE L +AST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.INT_LOADIN +GS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </usersql> </Job>
The table field should be the one line 20 and usersql line 25.

my not code working is :

use strict; use XML::Twig; my $t= new XML::Twig(); my $root = $t->root; $t->parsefile( 'T:\\BI\\Jerome\\xml\\tree.xml'); exit; sub job { my ($t, $root)= @_; my %job; $job{Attr_name}= $root->{'att'}->{'Name'}; $job{Attr_value}= $root->{'att'}->{'Value'}; print "$job{Attr_name}: $job{Attr_value}\n"; $root->delete; }
Note : in that example I just wanted to list all the Name and Value fields but I get no output.....

Hope my new explanations will be better for you.

Thanks in advance.

Jerome

Replies are listed 'Best First'.
Re^3: XML::Twig usage incomprehension
by mirod (Canon) on Apr 03, 2006 at 12:40 UTC

    Sorry, still not enough details. Why do you only include 1 record in the output, when there are 2 properties "TABLE" in the input? Also the code you wrote does not output XML, but a list of property name/values, so your expected result should be at least consistent with that.

    Plus (going back to rant mode) why did you truncate the usersql content? I would appreciate not having to track it down in the input to cut and paste it.

    That said, the input XML seems incredibly clumsy. Faced with something like this, my first instinct would be to turn all those Subrecords that include a single property defined by 2 Property elements into a single element, named after the property name, with the value of the property as content. Once that's done it should be much easier to work with the data.

    The code below does just that.

    #!/usr/bin/perl use strict; use warnings; use XML::Twig; my $t= XML::Twig->new( twig_roots => { 'Collection[@Type="CustomProper +ty"]' => \&collection}, twig_print_outside_roots => 1, ); $t->parsefile( $ARG[0]); exit; sub collection { my ($t, $collection)= @_; foreach my $sub_record ($collection->children( 'SubRecord')) { my $name = $sub_record->field( 'Property[@Name="Name"]'); my $value = $sub_record->field( 'Property[@Name="Value"]'); # we want to keep thæ attributes (except Name of course) my $atts = $sub_record->first_child( 'Property[@Name="Value"] +')->atts; delete $atts->{Name}; my $property= XML::Twig::Elt->new( $name, $atts, $value); $property->replace( $sub_record); } $collection->print; }
      Thanks for your answer Mirod!

      Lets answer to all your questions and comment :

      I only included 1 record in the output just to show an example of what informations I'm interested in. As my XML file does 18 megs, I've truncated it to.... Zipped XML does 1Meg so if it can help you, I can mail it to you...

      I've put the full USERSQL content back for you ;)

      I fully agree with you when you say the file is clumsy but it's the one I have to work on :(

      The output I would like to get after parsing the demo XML tree should be exactly like this in the best case :

      <Job Identifier="adresses"> <TABLE>APPS.RA_ADDRESSES_ALL</TABLE> <USERSQL>SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATE +D_BY,CREATION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDR +ESS4,CITY,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_AL +L WHERE LAST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.I +NT_LOADINGS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </USERSQL> <TABLE>DWADMIN.CUS_ADDRESSES</TABLE> <USERSQL>INSERT INTO DWADMIN.CUS_ADDRESSES ( ADDRESS_ID, ORIGINAL_ADDRESS_ID, ADDRESS_1,ADDRESS_2, A +DDRESS_3, ADDRESS_4,CITY,ZIP_CODE, COUNTRY,ADD_INFORMATION_1,ADD_INFORMATION_2,ADD_INFORMA +TION_3,EMAIL,CREATED_BY,CREATION_DATE, LAST_UPDATED_BY,LAST_UPDATE_DATE,SYSTEM_ORIGIN,SALES_TE +RRITORY_COUNTRY, SALES_TERRITORY_ADMIN_REGION,SALES_TERRITORY_SECTOR,SAL +ESREP_ID) VALUES (DWADMIN.CUS_ADDRESS_ID_SEQ.NEXTVAL,:1,:2,:3,:4, +:5,:6,:7,:8,:9,:10,:11,:12,:13, TO_DATE(:14, &apos;YYYY-MM-DD HH24:MI:SS&apos;),: 15,TO_DATE(:16, &apos;YYYY-MM-DD HH24:MI:SS&apos;),:17, +:18,:19,:20,:21) </USERSQL> </Job>
      As you can see I'm totally new to Twig and can't understang how that thing works exactly...Thanks for you precious help.

        OK, the code below should do the trick. 2 comments: if you have control over the output format, I would have the TABLE and USERSQL elements merged into 1 element, either a simple one with the table as an attribute, or just an englobing one with 2 sub-elements. These 2 elements are linked, the format should show this. And I had to guess what to do with the GENSQL property that appeared before the USERSQL one for the second table.

        #!/usr/bin/perl use strict; use warnings; use XML::Twig::XPath; my $t= XML::Twig::XPath->new( twig_roots => { Job => \&job}, pretty_pr +int => 'indented'); $t->parsefiel( $ARGV[0]); exit; sub job { my ($t, $job)= @_; my @sub_record_table= $job->findnodes( './/SubRecord[./Property[@N +ame="Name" and text()="TABLE"]]'); return unless( @sub_record_table); my $out_job= XML::Twig::Elt->new( job => { identifier => $job->att +( 'Identifier') }); foreach my $sub_record ( @sub_record_table) { my $table = $sub_record->field( 'Property[@Name="Value"]'); $out_job->insert_new_elt( last_child => TABLE => $table); my $found_sql=0; while( !$found_sql) { $sub_record= $sub_record->next_sibling( 'SubRecord') or la +st; my $tag= $sub_record->field( 'Property[@Name="Name"]'); my $content= $sub_record->field( 'Property[@Name="Value"]' +); if( $content ne 'No') { $found_sql=1; $out_job->insert_new_elt( last_child => $tag => $conte +nt); } } } $out_job->print(); $t->purge; # if you need to free the memory }