in reply to XML::Twig usage incomprehension

I apologize, but instead of an answer, I will offer you a semi-rant on how to ask a question so it's easy for me (and any one else!) to answer. Don't take it baddly, I just hope this will help you get an answer for this question... and the next ones.

For me to answer, your question should be as close to a test as possible: an example of your (non-working) code, with data (I like it in a DATA section, so I can just cut'n paste the whole thing and have it all in a single file), and the results you expect. If you write it as a real test, using Test or Test::More, that's even better.

As it is your XML is really long and not well-formed, and I quickly gave up on trying to get the closing tags right. Plus it is a bit difficult for me to figure out exactly what result you want.

If you post something that I can run, fix and post back, you have a much better chance to get a usefull answer.

Replies are listed 'Best First'.
Re^2: XML::Twig usage incomprehension
by Jerome (Initiate) on Apr 03, 2006 at 09:14 UTC
    Hello Mirod,
    Let me be more accurate then ;)

    Here's a shortened data sample :

    <?xml version="1.0" encoding="UTF-8"?> <DSExport> <Header CharacterSet="ENGLISH" ExportingTool="Ascential DataStage E +xport" ToolVersion="4" ServerName="VILLETTE" ToolInstanceID="Portage" + Date="2006-03-29" Time="10.00.34" ServerVersion="7.5.1.A"/> <Job Identifier="adresses" DateModified="2006-03-28" TimeModified=" +17.15.15"> <Record Identifier="ROOT" DateModified="1899-12-30" TimeModified +="00.00.01" Type="JobDefn" Readonly="0"> <Property Name="Name">adresses</Property> <Property Name="Description">DFE/ job of addresses MAJ - 18/1 +1/2004</Property> <Property Name="ControlAfterSubr">1</Property> </Record> <Record Identifier="V0S0" DateModified="1899-12-30" TimeModified +="00.00.01" Type="CustomStage" Readonly="0"> <Property Name="AllowColumnMapping">0</Property> <Property Name="NextRecordID">0</Property> </Record> <Record Identifier="V0S0P1" DateModified="1899-12-30" TimeModifi +ed="00.00.01" Type="CustomOutput" Readonly="0"> <Property Name="Name">li_hf_adresses</Property> <Property Name="Partner">V0S1</Property> <Collection Name="Properties" Type="CustomProperty"> <SubRecord> <Property Name="Name">TABLE</Property> <Property Name="Value">APPS.RA_ADDRESSES_ALL</Property> </SubRecord> <SubRecord> <Property Name="Name">USERSQL</Property> <Property Name="Value" PreFormatted="1"> SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATED_BY,CREA +TION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDRESS4,CITY +,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_ALL WHERE L +AST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.INT_LOADIN +GS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </Property> </SubRecord> <SubRecord> <Property Name="Name">ARRAYSIZE</Property> <Property Name="Value">32000</Property> </SubRecord> </Collection> <Property Name="TopTextPos">478</Property> </Record> <Record Identifier="V0S19P4" DateModified="1899-12-30" TimeModif +ied="00.00.01" Type="CustomInput" Readonly="0"> <Property Name="Name">li_addresses_ins</Property> <Property Name="Partner">V0S17</Property> <Collection Name="Properties" Type="CustomProperty"> <SubRecord> <Property Name="Name">TABLE</Property> <Property Name="Value">DWADMIN.CUS_ADDRESSES</Property> </SubRecord> <SubRecord> <Property Name="Name">GENSQL</Property> <Property Name="Value">No</Property> </SubRecord> <SubRecord> <Property Name="Name">USERSQL</Property> <Property Name="Value" PreFormatted="1"> INSERT INTO DWADMIN.CUS_ADDRESSES ( ADDRESS_ID, ORIGINAL_ADDRESS_ID, ADDRESS_1,ADDRESS_2, A +DDRESS_3, ADDRESS_4,CITY,ZIP_CODE,COUNTRY,ADD_INFORMATION_1,ADD_INFOR +MATION_2,ADD_INFORMATION_3,EMAIL,CREATED_BY,CREATION_DATE,LAST_UPDATE +D_BY,LAST_UPDATE_DATE,SYSTEM_ORIGIN,SALES_TERRITORY_COUNTRY,SALES_TER +RITORY_ADMIN_REGION,SALES_TERRITORY_SECTOR,SALESREP_ID) VALUES (DWADM +IN.CUS_ADDRESS_ID_SEQ.NEXTVAL,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12, +:13,TO_DATE(:14, &apos;YYYY-MM-DD HH24:MI:SS&apos;),:15,TO_DATE(:16, +&apos;YYYY-MM-DD HH24:MI:SS&apos;),:17,:18,:19,:20,:21) </Property> </SubRecord> </Collection> </Record> </Job> </DSExport>
    The result I would like to get is :
    <Job Identifier="adresses"> <tables>APPS.RA_ADDRESSES_ALL</tables> <usersql>SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATED_BY,CREA +TION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDRESS4,CITY +,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_ALL WHERE L +AST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.INT_LOADIN +GS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </usersql> </Job>
    The table field should be the one line 20 and usersql line 25.

    my not code working is :

    use strict; use XML::Twig; my $t= new XML::Twig(); my $root = $t->root; $t->parsefile( 'T:\\BI\\Jerome\\xml\\tree.xml'); exit; sub job { my ($t, $root)= @_; my %job; $job{Attr_name}= $root->{'att'}->{'Name'}; $job{Attr_value}= $root->{'att'}->{'Value'}; print "$job{Attr_name}: $job{Attr_value}\n"; $root->delete; }
    Note : in that example I just wanted to list all the Name and Value fields but I get no output.....

    Hope my new explanations will be better for you.

    Thanks in advance.

    Jerome

      Sorry, still not enough details. Why do you only include 1 record in the output, when there are 2 properties "TABLE" in the input? Also the code you wrote does not output XML, but a list of property name/values, so your expected result should be at least consistent with that.

      Plus (going back to rant mode) why did you truncate the usersql content? I would appreciate not having to track it down in the input to cut and paste it.

      That said, the input XML seems incredibly clumsy. Faced with something like this, my first instinct would be to turn all those Subrecords that include a single property defined by 2 Property elements into a single element, named after the property name, with the value of the property as content. Once that's done it should be much easier to work with the data.

      The code below does just that.

      #!/usr/bin/perl use strict; use warnings; use XML::Twig; my $t= XML::Twig->new( twig_roots => { 'Collection[@Type="CustomProper +ty"]' => \&collection}, twig_print_outside_roots => 1, ); $t->parsefile( $ARG[0]); exit; sub collection { my ($t, $collection)= @_; foreach my $sub_record ($collection->children( 'SubRecord')) { my $name = $sub_record->field( 'Property[@Name="Name"]'); my $value = $sub_record->field( 'Property[@Name="Value"]'); # we want to keep thæ attributes (except Name of course) my $atts = $sub_record->first_child( 'Property[@Name="Value"] +')->atts; delete $atts->{Name}; my $property= XML::Twig::Elt->new( $name, $atts, $value); $property->replace( $sub_record); } $collection->print; }
        Thanks for your answer Mirod!

        Lets answer to all your questions and comment :

        I only included 1 record in the output just to show an example of what informations I'm interested in. As my XML file does 18 megs, I've truncated it to.... Zipped XML does 1Meg so if it can help you, I can mail it to you...

        I've put the full USERSQL content back for you ;)

        I fully agree with you when you say the file is clumsy but it's the one I have to work on :(

        The output I would like to get after parsing the demo XML tree should be exactly like this in the best case :

        <Job Identifier="adresses"> <TABLE>APPS.RA_ADDRESSES_ALL</TABLE> <USERSQL>SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATE +D_BY,CREATION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDR +ESS4,CITY,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_AL +L WHERE LAST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.I +NT_LOADINGS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </USERSQL> <TABLE>DWADMIN.CUS_ADDRESSES</TABLE> <USERSQL>INSERT INTO DWADMIN.CUS_ADDRESSES ( ADDRESS_ID, ORIGINAL_ADDRESS_ID, ADDRESS_1,ADDRESS_2, A +DDRESS_3, ADDRESS_4,CITY,ZIP_CODE, COUNTRY,ADD_INFORMATION_1,ADD_INFORMATION_2,ADD_INFORMA +TION_3,EMAIL,CREATED_BY,CREATION_DATE, LAST_UPDATED_BY,LAST_UPDATE_DATE,SYSTEM_ORIGIN,SALES_TE +RRITORY_COUNTRY, SALES_TERRITORY_ADMIN_REGION,SALES_TERRITORY_SECTOR,SAL +ESREP_ID) VALUES (DWADMIN.CUS_ADDRESS_ID_SEQ.NEXTVAL,:1,:2,:3,:4, +:5,:6,:7,:8,:9,:10,:11,:12,:13, TO_DATE(:14, &apos;YYYY-MM-DD HH24:MI:SS&apos;),: 15,TO_DATE(:16, &apos;YYYY-MM-DD HH24:MI:SS&apos;),:17, +:18,:19,:20,:21) </USERSQL> </Job>
        As you can see I'm totally new to Twig and can't understang how that thing works exactly...Thanks for you precious help.