
Thanks for your answer, Mirod!

Let me answer all your questions and comments:

I only included one record in the output, just to show an example of the information I'm interested in. My XML file is 18 MB, so I've truncated it to... Zipped, the XML is 1 MB, so if it helps, I can mail it to you.

I've put the full USERSQL content back for you ;)

I fully agree that the file is clumsy, but it's the one I have to work with :(

Ideally, the output after parsing the demo XML tree would look exactly like this:

    <Job Identifier="adresses">
      <TABLE>APPS.RA_ADDRESSES_ALL</TABLE>
      <USERSQL>SELECT ADDRESS_ID,LAST_UPDATE_DATE,LAST_UPDATED_BY,CREATION_DATE,CREATED_BY,COUNTRY,ADDRESS1,ADDRESS2,ADDRESS3,ADDRESS4,CITY,POSTAL_CODE,STATE,PROVINCE,COUNTY FROM APPS.RA_ADDRESSES_ALL WHERE LAST_UPDATE_DATE&gt;=(SELECT LAST_LOADING_DATE from DWADMIN.INT_LOADINGS@#DB_LINK# WHERE TABLE_NAME = &apos;CUS_ADDRESSES&apos; AND SYSTEM_ORIGIN = &apos;#system_origin#&apos; ) </USERSQL>
      <TABLE>DWADMIN.CUS_ADDRESSES</TABLE>
      <USERSQL>INSERT INTO DWADMIN.CUS_ADDRESSES ( ADDRESS_ID, ORIGINAL_ADDRESS_ID, ADDRESS_1,ADDRESS_2, ADDRESS_3, ADDRESS_4,CITY,ZIP_CODE, COUNTRY,ADD_INFORMATION_1,ADD_INFORMATION_2,ADD_INFORMATION_3,EMAIL,CREATED_BY,CREATION_DATE, LAST_UPDATED_BY,LAST_UPDATE_DATE,SYSTEM_ORIGIN,SALES_TERRITORY_COUNTRY, SALES_TERRITORY_ADMIN_REGION,SALES_TERRITORY_SECTOR,SALESREP_ID) VALUES (DWADMIN.CUS_ADDRESS_ID_SEQ.NEXTVAL,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13, TO_DATE(:14, &apos;YYYY-MM-DD HH24:MI:SS&apos;),:15,TO_DATE(:16, &apos;YYYY-MM-DD HH24:MI:SS&apos;),:17,:18,:19,:20,:21) </USERSQL>
    </Job>
As you can see, I'm totally new to Twig and can't understand exactly how it works... Thanks for your valuable help.

Re^5: XML::Twig usage incomprehension
by mirod (Canon) on Apr 03, 2006 at 15:22 UTC

    OK, the code below should do the trick. Two comments: if you have control over the output format, I would merge the TABLE and USERSQL elements into a single element, either a simple one with the table as an attribute, or a wrapping one with two sub-elements. These two elements are linked, and the format should show it. Also, I had to guess what to do with the GENSQL property that appeared before the USERSQL one for the second table.

        #!/usr/bin/perl
        use strict;
        use warnings;

        use XML::Twig::XPath;

        # call job() on each Job element as soon as it has been parsed;
        # everything outside the Job elements is ignored
        my $t= XML::Twig::XPath->new( twig_roots   => { Job => \&job },
                                      pretty_print => 'indented',
                                    );
        $t->parsefile( $ARGV[0]);
        exit;

        sub job {
            my ($t, $job)= @_;

            # keep only the SubRecords whose "Name" property is TABLE
            my @sub_record_table= $job->findnodes( './/SubRecord[./Property[@Name="Name" and text()="TABLE"]]');
            return unless( @sub_record_table);

            my $out_job= XML::Twig::Elt->new( job => { identifier => $job->att( 'Identifier') });
            foreach my $sub_record ( @sub_record_table) {
                my $table = $sub_record->field( 'Property[@Name="Value"]');
                $out_job->insert_new_elt( last_child => TABLE => $table);

                # walk the following SubRecords until one has a value other than 'No'
                my $found_sql=0;
                while( !$found_sql) {
                    $sub_record= $sub_record->next_sibling( 'SubRecord') or last;
                    my $tag= $sub_record->field( 'Property[@Name="Name"]');
                    my $content= $sub_record->field( 'Property[@Name="Value"]');
                    if( $content ne 'No') {
                        $found_sql=1;
                        $out_job->insert_new_elt( last_child => $tag => $content);
                    }
                }
            }
            $out_job->print();
            $t->purge; # if you need to free the memory
        }
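mirod's suggestion of merging TABLE and USERSQL into one element could look like the minimal sketch below. It is an illustration under assumptions, not code from the thread: the two table/SQL pairs are shortened, hard-coded stand-ins for what job() extracts, and the `name` attribute and the `table_sql` wrapper element are hypothetical names. It builds both variants with the same `XML::Twig::Elt->new` and `insert_new_elt` calls used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, shortened stand-ins for what job() extracts from one Job.
my $identifier = 'adresses';
my @pairs = (
    [ 'APPS.RA_ADDRESSES_ALL', 'SELECT ADDRESS_ID FROM APPS.RA_ADDRESSES_ALL' ],
    [ 'DWADMIN.CUS_ADDRESSES', 'INSERT INTO DWADMIN.CUS_ADDRESSES (ADDRESS_ID) VALUES (:1)' ],
);

# Variant 1: one element per table, with the table name as an attribute
# ("name" is a hypothetical attribute name).
my $simple = XML::Twig::Elt->new( job => { identifier => $identifier } );
foreach my $pair (@pairs) {
    my ( $table, $sql ) = @$pair;
    $simple->insert_new_elt( last_child => TABLE => { name => $table }, $sql );
}

# Variant 2: a wrapping element holding TABLE and USERSQL together
# ("table_sql" is a hypothetical element name).
my $wrapped = XML::Twig::Elt->new( job => { identifier => $identifier } );
foreach my $pair (@pairs) {
    my ( $table, $sql ) = @$pair;
    my $step = $wrapped->insert_new_elt( last_child => 'table_sql' );
    $step->insert_new_elt( last_child => TABLE   => $table );
    $step->insert_new_elt( last_child => USERSQL => $sql );
}

print $simple->sprint,  "\n";
print $wrapped->sprint, "\n";
```

Either shape keeps each SQL statement next to the table it belongs to, so whoever reads the output no longer has to rely on element order to pair them.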
      Thanks a lot!!! It works really well!

      FYI, I only wanted TABLE and USERSQL so I could check whether the tables referenced in these two fields match. The other data is useless in this case.
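The check Jerome describes could be sketched in plain Perl as below, assuming the TABLE/USERSQL pairs have already been extracted by the script. `main_table()` and `tables_match()` are hypothetical helpers, and the rule they apply (the table of interest is the first name after FROM or INTO) is an assumption that fits the two statements shown earlier but not arbitrary SQL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the first table named after FROM or INTO, upper-cased,
# or undef if there is none. Assumption: that first table is the one that
# matters, which holds for the SELECT and INSERT statements in this thread but
# not for arbitrary SQL (joins, quoted identifiers, ...).
sub main_table {
    my ($sql) = @_;
    return $sql =~ /\b(?:FROM|INTO)\s+([\w.]+)/i ? uc $1 : undef;
}

# Hypothetical helper: true when TABLE names the main table of USERSQL
# (case-insensitive comparison).
sub tables_match {
    my ( $table, $sql ) = @_;
    my $found = main_table($sql);
    return defined $found && $found eq uc $table;
}

# Shortened versions of the two TABLE/USERSQL pairs shown earlier.
my @pairs = (
    [ 'APPS.RA_ADDRESSES_ALL',
      q{SELECT ADDRESS_ID FROM APPS.RA_ADDRESSES_ALL WHERE LAST_UPDATE_DATE >= (SELECT LAST_LOADING_DATE from DWADMIN.INT_LOADINGS@#DB_LINK#)} ],
    [ 'DWADMIN.CUS_ADDRESSES',
      q{INSERT INTO DWADMIN.CUS_ADDRESSES ( ADDRESS_ID, ORIGINAL_ADDRESS_ID ) VALUES (DWADMIN.CUS_ADDRESS_ID_SEQ.NEXTVAL, :1)} ],
);

for my $pair (@pairs) {
    my ( $table, $sql ) = @$pair;
    printf "%-25s %s\n", $table, tables_match( $table, $sql ) ? 'match' : 'MISMATCH';
}
```

Taking the first FROM is what keeps the sub-query on DWADMIN.INT_LOADINGS from being picked up in the SELECT statement.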

      I've just changed the if clause:

          #!/usr/bin/perl
          use strict;
          use warnings;

          use XML::Twig::XPath;

          my $file = "T:/BI/Jerome/xml/Portage.xml";

          # print_to_file() is a twig method, not a plain function, and the job
          # results are standalone elements that are not part of the twig anyway,
          # so each one is written to this output filehandle instead
          open( my $out, '>', 'T:/BI/Jerome/xml/traite.xml')
              or die "cannot open T:/BI/Jerome/xml/traite.xml: $!";

          my $t= XML::Twig::XPath->new( twig_roots   => { Job => \&job },
                                        pretty_print => 'indented',
                                      );
          $t->parsefile( $file );
          close( $out);
          exit;

          sub job {
              my ($t, $job)= @_;
              my @sub_record_table= $job->findnodes( './/SubRecord[./Property[@Name="Name" and text()="TABLE"]]');
              return unless( @sub_record_table);

              my $out_job= XML::Twig::Elt->new( job => { identifier => $job->att( 'Identifier') });
              foreach my $sub_record ( @sub_record_table) {
                  my $table = $sub_record->field( 'Property[@Name="Value"]');
                  $out_job->insert_new_elt( last_child => TABLE => $table);
                  my $found_sql=0;
                  while( !$found_sql) {
                      $sub_record= $sub_record->next_sibling( 'SubRecord') or last;
                      my $tag= $sub_record->field( 'Property[@Name="Name"]');
                      my $content= $sub_record->field( 'Property[@Name="Value"]');
                      #if( $content ne 'No')
                      #  { $found_sql=1;
                      #    $out_job->insert_new_elt( last_child => $tag => $content);
                      #  }
                      # skip GENSQL and the like: only keep the USERSQL property
                      if ( $tag =~ /USERSQL/) {
                          $found_sql = 1;
                          $out_job->insert_new_elt( last_child => $tag => $content);
                      }
                  }
              }
              $out_job->print( $out);
              $t->purge; # if you need to free the memory
          }
      What I couldn't figure out was how to filter the output, and the answer was the findnodes() method.
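To see what that findnodes() filter actually selects, here is a small self-contained sketch. The real input file is not shown in the thread, so the sample document below is a hypothetical reconstruction from the XPath expressions and field() calls in the scripts above; running it requires XML::Twig::XPath and the XPath engine module it depends on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig::XPath;

# Hypothetical sample: structure inferred from the XPath used in the scripts,
# not taken from the real (18 MB) file.
my $xml = <<'XML';
<Jobs>
  <Job Identifier="adresses">
    <SubRecord>
      <Property Name="Name">TABLE</Property>
      <Property Name="Value">APPS.RA_ADDRESSES_ALL</Property>
    </SubRecord>
    <SubRecord>
      <Property Name="Name">GENSQL</Property>
      <Property Name="Value">No</Property>
    </SubRecord>
    <SubRecord>
      <Property Name="Name">USERSQL</Property>
      <Property Name="Value">SELECT ADDRESS_ID FROM APPS.RA_ADDRESSES_ALL</Property>
    </SubRecord>
  </Job>
</Jobs>
XML

my $t = XML::Twig::XPath->new;
$t->parse($xml);
my ($job) = $t->findnodes('//Job');

# Without a predicate, every SubRecord of the Job is returned...
my @all = $job->findnodes('.//SubRecord');

# ...while the predicate keeps only those whose Name property is TABLE.
my @tables = $job->findnodes('.//SubRecord[./Property[@Name="Name" and text()="TABLE"]]');
my @names  = map { $_->field('Property[@Name="Value"]') } @tables;

printf "%d SubRecords, %d of them TABLE: %s\n", scalar @all, scalar @tables, "@names";
```

The predicate in square brackets does the filtering: findnodes() only returns the nodes for which it is true, so the rest of the Job never reaches the output.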

      I really appreciate your help and the time you've spent on my problem.

      Regards.

      Jerome.