Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys! Im using XML Twig to try and extract some data from an XML, put it into arrays, and the write into EXCEL. here is an extract of the XML
- <JOB APPLICATION="BEART-NA-ARTEMIS" APR="0" AUG="0" AUTHOR="j926209" + AUTOARCH="0" CHANGE_USERID="e000775" CMDLINE="art_extract.ksh %%PROP +FILE %%CONFIG %%AGENDA" CONFIRM="0" CREATION_DATE="20080703" CREATION +_TIME="094054" CREATION_USER="f064188" CRITICAL="0" CYCLIC="0" DAYS_A +ND_OR="OR" DEC="0" DESCRIPTION="Daily CPP2 Extract from Artemis" FEB= +"0" IND_CYCLIC="START" INTERVAL="00000M" JAN="0" JOBNAME="BARTDE01-CP +2DLY" JUL="0" JUN="0" MAR="0" MAXDAYS="0" MAXRERUN="0" MAXRUNS="0" MA +XWAIT="5" MAY="0" MEMNAME="CPP2 Daily Extract" MULTY_AGENT="N" NODEID +="BEART-MIS-UNIX" NOV="0" OCT="0" OWNER="arteprd" PRIORITY="05" RETRO +="0" SEP="0" SHIFT="IGNOREJOB" SHIFTNUM="+00" SYSDB="0" TAG_RELATIONS +HIP="OR" TASKTYPE="Command" USE_INSTREAM_JCL="0" PREVENTNCT2="N"> <INCOND AND_OR="AND" NAME="PL-BARTDE00-IU-MAINT-H-OK" ODATE="ODAT" / +> <OUTCOND NAME="PL-BARTDE01-CP2DLY-OK" ODATE="ODAT" SIGN="ADD" /> <TAG_NAMES TAG_NAME="TUE-to-FRI-TAG" /> <OUTCOND NAME="PL-BARTDE00-IU-MAINT-H-OK" ODATE="ODAT" SIGN="DEL" /> + - <ON CODE="NOTOK" STMT="*"> <DOMAIL CC_DEST="suhail.x.siddiqui@jpmchase.com,harish.x.halikere@jp +mchase.com,rinku.x.mishra@jpmchase.com" DEST="ib_cp_operate@jpmorgan. +com" MESSAGE="0020ATTN : Operate Team.00000027%%JOBNAME HAS ENDED NOT +-OK.00000019Please Investigate." SUBJECT="CONTROL - M BATCH CPP2 Dail +y Extract JOB ENDED NOT-OK." URGENCY="R" /> </ON> </JOB>
My code at the moment is as follows:
#!/usr/bin/perl -w use warnings; no warnings 'uninitialized'; use strict; use XML::Twig; use XML::LibXML; use strict; use Spreadsheet::WriteExcel; use Spreadsheet::WriteExcel::Utility; my @group; #array for group name my @Jobs; #array for jobname my @Desc; #array for description my @Time; #array for start time my @Tag; #array for tags my @Days; #array for days my @Outcond; #array for out conditions my @Incond; #array for in conditions my $file; #variable to hold file location my $output; #variable to hold output file name print "This program will parse an XML file and output the data into ex +cel for an overview of the schedule\n\n"; #rough menu system for +user to interact with print "Please enter the name of the xml file you wish to parse\n"; print "(ensure it is it the same directory, if not include the path)\n +"; print "xml file name: "; $file =<>; #take in +put filename in as file variable print "\nplease enter the name for your output xls file (e.g. Output): + "; $output=<>; #take o +utput filename chomp($output); $output="$output.xls"; +#add .xls extension to the filename my $twig = new XML::Twig(KeepSpaces => 'true',TwigHandlers => { "JOB " + => \&Output,"TAG_NAMES " => \&Tags, " SCHED_GROUP " => \&Group, " JO +B/OUTCOND " => \&Outcond, " INCOND " => \&Incond}); #create new +twig with spacing kept and with 2 handlers defined below $twig->parsefile($file); #c +reate twig $twig->flush; print "The array has " . scalar(@group) . " elements\n"; print "The array has " . scalar(@Jobs) . " elements\n"; + #print out number of elements in arrays (used as tes +t purposes and kpet int for future debug) print "The array has " . scalar(@Desc) . " elements\n"; print "The array has " . scalar(@Time) . " elements\n"; print "The array has " . scalar(@Tag) . " elements\n"; print "The array has " . scalar(@Days) . " elements\n"; print "The array has " . scalar(@Outcond) . " elements\n"; print "The array has " . scalar(@Incond) . " elements\n"; my $count = 1; while ($count <= (scalar(@Jobs))) { print ("Jobname $Jobs[$count-1] runs on days: $Days[$count-1] and ta +g: $Tag[$count-1] at: $Time[$count-1] the description is: $Desc[$coun +t-1] with Outcond $Outcond[$count-1] and In condition $Incond[$count- +1] \n"); #outputs data to cmd line as a string again used as t +est purpose but will keep for future debug $count++; } our $workbook = Spreadsheet::WriteExcel->new($output); + # create new spreadsheet our %format = &get_excel_cell_formats; + #set formats based on formats defined below our %worksheet; # worksheet for text $worksheet{text} = $workbook->add_worksheet('text'); + #create text outputs defined below &write_text($worksheet{text}); $workbook->close(); exit; sub Output { #tiwg +outputs that reffer to JOB tag data (all data in xml file between <JO +B>...</JOB> my( $twig, $s )= @_; push(@Jobs,( $s->att('JOBNAME'))); + #Reffers to the value found at JOBNAME under JOB push(@Time,($s->att('TIMEFROM'))); push(@Desc,($s->att('DESCRIPTION'))); push(@Days,($s->att('DAYS'))); } sub Tags { #tiwg ou +tputs that reffer to TAG tag data (all data in xml file between <TAG_ +NAMES>...</TAG_NAMES> my( $twig, $s )= @_; push(@Tag,( $s->att('TAG_NAME'))); + #Reffers to the Tag name attacted to the job } sub Group { #tiwg o +utputs that reffer to TAG tag data (all data in xml file between <TAG +_NAMES>...</TAG_NAMES> my( $twig, $s )= @_; push(@group,( $s->att('GROUP'))); + #Reffers to the Group name attacted to the job } sub Outcond { #tiwg + outputs that reffer to TAG tag data (all data in xml file between <T +AG_NAMES>...</TAG_NAMES> my( $twig, $s )= @_; push(@Outcond,( $s->att('NAME'))); + #Reffers to the Tag name attacted to the job } sub Incond { #tiwg +outputs that reffer to TAG tag data (all data in xml file between <TA +G_NAMES>...</TAG_NAMES> my( $twig, $s )= @_; push(@Incond,( $s->att('NAME'))); + #Reffers to the Tag name attacted to the job } sub get_excel_cell_formats { + #format standards for spreadsheet my $format; # bold $format{bold} = $workbook->add_format(); $format{bold}->set_bold(); return %format; } sub write_text { my $worksheet = shift; # Set the column width to 15 $worksheet->set_column('A:A', 40); $worksheet->set_column('B:B', 40); $worksheet->set_column('C:C', 10); $worksheet->set_column('D:D', 18); $worksheet->set_column('E:E', 12); $worksheet->set_column('F:F', 50); $worksheet->set_column('G:G', 100); $worksheet->set_column('H:H', 100); #create starting x and y values for each data my ($x1,$y1)=(0,0); # starting point for the headings my ($xgroup,$ygroup)=(0,1); # starting point for Group name my ($xjobs,$yjobs)=(1,1); # starting point for jobs my ($xDays,$yDays)=(2,1); # starting point for jobs my ($xTags,$yTags)=(3,1); # starting point for tags my ($xStarting,$yStarting)=(4,1); # starting point for time my ($xDesc,$yDesc)=(5,1); # starting point for Desc my ($xOut,$yOut)=(6,1); # starting point for Desc my ($xIn,$yIn)=(7,1); # starting point for Desc my ($xouttest,$youttest)=(8,1); # starting point for Desc + my $count = 1; #create headings for all data entries $worksheet->write($y1, $x1++, "Group Name", $format{bold}); $worksheet->write($y1, $x1++, "Job Name", $format{bold}); $worksheet->write($y1, $x1++, "Days", $format{bold}); $worksheet->write($y1, $x1++, "Schedule Tag", $format{bold}); $worksheet->write($y1,$x1++, "Starting time", $format{bold}); + $worksheet->write($y1,$x1++, "Description", $format{bold}); $worksheet->write($y1,$x1++, "Out Conditions", $format{bold}); $worksheet->write($y1,$x1++, "In Conditions", $format{bold}); #loop all data and output in correct positions while ($count <= (scalar(@Jobs))) { $worksheet->write($ygroup, $xgroup, $group[$count-1]); $worksheet->write($yjobs++, $xjobs, $Jobs[$count-1]); $worksheet->write($yDays++, $xDays, $Days[$count-1]); $worksheet->write($yTags++, $xTags, $Tag[$count-1]); $worksheet->write($yStarting++, $xStarting, $Time[$count-1]); $worksheet->write($yDesc++, $xDesc, $Desc[$count-1]); $worksheet->write($yOut++, $xOut, $Outcond[$count-1]); $worksheet->write($yIn++, $xIn, $Incond[$count-1]); $ygroup = ($ygroup+$yjobs); $count++; } }
Currently my problem is trying to get the program to read in OUTCOND NAME and store it and add another OUTCOND NAME to it since there is 2 in this extract. Some of the JOB sections only have 1 OUTCOND NAME and others may have 1,2 or 3+. How can i efficentley extract this data and store it for writting to excel?

Replies are listed 'Best First'.
Re: XML help
by Anonymous Monk on Aug 06, 2008 at 11:48 UTC
    Wrong direction. You need to stuff data into a single data structure (like an OBJECT). Example
    #!/usr/bin/perl -- use strict; use warnings; my @jobs; for ( 1 .. 3){ my $job = { Desc => "desc goes here $_", Time => "time goes here $_", Outcond => "outcond goes here $_", Incond => "Incond goes here $_", # ... }; push @jobs, $job; } use Data::Dumper; print Dumper(\@jobs),$/; __END__ $VAR1 = [ { 'Outcond' => 'outcond goes here 1', 'Time' => 'time goes here 1', 'Incond' => 'Incond goes here 1', 'Desc' => 'desc goes here 1' }, { 'Outcond' => 'outcond goes here 2', 'Time' => 'time goes here 2', 'Incond' => 'Incond goes here 2', 'Desc' => 'desc goes here 2' }, { 'Outcond' => 'outcond goes here 3', 'Time' => 'time goes here 3', 'Incond' => 'Incond goes here 3', 'Desc' => 'desc goes here 3' } ];
Re: XML help
by pjotrik (Friar) on Aug 06, 2008 at 11:35 UTC
    Quick and dirty? Keep arrayrefs in the Outcond array. UNTESTED changes to the code:
    ... my @Outcond = ([]); #array for out conditions ... sub Outcond { my( $twig, $s )= @_; push(@{$Outcond[$#Outcond]},( $s->att('NAME'))); } ... sub Output { my( $twig, $s )= @_; push(@Jobs,( $s->att('JOBNAME'))); push(@Time,($s->att('TIMEFROM'))); push(@Desc,($s->att('DESCRIPTION'))); push(@Days,($s->att('DAYS'))); push(@Outcond, []); #At the end you will have a trailing empty arr +ayref at the end of @Outcond, but you don't need to worry about that +given the way you print your output } ... sub write_text { my $worksheet = shift; ... #loop all data and output in correct positions while ($count <= (scalar(@Jobs))) { $worksheet->write($ygroup, $xgroup, $group[$count-1]); $worksheet->write($yjobs++, $xjobs, $Jobs[$count-1]); $worksheet->write($yDays++, $xDays, $Days[$count-1]); $worksheet->write($yTags++, $xTags, $Tag[$count-1]); $worksheet->write($yStarting++, $xStarting, $Time[$count-1]); $worksheet->write($yDesc++, $xDesc, $Desc[$count-1]); $worksheet->write($yOut++, $xOut, join(',', @{$Outcond[$count- +1]}); $worksheet->write($yIn++, $xIn, $Incond[$count-1]); $ygroup = ($ygroup+$yjobs); $count++; } }
    But generally, it would be better to store corresponding data (i.e. data belonging to every JOB element) together in one structure (or object). Having the data in 8 distinct arrays really hurts the eye. And it's not only non-elegant, you may easily get into problems when a child element is missing - that would mix up your arrays.

    Also, XML::Twig is excellent in processing XML documents as they come. You might output your record to excel every time the whole job element has been parsed (i.e. in the Output sub). If you want to store all the data, another XML library may be better, personally, I'd choose XML::Simple.

Re: XML help
by Anonymous Monk on Aug 06, 2008 at 11:34 UTC
    How can i efficentley extract this data and store it for writting to excel?
    First you do it by any means you can, and then, if its too slow, you try to make it more efficent(sic)
      Big help!
Re: XML help
by smunro16 (Initiate) on Aug 06, 2008 at 11:25 UTC