comment on

Hi Guys! Im using XML Twig to try and extract some data from an XML, put it into arrays, and the write into EXCEL. here is an extract of the XML

- <JOB APPLICATION="BEART-NA-ARTEMIS" APR="0" AUG="0" AUTHOR="j926209"
+ AUTOARCH="0" CHANGE_USERID="e000775" CMDLINE="art_extract.ksh %%PROP
+FILE %%CONFIG %%AGENDA" CONFIRM="0" CREATION_DATE="20080703" CREATION
+_TIME="094054" CREATION_USER="f064188" CRITICAL="0" CYCLIC="0" DAYS_A
+ND_OR="OR" DEC="0" DESCRIPTION="Daily CPP2 Extract from Artemis" FEB=
+"0" IND_CYCLIC="START" INTERVAL="00000M" JAN="0" JOBNAME="BARTDE01-CP
+2DLY" JUL="0" JUN="0" MAR="0" MAXDAYS="0" MAXRERUN="0" MAXRUNS="0" MA
+XWAIT="5" MAY="0" MEMNAME="CPP2 Daily Extract" MULTY_AGENT="N" NODEID
+="BEART-MIS-UNIX" NOV="0" OCT="0" OWNER="arteprd" PRIORITY="05" RETRO
+="0" SEP="0" SHIFT="IGNOREJOB" SHIFTNUM="+00" SYSDB="0" TAG_RELATIONS
+HIP="OR" TASKTYPE="Command" USE_INSTREAM_JCL="0" PREVENTNCT2="N">
  <INCOND AND_OR="AND" NAME="PL-BARTDE00-IU-MAINT-H-OK" ODATE="ODAT" /
+> 
  <OUTCOND NAME="PL-BARTDE01-CP2DLY-OK" ODATE="ODAT" SIGN="ADD" /> 
  <TAG_NAMES TAG_NAME="TUE-to-FRI-TAG" /> 
  <OUTCOND NAME="PL-BARTDE00-IU-MAINT-H-OK" ODATE="ODAT" SIGN="DEL" />
+ 
- <ON CODE="NOTOK" STMT="*">
  <DOMAIL CC_DEST="suhail.x.siddiqui@jpmchase.com,harish.x.halikere@jp
+mchase.com,rinku.x.mishra@jpmchase.com" DEST="ib_cp_operate@jpmorgan.
+com" MESSAGE="0020ATTN : Operate Team.00000027%%JOBNAME HAS ENDED NOT
+-OK.00000019Please Investigate." SUBJECT="CONTROL - M BATCH CPP2 Dail
+y Extract JOB ENDED NOT-OK." URGENCY="R" /> 
  </ON>
  </JOB>
[download]

My code at the moment is as follows:

#!/usr/bin/perl -w


use warnings;
no warnings 'uninitialized';
use strict;
use XML::Twig;
use XML::LibXML;
use strict;
use Spreadsheet::WriteExcel;
use Spreadsheet::WriteExcel::Utility;

my @group;                    #array for group name
my @Jobs;                    #array for jobname
my @Desc;                    #array for description
my @Time;                    #array for start time
my @Tag;                    #array for tags
my @Days;                    #array for days
my @Outcond;                    #array for out conditions
my @Incond;                    #array for in conditions
my $file;                    #variable to hold file location
my $output;                    #variable to hold output file name


print "This program will parse an XML file and output the data into ex
+cel for an overview of the schedule\n\n";     #rough menu system for 
+user to interact with
print "Please enter the name of the xml file you wish to parse\n"; 
print "(ensure it is it the same directory, if not include the path)\n
+";
print "xml file name: ";

$file =<>;                                                    #take in
+put filename in as file variable
print "\nplease enter the name for your output xls file (e.g. Output):
+ ";
$output=<>;                                                    #take o
+utput filename
chomp($output);    
$output="$output.xls";                                                
+#add .xls extension to the filename

my $twig = new XML::Twig(KeepSpaces => 'true',TwigHandlers => { "JOB "
+ => \&Output,"TAG_NAMES " => \&Tags, " SCHED_GROUP " => \&Group, " JO
+B/OUTCOND " => \&Outcond, " INCOND " => \&Incond});      #create new 
+twig with spacing kept and with 2 handlers defined below

$twig->parsefile($file);                                            #c
+reate twig
$twig->flush;


print "The array has " . scalar(@group) . " elements\n";    
print "The array has " . scalar(@Jobs) . " elements\n";               
+                 #print out number of elements in arrays (used as tes
+t purposes and kpet int for future debug)
print "The array has " . scalar(@Desc) . " elements\n";
print "The array has " . scalar(@Time) . " elements\n";
print "The array has " . scalar(@Tag) . " elements\n";
print "The array has " . scalar(@Days) . " elements\n";
print "The array has " . scalar(@Outcond) . " elements\n";
print "The array has " . scalar(@Incond) . " elements\n";

my $count = 1;
while ($count <= (scalar(@Jobs))) {
  print ("Jobname $Jobs[$count-1] runs on days: $Days[$count-1] and ta
+g: $Tag[$count-1] at: $Time[$count-1] the description is: $Desc[$coun
+t-1] with Outcond $Outcond[$count-1] and In condition $Incond[$count-
+1] \n");        #outputs data to cmd line as a string again used as t
+est purpose but will keep for future debug
  $count++;
}


our $workbook = Spreadsheet::WriteExcel->new($output);                
+                 # create new spreadsheet
our %format = &get_excel_cell_formats;                                
+        #set formats based on formats defined below

our %worksheet;
# worksheet for text
$worksheet{text}   = $workbook->add_worksheet('text');                
+                 #create text outputs defined below
&write_text($worksheet{text});


$workbook->close();
exit;


sub Output {                                                    #tiwg 
+outputs that reffer to JOB tag data (all data in xml file between <JO
+B>...</JOB>
my( $twig, $s )= @_;
   
push(@Jobs,( $s->att('JOBNAME')));                                    
+    #Reffers to the value found at JOBNAME under JOB
push(@Time,($s->att('TIMEFROM')));
push(@Desc,($s->att('DESCRIPTION')));
push(@Days,($s->att('DAYS')));

}

sub Tags {                                                    #tiwg ou
+tputs that reffer to TAG tag data (all data in xml file between <TAG_
+NAMES>...</TAG_NAMES>
my( $twig, $s )= @_;
   
push(@Tag,( $s->att('TAG_NAME')));                                    
+    #Reffers to the Tag name attacted to the job

}

sub Group {                                                    #tiwg o
+utputs that reffer to TAG tag data (all data in xml file between <TAG
+_NAMES>...</TAG_NAMES>
my( $twig, $s )= @_;
   
push(@group,( $s->att('GROUP')));                                     
+   #Reffers to the Group name attacted to the job

}

sub Outcond {                                                    #tiwg
+ outputs that reffer to TAG tag data (all data in xml file between <T
+AG_NAMES>...</TAG_NAMES>
my( $twig, $s )= @_;
   
push(@Outcond,( $s->att('NAME')));                                    
+    #Reffers to the Tag name attacted to the job

}

sub Incond {                                                    #tiwg 
+outputs that reffer to TAG tag data (all data in xml file between <TA
+G_NAMES>...</TAG_NAMES>
my( $twig, $s )= @_;
   
push(@Incond,( $s->att('NAME')));                                     
+   #Reffers to the Tag name attacted to the job

}

sub get_excel_cell_formats {                                          
+  #format standards for spreadsheet
    my $format;

    # bold
    $format{bold} = $workbook->add_format();
    $format{bold}->set_bold();
    return %format;
}


sub write_text {

    my $worksheet = shift;                                            

    # Set the column width to 15    
    $worksheet->set_column('A:A', 40);
    $worksheet->set_column('B:B', 40);    
    $worksheet->set_column('C:C', 10);
    $worksheet->set_column('D:D', 18);
    $worksheet->set_column('E:E', 12);
    $worksheet->set_column('F:F', 50);
    $worksheet->set_column('G:G', 100);
    $worksheet->set_column('H:H', 100);
    
    #create starting x and y values for each data 
    my ($x1,$y1)=(0,0);            # starting point for the headings
    my ($xgroup,$ygroup)=(0,1);        # starting point for Group name
    my ($xjobs,$yjobs)=(1,1);        # starting point for jobs
    my ($xDays,$yDays)=(2,1);        # starting point for jobs
    my ($xTags,$yTags)=(3,1);        # starting point for tags
    my ($xStarting,$yStarting)=(4,1);    # starting point for time
    my ($xDesc,$yDesc)=(5,1);        # starting point for Desc
    my ($xOut,$yOut)=(6,1);        # starting point for Desc
    my ($xIn,$yIn)=(7,1);        # starting point for Desc

    my ($xouttest,$youttest)=(8,1);        # starting point for Desc  
+  
    
    my $count = 1;

    #create headings for all data entries
    $worksheet->write($y1, $x1++, "Group Name", $format{bold});    
    $worksheet->write($y1, $x1++, "Job Name", $format{bold});
    $worksheet->write($y1, $x1++, "Days", $format{bold});
    $worksheet->write($y1, $x1++, "Schedule Tag", $format{bold});
    $worksheet->write($y1,$x1++, "Starting time", $format{bold});     
+   
    $worksheet->write($y1,$x1++, "Description", $format{bold});    
    $worksheet->write($y1,$x1++, "Out Conditions", $format{bold});    
    $worksheet->write($y1,$x1++, "In Conditions", $format{bold});    

    #loop all data and output in correct positions
    while ($count <= (scalar(@Jobs))) {
        $worksheet->write($ygroup, $xgroup, $group[$count-1]);
          $worksheet->write($yjobs++, $xjobs, $Jobs[$count-1]);
          $worksheet->write($yDays++, $xDays, $Days[$count-1]);
        $worksheet->write($yTags++, $xTags, $Tag[$count-1]);
        $worksheet->write($yStarting++, $xStarting, $Time[$count-1]);
        $worksheet->write($yDesc++, $xDesc, $Desc[$count-1]);
        $worksheet->write($yOut++, $xOut, $Outcond[$count-1]);
        $worksheet->write($yIn++, $xIn, $Incond[$count-1]);    

    $ygroup = ($ygroup+$yjobs);    
    $count++;
    }
}
[download]

Currently my problem is trying to get the program to read in OUTCOND NAME and store it and add another OUTCOND NAME to it since there is 2 in this extract. Some of the JOB sections only have 1 OUTCOND NAME and others may have 1,2 or 3+. How can i efficentley extract this data and store it for writting to excel?

In reply to XML help by Anonymous Monk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.