The story

I am working on a prototype DSL and have written a small converter that reads validated XML and emits transformed SQL code based on the source markup.

You can find the entire code for reference at the LUARM/ITPSL SVN repository.


A typical XML input markup is given below:
<?xml version="1.0"?>
<itpslsig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <itpslheader>
    <signid> 5938724b6b41a834ac695529dd104ed0 </signid>
    <signdate>
      <year>2010</year>
      <month>12</month>
      <day>20</day>
    </signdate>
    <ontology>
      <reason>intentional</reason>
      <revision>1.0</revision>
      <user_role>ordinary_users</user_role>
      <detectby>multi</detectby>
      <multihost>no</multihost>
      <hostlist>proteas,dionisos,slart,cn1,panoptis</hostlist>
      <weightmatrix>3,10,20,70</weightmatrix>
      <os>linux</os>
      <osver>2.6</osver>
      <keywords>DoS software install DoS loiq </keywords>
      <synopsis> This signature predicts the usage of the Low Orbit Ion Cannon tool for DDoS attacks. </synopsis>
    </ontology>
  </itpslheader>
  <itpslbody>
    <mainblock>
      <mainop>as_a_result_of</mainop>
      <subblock>
        <subop>AND</subop>
        <fileexists>
          <filename>loiq</filename>
          <type>executable</type>
          <location>OR (#userhome#/*,/site/*,/tmp/*,/temp/*)</location>
          <singlefile>yes</singlefile>
          <ownedbyuser>johnc</ownedbyuser>
        </fileexists>
        <fileexists>
          <filename>loiq.pro</filename>
          <type>textdata</type>
          <location>OR(#userhome#/*,/site/*,/tmp/*,/temp/*)</location>
          <ownedbyuser>johnc</ownedbyuser>
          <singlefile>yes</singlefile>
        </fileexists>
        <fileexists>
          <filename>loiq.qrc</filename>
          <type>textdata</type>
          <location>OR(#userhome#/*,site/*,/tmp/*,/temp/*)</location>
          <singlefile>yes</singlefile>
          <ownedbyuser>johnc</ownedbyuser>
        </fileexists>
      </subblock>
      <subblock>
        <subop>single</subop>
        <userexec>
          <username>johnc</username>
          <name>OR (file-roller,tar,bunzip2)</name>
          <path>OR(/usr/bin/,/usr/local/bin)</path>
          <singleprocess>yes</singleprocess>
          <argumentlist>loiq*.bz2</argumentlist>
          <pattern>any</pattern>
        </userexec>
      </subblock>
      <subblock>
        <subop>single</subop>
        <fileexists>
          <filename>*</filename>
          <type>any</type>
          <location>OR (#userhome#/.mozilla/*,#userhome#/.opera)</location>
          <singlefile>yes</singlefile>
          <withcontents>
            <stringsearch>"http://sourceforge.net/projects/loiq"</stringsearch>
          </withcontents>
          <ownedbyuser>johnc</ownedbyuser>
        </fileexists>
      </subblock>
    </mainblock>
  </itpslbody>
</itpslsig>

In short, the program creates an XML::Twig::XPath object with handlers. The problems start when the 'parsesubs' subroutine is called:
my $twig = new XML::Twig::XPath(
    TwigHandlers => {
        #ITPSL header parsing data
        "/itpslsig/itpslheader/ontology/weightmatrix" => \&getwm,
        "/itpslsig/itpslheader/ontology/detectby"     => \&getdetectmethods,
        "/itpslsig/itpslheader/ontology/os"           => \&getos,
        "/itpslsig/itpslheader/ontology/osver"        => \&getosver,
        #ITPSL body parsing data
        "/itpslsig/itpslbody/mainblock/mainop"        => \&getmainop,
        "/itpslsig/itpslbody/mainblock"               => \&getnoofsubblocks,
        "/itpslsig/itpslbody/mainblock/subblock"      => \&parsesubs,
    });

# parse, handling nodes on the way
$twig->parsefile( shift @ARGV );
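
For reference, a handler registered this way is called with the twig and the matched element once that element has been completely parsed. The following is a minimal sketch of that calling convention, not the original program: it assumes plain XML::Twig (from CPAN) instead of XML::Twig::XPath, a trimmed-down hypothetical input, and a hypothetical handler named note_subblock that only records which directives each sub-block contains.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, trimmed-down input: two sub-blocks, three directives in total.
my $xml = <<'XML';
<itpslsig>
  <itpslbody>
    <mainblock>
      <mainop>as_a_result_of</mainop>
      <subblock>
        <subop>AND</subop>
        <fileexists><filename>loiq</filename><type>executable</type></fileexists>
        <fileexists><filename>loiq.pro</filename><type>textdata</type></fileexists>
      </subblock>
      <subblock>
        <subop>single</subop>
        <userexec><username>johnc</username><pattern>any</pattern></userexec>
      </subblock>
    </mainblock>
  </itpslbody>
</itpslsig>
XML

my @seen;    # one entry per sub-block: [ subop, directive tag names... ]

# Hypothetical handler: receives the twig and the <subblock> element
# that has just been fully parsed.
sub note_subblock {
    my ( $twig, $subblock ) = @_;
    my $subop = $subblock->first_child_text('subop');

    # Keep only real child elements, skipping the <subop> marker itself.
    my @directives = grep { $_->is_elt && $_->tag ne 'subop' } $subblock->children;

    push @seen, [ $subop, map { $_->tag } @directives ];
    return 1;
}

my $twig = XML::Twig->new(
    twig_handlers => {
        '/itpslsig/itpslbody/mainblock/subblock' => \&note_subblock,
    },
);
$twig->parse($xml);

# One line per sub-block: the subop followed by its directive names.
print join( ' ', @$_ ), "\n" for @seen;
```

Because the handler fires once per sub-block, anything it collects in variables declared inside the subroutine starts empty on each call, while anything kept in variables declared outside it persists between calls.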

Using if statements, part of the 'parsesubs' subroutine calls other subroutines, so that the parsing of each specific markup directive is localized. The anonymous array reference ($sblockstack) is my crude way of keeping a stack of parsed directives (an array of arrays, each row holding the parsed directives of one sub-block).
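
The stack layout described above can be sketched in plain Perl, independently of the XML parsing. This is only an illustration under assumptions: flatten_directive is a hypothetical helper that is not part of the original program, the operands are passed in by hand rather than read from the markup, and each directive is assumed to get its own freshly created array.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one directive into a fresh list of marker strings.
sub flatten_directive {
    my ( $name, @pairs ) = @_;
    my @row = ("##STARTOP:$name");    # a new lexical array on every call
    while ( my ( $key, $value ) = splice @pairs, 0, 2 ) {
        push @row, "##operand:$key:$value";
    }
    push @row, '##ENDOFOP##';
    return \@row;
}

# $sblockstack->[$block][$directive] is an array ref of marker strings.
my $sblockstack = [];

push @{ $sblockstack->[0] },
    flatten_directive( 'fileexists', filename => 'loiq', type => 'executable' );
push @{ $sblockstack->[0] },
    flatten_directive( 'fileexists', filename => 'loiq.pro', type => 'textdata' );
push @{ $sblockstack->[1] },
    flatten_directive( 'userexec', name => 'OR (file-roller,tar,bunzip2)', pattern => 'any' );

# Walk the structure: one line per directive, labelled with its position.
for my $block ( 0 .. $#{$sblockstack} ) {
    my $rows = $sblockstack->[$block];
    for my $i ( 0 .. $#{$rows} ) {
        print "[$block][$i] ", join( ' | ', @{ $rows->[$i] } ), "\n";
    }
}
```

Since every call returns a reference to a newly created array, the rows stay independent of each other, and each one carries exactly one '##STARTOP' marker and one '##ENDOFOP##' marker.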
The problem
When I execute the code on a Linux-based Perl 5.12 system, what I get (excluding harmless warnings, and with the help of Data::Dumper) is the following:
Called parsesubs
Directive is fileexists
Pushing to sblockstack [0] [0]
Directive is fileexists
Pushing to sblockstack [0] [1]
Directive is fileexists
Pushing to sblockstack [0] [2]
Called parsesubs
Directive is userexec
Pushing to sblockstack [1] [0]
Called parsesubs
Directive is fileexists
Pushing to sblockstack [2] [0]
$VAR1 = [
          [
            '##STARTOP:fileexists',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq.pro',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:ownedbyuser:johnc',
            '##operand:singlefile:yes',
            '##operand:filename:loiq.qrc',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##ENDOFOP##',
            '##operand:filename:loiq',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq.pro',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:ownedbyuser:johnc',
            '##operand:singlefile:yes',
            '##operand:filename:loiq.qrc',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq.pro',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:ownedbyuser:johnc',
            '##operand:singlefile:yes',
            '##operand:filename:loiq.qrc',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq.pro',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:ownedbyuser:johnc',
            '##operand:singlefile:yes',
            '##operand:filename:loiq.qrc',
            '##operand:type:textdata',
            '##operand:location:OR(#userhome#/*,site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:*',
            '##operand:type:any',
            '##operand:location:OR (#userhome#/.mozilla/*,#userhome#/.opera)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc'
          ],
          [
            '##STARTOP:fileexists',
            '##ENDOFOP##'
          ],
          [
            '##STARTOP:fileexists',
            '##ENDOFOP##'
          ]
        ];
$VAR2 = [
          [
            '##STARTOP:userexec',
            '##operand:name:OR (file-roller,tar,bunzip2)',
            '##operand:path:OR(/usr/bin/,/usr/local/bin)',
            '##operand:singleprocess:yes',
            '##operand:argumentlist:loiq*.bz2',
            '##operandpattern:any',
            '##ENDOFOP##'
          ]
        ];
$VAR3 = [
          [
            '##STARTOP:fileexists',
            '##ENDOFOP##'
          ]
        ];

The Data::Dumper output clearly shows that I am not parsing the whole thing properly. The directives are not each terminated properly (with the '##ENDOFOP##' string), and certain directives do not end up in their proper place in the array of arrays. The Data::Dumper output I would expect is something like the following:
...
Directive is fileexists
Pushing to sblockstack [2] [0]
$VAR1 = [
          [
            '##STARTOP:fileexists',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq',
            '##operand:type:textdata',
            '##ENDOFOP##'
          ],
          [
            '##STARTOP:fileexists',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq.pro',
            '##operand:type:textdata',
            '##ENDOFOP##'
          ],
          [
            '##STARTOP:fileexists',
            '##operand:type:executable',
            '##operand:location:OR (#userhome#/*,/site/*,/tmp/*,/temp/*)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:loiq.qrc',
            '##operand:type:textdata',
            '##ENDOFOP##'
          ]
        ];
$VAR2 = [
          [
            '##STARTOP:userexec',
            '##operand:name:OR (file-roller,tar,bunzip2)',
            '##operand:path:OR(/usr/bin/,/usr/local/bin)',
            '##operand:singleprocess:yes',
            '##operand:argumentlist:loiq*.bz2',
            '##operandpattern:any',
            '##ENDOFOP##'
          ]
        ];
$VAR3 = [
          [
            '##STARTOP:fileexists',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc',
            '##operand:filename:*',
            '##operand:type:any',
            '##operand:location:OR (#userhome#/.mozilla/*,#userhome#/.opera)',
            '##operand:singlefile:yes',
            '##operand:ownedbyuser:johnc'
          ]
        ];

The Question
What am I doing wrong with this program structure? Can somebody suggest a structure that works?
Many thanks!
GM

In reply to XML::Twig blues by gmagklaras
