I have to check whether an XML document is well-formed. I can't use a parser, because I can't install any of the expat libraries: I don't have root access. Is there any standalone code to check for well-formedness? I found the following code on the O'Reilly site, but I can't understand it, and it doesn't even seem to work. This is the code I am trying. Can someone please help me get this done? I just need to check whether an XML document is well-formed. I'm not going to use DTDs. I need only a basic checker for start tags, end tags and character data, not even entity references.
use strict;
use warnings;

my $rc = is_well_formed(
    "<memo> <to>self</to> <message>Don't forget to mow the car and wash the lawn.</message> </memo>"
);
print $rc ? "well formed\n" : "not well formed\n";

sub is_well_formed {
    my $text = shift;    # XML text to check

    # match patterns
    my $ident = '[:_A-Za-z][:A-Za-z0-9\-\._]*';     # identifier
    my $optsp = '\s*';                               # optional space
    my $att1  = "$ident$optsp=$optsp\"[^\"]*\"";     # attribute
    my $att2  = "$ident$optsp=${optsp}'[^']*'";      # attr. variant
    my $att   = "($att1|$att2)";                     # any attribute
    my @elements = ();                               # stack of open elems

    print "Identifier $ident\n";
    print "optsp $optsp\n";
    print "att $att\n";

    # loop through the string to pull out XML markup objects
    while ( length($text) ) {
        print "Inside Loop\n";

        # match an empty element (tag patterns must begin with '<', not '&')
        if ( $text =~ /^<($ident)(\s+$att)*\s*\/>/ ) {
            $text = $';

        # match an element start tag
        } elsif ( $text =~ /^<($ident)(\s+$att)*\s*>/ ) {
            push( @elements, $1 );
            $text = $';

        # match an element end tag; it must close the innermost open element
        } elsif ( $text =~ /^<\/($ident)\s*>/ ) {
            return unless @elements and $1 eq pop(@elements);
            $text = $';

        # match a comment
        } elsif ( $text =~ /^<!--/ ) {
            $text = $';
            # bite off the rest of the comment
            if ( $text =~ /-->/ ) {
                $text = $';
                return if ( $` =~ /--/ );    # comments can't contain '--'
            } else {
                return;
            }

        # match extra whitespace
        # (in case there is space outside the root element)
        } elsif ( $text =~ m|^\s+| ) {
            $text = $';

        # match character data
        } elsif ( $text =~ /(^[^<&>]+)/ ) {
            print "char data\n";
            # save the remainder now: the /\S/ match below overwrites $'
            my ( $data, $rest ) = ( $1, $' );
            # make sure the data is inside an element
            return if ( $data =~ /\S/ and not(@elements) );
            $text = $rest;

        # match entity reference
        } elsif ( $text =~ /^&$ident;+/ ) {
            $text = $';

        # something unexpected
        } else {
            return;
        }
    }
    return if (@elements);    # the stack should be empty
    return 1;
}
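The core idea behind the @elements stack in the code above can be looked at on its own. What follows is only a minimal sketch under stated assumptions: tags carry no attributes, and comments, entity references and text outside the root element are ignored. The name tags_balanced is hypothetical and does not come from the O'Reilly code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the O'Reilly code): checks only that
# start and end tags nest properly. Attributes, comments and entity
# references are deliberately not handled in this sketch.
sub tags_balanced {
    my ($text) = @_;
    my @open;    # stack of currently open element names

    # Walk over every <name>, </name> or <name/> in document order.
    while ( $text =~ m{<(/?)([A-Za-z_:][\w:.-]*)\s*(/?)>}g ) {
        my ( $closing, $name, $empty ) = ( $1, $2, $3 );

        # Something like </name/> is never valid.
        return 0 if $closing and $empty;

        if ($closing) {
            # An end tag must match the most recently opened element.
            return 0 unless @open and $open[-1] eq $name;
            pop @open;
        }
        elsif ( !$empty ) {
            push @open, $name;    # start tag: remember it until it is closed
        }
        # <name/> opens and closes at once, so the stack stays unchanged
    }
    return @open ? 0 : 1;    # anything left on the stack was never closed
}

print tags_balanced('<memo><to>self</to></memo>') ? "ok\n" : "not ok\n";    # prints "ok"
print tags_balanced('<memo><to>self</memo></to>') ? "ok\n" : "not ok\n";    # prints "not ok"
```

Every start tag is pushed, every end tag has to match the top of the stack, and the stack has to be empty at the end. The longer checker does the same thing, but consumes the text piece by piece so that it can also reject stray characters between the tags.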

In reply to Need help in xml well formedness checker. by jayakumark
