Hi monks,
I have been using the Perl module XML::Simple a lot recently, both to parse XML and to test whether XML is valid and well-formed. Everything works great except for one problem.
Sometimes the XML I need to parse uses namespace prefixes (a ":" inside the tag names), but the prefix is never declared. For example:
<person>
<person:name>Joe</person:name>
<person:job>programmer</person:job>
</person>
This is not acceptable XML under the W3C "Namespaces in XML" recommendation, because the prefix person is used without being declared by an xmlns:person attribute. If you save this as an XML file and try to open it (for example in a browser), you get an error. Also, if you validate it at http://www.w3schools.com/dom/dom_validate.asp, you get this error: "reference to undeclared namespace 'person'".
My problem is that XML::Simple does not treat this XML as invalid. For example, the following code does not report an error:
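To illustrate where that error comes from, here is a minimal sketch using XML::Parser (the expat-based module, assumed to be installed; it is not part of the original code). Under plain XML 1.0 rules a colon is just another name character, so person:name is an ordinary element name. Only a parser doing namespace processing has to resolve what person is bound to, and XML::Parser's Namespaces option switches that processing on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# The document from the question: the prefix "person" is never declared.
my $xml = '<person>
<person:name>Joe</person:name>
<person:job>programmer</person:job>
</person>';

# Returns 1 if $parser accepts $doc, 0 if parse() dies with an error.
sub parses_ok {
    my ($parser, $doc) = @_;
    return eval { $parser->parse($doc); 1 } ? 1 : 0;
}

# Plain XML 1.0: "person:name" is just an element name containing a colon.
my $plain = XML::Parser->new();

# Namespace-aware: expat must resolve every prefix to a declared namespace.
my $ns_aware = XML::Parser->new( Namespaces => 1 );

my $plain_ok = parses_ok( $plain,    $xml );
my $ns_ok    = parses_ok( $ns_aware, $xml );

print "plain parser:           ", ( $plain_ok ? "accepted" : "rejected" ), "\n";
print "namespace-aware parser: ", ( $ns_ok    ? "accepted" : "rejected" ), "\n";
```

So whether a parser complains plausibly depends on whether it does namespace processing. This sketch does not test XML::Simple itself, whose behaviour may vary with the backend parser it picks up.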
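#!/exlibris/metalib/m4_b/product/bin/perl
use strict;
use warnings;
use XML::Simple;
# The "person" prefix is never declared, so this should be rejected
my $source_code =
"<person>
<person:name>Joe</person:name>
<person:job>programmer</person:job>
</person>";
my $xs = XML::Simple->new();
my $hash;
## This should return an error!!
eval { $hash = $xs->XMLin($source_code) };
if ($@) {
    print "Parse error: $@";
    exit(1);    # non-zero exit status when parsing fails
}
print "No error: XML::Simple accepted the XML\n";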
This is a problem for me: I am working on a project that transforms invalid XML into valid XML, and I also use a C XML parser that does return an error for this kind of XML, so the two parsers disagree.
I know how to make the XML valid, but I also want XML::Simple to fail when it is not. Does anyone have any ideas?
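One possible workaround, sketched under assumptions: run a namespace-aware well-formedness check with XML::Parser (assumed installed) before handing the string to XML::Simple. strict_xmlin is a hypothetical helper name, and the namespace URI in the corrected document is an illustrative placeholder, not one from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use XML::Simple;

# Hypothetical helper: check the document with a namespace-aware
# XML::Parser first, and only pass it to XML::Simple if that succeeds.
# Dies (like XMLin itself) when the document is rejected.
sub strict_xmlin {
    my ($xml_string, @xmlin_opts) = @_;

    # Namespaces => 1 makes expat reject undeclared prefixes.
    XML::Parser->new( Namespaces => 1 )->parse($xml_string);

    return XML::Simple->new()->XMLin( $xml_string, @xmlin_opts );
}

# Undeclared prefix, as in the question.
my $bad = '<person>
<person:name>Joe</person:name>
<person:job>programmer</person:job>
</person>';

# Same document with the prefix bound to a placeholder URI (assumption).
my $good = '<person xmlns:person="http://example.com/person">
<person:name>Joe</person:name>
<person:job>programmer</person:job>
</person>';

my $bad_hash   = eval { strict_xmlin($bad) };
my $bad_error  = $@;
my $good_hash  = eval { strict_xmlin($good) };
my $good_error = $@;

print $bad_error  ? "bad:  rejected: $bad_error"  : "bad:  accepted\n";
print $good_error ? "good: rejected: $good_error" : "good: accepted\n";
```

The cost of this design is a second parse of every document; the benefit is that it does not depend on which backend XML::Simple happens to choose.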
Thanks,
Guy
---A truth that's told with bad intent beats all the lies you can invent
Update: Fixed typo XML::simple -> XML::Simple
Update 2: Added an example from w3schools