PerlMonks
Hello again haukex,
the thread is interesting, and I did my best last night to provide an XML::Twig solution, but due to my limited understanding of XML in general I report here some things I do not understand about the file you presented as input. First, I cheated: I got the sample XML file before writing the program, because with XML I always take a try-and-check path. Second, in my wide ignorance, I really don't know how XHTML, DTD, DOM and "transitional" can affect the approach to the XML to be parsed. My sin. Third: if XML::Twig (the only module I use for these tasks) complains about the document, I use the W3C validator to check the content before banging my head against it, a task I really dislike. So, your sample is a valid one. I put it after the __DATA__ token and I got the following error:
After half an hour searching the web I ended up reading about XPath bugs dated 2009, but found no clue at all. Every attempt to brutally cut down the XML by removing lines and tags ended with the very same error, at the same line (??). So I tested YourMother's solution with your own modification, and I got many errors but also the correct result:
So I assumed the XML really did have some problems, but my other attempts to fix it using the detailed reports emitted by XML::LibXML had no more luck than the previous ones. As a last resort I put the XML sample into a separate file and, ta-da, everything ran smoothly (not considering the &nbsp; issue) with XML::Twig as presented above. Any suggestions? Which is the best module to report formal errors in the XML structure? Are the errors reported above real ones, or are they due to limits of the parsing module? If the thread continues, it could become the Rosetta Stone of Perl XML parsing. Good one! L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

In reply to Re^3: Parsing HTML/XML with Regular Expressions (validation of the content) by Discipulus