Re^3: xml parsing without using cpan modules

It is possible to write an XML parser using regular expressions. Check out "REX: XML Shallow Parsing with Regular Expressions", http://www.cs.sfu.ca/~cameron/REX.html. It even has Perl code for doing so.

It effectively splits the XML into a list of strings on logical boundaries by repeating a regular expression that matches XML markup. It is fairly easy to find the type of each chunk by looking at the first couple of characters.
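To make the idea concrete, here is a minimal sketch of that kind of shallow parse. It is not the REX expression itself, only a much simplified stand-in: `$chunk_re`, `shallow_parse`, and `chunk_type` are hypothetical names, and the tag pattern assumes well-formed input with no `>` inside attribute values, which the real thing has to cope with.

```perl
use strict;
use warnings;

# Simplified stand-in for a shallow-parsing regex (NOT the REX expression).
# Each alternative matches one whole chunk of markup, or a run of text.
my $chunk_re = qr{
      <!--.*?-->              # comment
    | <!\[CDATA\[.*?\]\]>     # CDATA section
    | <\?.*?\?>               # processing instruction
    | <[^>]*>                 # any other markup (assumes no '>' in attributes)
    | [^<]+                   # character data
}xs;

# Split a document into a list of chunks by repeating the regex.
sub shallow_parse {
    my ($xml) = @_;
    return $xml =~ /$chunk_re/g;
}

# Classify a chunk by looking only at its first (and last) few characters.
sub chunk_type {
    my ($chunk) = @_;
    return 'comment' if $chunk =~ /^<!--/;
    return 'cdata'   if $chunk =~ /^<!\[CDATA\[/;
    return 'decl'    if $chunk =~ /^<!/;
    return 'pi'      if $chunk =~ /^<\?/;
    return 'end'     if $chunk =~ m{^</};
    return 'empty'   if $chunk =~ m{^<.*/>\z}s;
    return 'start'   if $chunk =~ /^</;
    return 'text';
}

# Usage: print each chunk with its type.
for my $chunk (shallow_parse('<a x="1">hi<!-- c --><b/></a>')) {
    printf "%-8s %s\n", chunk_type($chunk), $chunk;
}
```

Because every chunk is matched whole, joining the list back together should reproduce a well-formed input exactly, which is a cheap sanity check that nothing was skipped.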

Replies are listed 'Best First'.
Re^4: xml parsing without using cpan modules by Aristotle (Chancellor) on Aug 10, 2004 at 19:22 UTC
Of course you can parse using regular expressions. You just shouldn't grope around in a string representing an XML document using regular expressions, because you have to be certain about the context in which any match occurred. That means you have to scan the string strictly front-to-back, probably using the `/gc` options and the `\G` anchor to make sure you don't miss anything. Simply picking matches out of the middle of the string is very likely to be a broken approach unless you are dealing with a known subset of XML syntax. Makeshifts last the longest.
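A rough sketch of the front-to-back scan described above, assuming a tiny subset of XML (comments, simple tags, and text only). The name `scan_tokens` and the token labels are made up for illustration; this is not a complete tokenizer.

```perl
use strict;
use warnings;

# Walk the string strictly front to back. \G anchors each attempt at the
# position where the previous match ended, and /c keeps pos() in place when
# a match fails, so the next alternative is tried from the same spot.
sub scan_tokens {
    my ($xml) = @_;
    my @tokens;
    pos($xml) = 0;
    while (pos($xml) < length $xml) {
        if    ($xml =~ /\G(<!--.*?-->)/gcs) { push @tokens, [comment => $1] }
        elsif ($xml =~ /\G(<[^>]*>)/gc)     { push @tokens, [tag     => $1] }
        elsif ($xml =~ /\G([^<]+)/gc)       { push @tokens, [text    => $1] }
        else {
            # Nothing matched here: refuse to skip ahead silently.
            die "Unparseable input at offset " . pos($xml) . "\n";
        }
    }
    return @tokens;
}

# The <b> inside the comment is never reported as a tag, because the
# scanner already knows it is inside a comment when it gets there.
for my $token (scan_tokens('<a><!-- <b> --></a>')) {
    print "$token->[0]: $token->[1]\n";
}
```

A bare `/<b>/` run against the same string would happily match inside the comment, which is exactly the context problem. Dying on unmatched input is a deliberate choice here: it shows up the places where the subset assumption does not hold instead of hiding them.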
Re^5: xml parsing without using cpan modules by iburrell (Chaplain) on Aug 10, 2004 at 19:38 UTC
Absolutely. That regex only works because it starts at a known position (start of markup) and either matches the entire chunk or fails. It needs to see the entire document and can't work line-by-line. It is also complicated, because it has to handle the whole XML syntax.
Re^5: xml parsing without using cpan modules by Nalina (Monk) on Aug 11, 2004 at 06:29 UTC
Could you please be a bit clearer? Thanks & Regards, Nalina
Re^6: xml parsing without using cpan modules by mirod (Canon) on Aug 11, 2004 at 12:39 UTC
I guess it's time to point you at On XML parsing and at The Annotated XML Specification, to see what madness can lurk in the dark corners of a seemingly innocent XML document ;--)