in reply to Re^2: xml parsing without using cpan modules
in thread xml parsing without using cpan modules

It is possible to write an XML parser using regular expressions. Check out "REX: XML Shallow Parsing with Regular Expressions", http://www.cs.sfu.ca/~cameron/REX.html. It even includes Perl code for doing so.

It effectively splits the XML into a list of strings on logical boundaries by repeatedly applying a regular expression that matches XML markup. It is then fairly easy to determine the type of each chunk by looking at its first couple of characters.
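
To make the idea concrete, here is a minimal sketch of that kind of shallow parse. The pattern is a deliberately simplified stand-in, not the full REX expression from the paper, and the names `$chunk_re`, `shallow_parse`, and `chunk_type` are my own for illustration. It assumes well-formed input with no DOCTYPE internal subset and no `>` inside attribute values.

```perl
use strict;
use warnings;

# Simplified stand-in for the REX pattern (an assumption, not the paper's
# regex): comments, CDATA sections, processing instructions, tags, text.
my $chunk_re = qr{
      <!--.*?-->            # comment
    | <!\[CDATA\[.*?\]\]>   # CDATA section
    | <\?.*?\?>             # processing instruction or XML declaration
    | <[^>]*>               # start tag, end tag, empty tag, or declaration
    | [^<]+                 # text up to the next markup
}xs;

# Hypothetical helper: split a document into a list of chunks by applying
# the pattern repeatedly (list-context /g returns every match).
sub shallow_parse {
    my ($xml) = @_;
    return $xml =~ /$chunk_re/g;
}

# Hypothetical helper: classify a chunk by its first few characters.
sub chunk_type {
    my ($chunk) = @_;
    return 'comment' if $chunk =~ /\A<!--/;
    return 'cdata'   if $chunk =~ /\A<!\[CDATA\[/;
    return 'pi'      if $chunk =~ /\A<\?/;
    return 'decl'    if $chunk =~ /\A<!/;
    return 'end'     if $chunk =~ m{\A</};
    return 'start'   if $chunk =~ /\A</;    # also covers empty tags like <b/>
    return 'text';
}

my $xml = '<a x="1">hi<!-- note --><b/></a>';
printf "%-8s %s\n", chunk_type($_), $_ for shallow_parse($xml);
```

Joining the chunks back together reproduces the original string, which is a quick sanity check that nothing was skipped on well-formed input.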

Re^4: xml parsing without using cpan modules
by Aristotle (Chancellor) on Aug 10, 2004 at 19:22 UTC

    Of course you can parse using regular expressions. You just shouldn't grope around in a string representing an XML document using regular expressions, because you have to be certain about the context in which any match occurred. That means you have to scan the string strictly front-to-back, probably using the /gc options and the \G anchor to make sure you don't miss anything. Simply picking matches out of the middle of the string is very likely to be a broken approach unless you are dealing with a known subset of XML syntax.
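
    A rough sketch of what such a front-to-back scan might look like follows. The `tokenize` helper and its three token types are hypothetical and cover only a small subset of XML (comments, simple tags, text); the point is that every match is anchored with \G where the previous one ended, and anything that cannot be matched there is reported instead of silently skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: tokenize strictly front to back. \G anchors each
# attempt at the current pos(); /c keeps pos() in place when an attempt
# fails, so the next alternative is tried at the same offset.
sub tokenize {
    my ($xml) = @_;
    my @tokens;
    pos($xml) = 0;
    while (pos($xml) < length $xml) {
        if    ($xml =~ /\G(<!--.*?-->)/gcs) { push @tokens, [comment => $1] }
        elsif ($xml =~ /\G(<[^<>]*>)/gc)    { push @tokens, [tag     => $1] }
        elsif ($xml =~ /\G([^<]+)/gc)       { push @tokens, [text    => $1] }
        else {
            # Nothing matched at this exact position: stop rather than
            # skip ahead and lose track of the context.
            die sprintf "Unparsable input at offset %d\n", pos($xml);
        }
    }
    return @tokens;
}

print "$_->[0]: $_->[1]\n" for tokenize('<a>hi<!-- c --></a>');
```

    Given a stray `<` in text, as in `<a>x < y</a>`, this dies at the offending offset, whereas an unanchored //g loop would quietly step over the character and carry on.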

    Makeshifts last the longest.

      Absolutely. That regex only works because it starts at a known position (the start of markup) and either matches the entire chunk or fails. It needs to see the entire document and can't work line-by-line. It is also complicated, because it has to handle the whole XML syntax.
      Could you please be a bit clearer?

      Thanks & Regards

      Nalina