http://qs1969.pair.com?node_id=425093


in reply to Re: how to strip XML into Plain Text file
in thread how to strip XML into Plain Text file

... <img alt="Next >>" src="../next_button.jpg" />*Boom*

And this is why you use a real parser, not just a regex . . .

Update: Just to clarify, the above is a pathological case. If you're reasonably sure it won't occur, go ahead and use the simple s///; but be aware that it's not bulletproof, and know where to find the right tool when the sledgehammer doesn't cut it any more.
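To make the failure concrete, here is a minimal sketch in Python rather than Perl. It assumes the simple substitution in question is roughly s/<[^>]*>//g (the parent's exact regex isn't quoted here), and it uses the standard library's xml.etree.ElementTree as a stand-in for "a real parser". The <testing> wrapper and the text around the img element are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document: the wrapper element and the surrounding
# text are invented; the <img> element is the one from the post above.
xml = '<testing>Before <img alt="Next >>" src="../next_button.jpg" /> after</testing>'

# Assumed Python equivalent of the simple s/<[^>]*>//g substitution.
# The match for the img tag stops at the first '>' inside alt="...".
naive = re.sub(r'<[^>]*>', '', xml)

# A real parser treats the '>' characters inside alt="..." as attribute data,
# so only the text nodes come back.
parsed = ''.join(ET.fromstring(xml).itertext())

print(repr(naive))
print(repr(parsed))
```

With this input the regex version leaves the tail of the img tag (everything after the first '>' in the alt value) in the output, while the parser version returns only the two text fragments.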

Re^3: how to strip XML into Plain Text file
by BUU (Prior) on Jan 26, 2005 at 07:55 UTC
    Since we're being pedantic about it, is '>' actually allowed inside attribute values in XML?

      Yes. Only < is not (nor is a bare &, which has to be written as an entity reference such as &amp;).

      Makeshifts last the longest.

      xmllint doesn't gripe about it:

      freebie:~ 677> cat foo.xml                      + 9:34:27
      <?xml version="1.0" encoding="utf8" ?>
      <testing>
      <img alt="Next >>" src="../next_button.jpg" />
      </testing>
      freebie:~ 678> xmllint --noout foo.xml          + 9:34:29
      freebie:~ 679>                                  + 9:34:35
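
The same well-formedness check can be sketched without xmllint. The snippet below is Python and assumes the standard library's xml.etree.ElementTree as the parser; is_well_formed is a hypothetical helper name, and the second sample string is an invented counterexample. It accepts '>' inside an attribute value and rejects a raw '<' there.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# '>' inside an attribute value, as in foo.xml above.
with_gt = '<testing><img alt="Next >>" src="../next_button.jpg" /></testing>'

# Invented counterexample: a raw '<' inside an attribute value.
with_lt = '<testing><img alt="<< Prev" src="../prev_button.jpg" /></testing>'

print(is_well_formed(with_gt))
print(is_well_formed(with_lt))
```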