http://qs1969.pair.com?node_id=425086


in reply to how to strip XML into Plain Text file

perl -p -e 's/<[^>]*>//g' <foo.xml


-- All code is 100% tested and functional unless otherwise noted.

Re^2: how to strip XML into Plain Text file
by Fletch (Bishop) on Jan 26, 2005 at 01:00 UTC

    ... <img alt="Next >>" src="../next_button.jpg" />*Boom*

    And this is why you use a real parser, not just a regex . . .

    Update: Just to clarify: the above is a pathological case. If you're reasonably sure it won't occur, go ahead and use the simple s///; but be aware that it's not bulletproof, and know where to find the right tool when the sledgehammer doesn't cut it any more.
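
    As one possible "right tool", the sketch below lets a parser do the stripping. It assumes the XML::LibXML module from CPAN is installed (it is not part of core Perl), and the sample document and the variable names `xml` and `text` are made up for illustration. Other parsers would do equally well; nothing in this thread prescribes this one.

```shell
# A small sample document containing the troublesome attribute (illustrative).
xml='<testing>before <img alt="Next >>" src="../next_button.jpg" /> after</testing>'

# Only run the parser step if XML::LibXML can be loaded.
if perl -MXML::LibXML -e 1 2>/dev/null; then
    # Parse the document, then print the concatenated text nodes of the root element.
    text=$(printf '%s' "$xml" | perl -MXML::LibXML -e '
        my $src = do { local $/; <STDIN> };
        print XML::LibXML->load_xml(string => $src)->documentElement->textContent;
    ')
    printf '%s\n' "$text"
else
    echo 'XML::LibXML is not installed; skipping the parser example' >&2
fi
```

    Because the parser knows where the attribute value ends, the '>>' inside alt never reaches the output; only the text around the tag is printed.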

      Since we're being pedantic about it, is '>' actually allowed inside attribute values in XML?

        Yes. Only < is not.

        Makeshifts last the longest.

        xmllint doesn't gripe about it:

        freebie:~ 677> cat foo.xml                                  + 9:34:27
        <?xml version="1.0" encoding="utf8" ?>
        <testing>
        <img alt="Next >>" src="../next_button.jpg" />
        </testing>
        freebie:~ 678> xmllint --noout foo.xml                      + 9:34:29
        freebie:~ 679>                                              + 9:34:35