Re: how to strip XML into Plain Text file

by sleepingsquirrel (Hermit)
on Jan 26, 2005 at 00:21 UTC ( #425086=note )

in reply to how to strip XML into Plain Text file

perl -p -e 's/<[^>]*>//g' <foo.xml

-- All code is 100% tested and functional unless otherwise noted.

Re^2: how to strip XML into Plain Text file
by Fletch (Bishop) on Jan 26, 2005 at 01:00 UTC

    ... <img alt="Next >>" src="../next_button.jpg" />

    *Boom*

    And this is why you use a real parser, not just a regex . . .

    Update: Just to clarify, the above is a pathological case. If you're reasonably sure it won't occur, go ahead and use the simple s///, but be aware that it's not bulletproof, and know where to find the right tool when the sledgehammer doesn't cut it any more.

      Since we're being pedantic about it, is '>' actually allowed inside attribute values in XML?

        Yes. Only < is not.

        Makeshifts last the longest.

        xmllint doesn't gripe about it:

        freebie:~ 677> cat foo.xml                                + 9:34:27
        <?xml version="1.0" encoding="utf8" ?>
        <testing>
        <img alt="Next >>" src="../next_button.jpg" />
        </testing>
        freebie:~ 678> xmllint --noout foo.xml                    + 9:34:29
        freebie:~ 679>                                            + 9:34:35
