Re^3: Getting the text of the html document

in reply to Re^2: Getting the text of the html document
in thread Getting the text of the html page

That's a good point. My little regexes there don't convert every single entity, but they do strip EVERY tag and convert the <'s, >'s, quotes, and ampersands. Honestly, not much else would be left behind.

Regardless, bradcathey seems to have a very nice solution, which is much faster than a regex anyway.


Re^4: Getting the text of the html document by davorg (Chancellor) on Jul 19, 2005 at 09:36 UTC
What would your regex do with a tag like this: `<img src="next.gif" alt="-->" />`

Honestly, it's best to use a real parser.

-- <http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
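
To make the question concrete, here is a rough sketch of what happens to that tag. The regex from earlier in the thread isn't quoted here, so the pattern `<[^>]*>` is only an assumed stand-in for a typical tag-stripping regex, and the words around the tag are made up for the demo. The sketch is in Python so that it runs with nothing but the standard library; the helper names `strip_tags_regex`, `strip_tags_parser`, and `TextExtractor` are hypothetical.

```python
import re
from html.parser import HTMLParser

# The tag from the post above; "Back" and "Next" are invented filler text.
HTML = 'Back <img src="next.gif" alt="-->" /> Next'


def strip_tags_regex(html):
    # Assumed naive pattern: "<", then anything that is not ">", then ">".
    # It knows nothing about quoting, so it stops at the ">" inside alt="-->".
    return re.sub(r'<[^>]*>', '', html)


class TextExtractor(HTMLParser):
    """Collects only the text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_parser(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks)


print(repr(strip_tags_regex(HTML)))   # 'Back " /> Next'
print(repr(strip_tags_parser(HTML)))  # 'Back  Next'
```

The regex ends its match at the first `>` it sees, which sits inside the quoted `alt` value, so the tail of the tag (`" />`) leaks into the output. The parser tracks the quotes and treats the whole tag as one unit. The same reasoning should apply to an equivalent pattern in Perl.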
