in reply to Re: Removing text between HTML tags
in thread Removing text between HTML tags

Thanks, s/<.+?>//g; is awesome, it removes all HTML tags. But I agree that I should use an HTML parser, as I have to parse thousands of URLs and relying on a regex is a big risk. Also, I found that the website generates XML pages, so we can parse those instead :) Can you suggest an XML parser? I found XML::Parser and will try that.
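
To illustrate the risk, here is a minimal sketch that runs the substitution on a few made-up snippets. The helper name strip_tags and the sample HTML are assumptions for illustration only, not taken from the real pages.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution from this post to a copy of the input.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<.+?>//g;
    return $text;
}

# Simple markup: every tag is removed.
my $simple = strip_tags('<p>Hello <b>world</b></p>');

# A '>' inside an attribute value ends the match too early,
# so part of the attribute leaks into the output.
my $attr = strip_tags('<a title="a > b">link</a>');

# '.' does not match a newline without the /s modifier,
# so the opening tag that spans two lines is left in place.
my $multiline = strip_tags("<a\nhref=\"x\">link</a>");

print "simple:    [$simple]\n";
print "attr:      [$attr]\n";
print "multiline: [$multiline]\n";
```

Only the first case comes out clean, which is why a real parser looks like the safer choice across thousands of pages.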

Re^3: Removing text between HTML tags
by choroba (Cardinal) on Sep 23, 2014 at 21:19 UTC
    I prefer XML::LibXML, which can handle HTML as well. XML::Twig is also quite popular. Both are a bit higher level than XML::Parser.
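
    A minimal sketch of text extraction with XML::LibXML, assuming the module is installed from CPAN. The sample document and its element names (page, title, body) are made up for illustration; the real pages will have their own structure.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical XML standing in for a page generated by the site.
my $xml = <<'XML';
<page>
  <title>Example</title>
  <body>Some <b>bold</b> text.</body>
</page>
XML

# Parse the string into a DOM document; this dies on malformed XML.
my $doc = XML::LibXML->load_xml(string => $xml);

# Select the element of interest with an XPath expression.
my ($body) = $doc->findnodes('/page/body');

# textContent returns the text of the node and all its descendants, without tags.
my $text = $body->textContent;
print "$text\n";
```

    Because the parser builds a tree, nested tags, attributes and line breaks inside tags do not need any special handling here.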
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: Removing text between HTML tags
by Laurent_R (Canon) on Sep 23, 2014 at 17:53 UTC
    That's the XML parser that I would have recommended for a start, but I do not use XML very much, and what I do handle is usually simple and well-formed, so I have not needed anything fancier and have not really tried the others.