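Parsing HTML with regular expressions is generally a very bad idea. Sooner or later you will come across markup that breaks your regular expressions: attributes in a different order, single-quoted or unquoted values, a > inside an attribute value, comments, upper-case tags and so on.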
You are far better off using a real HTML parser. There is an HTML::Parser module on the CPAN, and you should use that or one of its subclasses. It sounds to me as if HTML::TreeBuilder might be just what you need in this instance.
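For illustration, here is a minimal sketch of the HTML::TreeBuilder approach. It assumes the HTML-Tree distribution has been installed from the CPAN, and the sample page is made up. It pulls the href out of every link. The upper-case tag, the single-quoted value and the > inside the title attribute are the kind of variation that trips up a regex but not a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # assumption: HTML-Tree distribution installed from the CPAN

# Hypothetical sample page; in practice this is the HTML you harvested.
my $html = <<'HTML';
<html><body>
  <p>See <a href="http://example.com/one">one</a> and
  <A HREF='http://example.com/two'
     title="a > b">two</A>.</p>
</body></html>
HTML

# Parse the whole string into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Find every <a> element that has a non-empty href, in document order,
# and keep just the href values.
my @links = map { $_->attr('href') }
            $tree->look_down(_tag => 'a', href => qr/./);

print "$_\n" for @links;

# Break the tree's circular references so the memory can be freed.
$tree->delete;
```

Because the parser normalizes tag and attribute names to lower case, the single look_down call matches both the lower-case link and the A/HREF one.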
--
"Perl makes the fun jobs fun
and the boring jobs bearable" - me
In reply to "Re: Harvesting and Parsing HTML from other sites" by davorg, in thread "Harvesting and Parsing HTML from other sites" by hostile17