Not to repeat too much of the above, but there are many answers to this question. One approach that works if the page you are pulling from is reliably valid HTML is to pull the page down and use XML::Twig to parse the whole page, or just specific nodes within it using twig_roots. I just completed some work like this. The only snag is that if the page you pull is not valid HTML, the parser can choke. I solved the majority of these errors by running the page through Tidy first.
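
To give a rough idea of the twig_roots part, here is a minimal sketch rather than the code I actually used. The markup, the "result" class name and the @links array are all made up for illustration, and the inline string stands in for a page that has already been downloaded and cleaned up into well-formed XHTML.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, already-tidied XHTML standing in for a downloaded page.
my $xhtml = <<'END_XHTML';
<html>
  <head><title>Example listing</title></head>
  <body>
    <div class="nav"><a href="/home">Home</a></div>
    <div class="result"><a href="/item/1">First item</a></div>
    <div class="result"><a href="/item/2">Second item</a></div>
  </body>
</html>
END_XHTML

my @links;    # illustrative container for whatever you want to extract

my $twig = XML::Twig->new(
    # Only <div> elements are built into twigs; the rest of the
    # document is skipped, which keeps memory use down on big pages.
    twig_roots => {
        div => sub {
            my ( $t, $div ) = @_;

            # Keep only the divs we care about (assumed class name).
            my $class = $div->att('class');
            return unless defined $class && $class eq 'result';

            my $link = $div->first_child('a') or return;
            push @links, { href => $link->att('href'), text => $link->text };

            $t->purge;    # discard what has been processed so far
        },
    },
);

$twig->parse($xhtml);

printf "%s -> %s\n", $_->{text}, $_->{href} for @links;
```

In real use you would fetch the page first (for example with LWP) and run it through Tidy before handing the result to parse, since XML::Twig will die on markup that is not well-formed.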
Don
WHITEPAGES.COM | INC
Everything I've learned in life can be summed up in a small perl script!