Thanks for the reply.

I've never done anything with XPaths before, but I did look into the documentation a little. At this point I was going on hour 7 of what I thought would be an easy script...

The while loop was copied straight off of a CPAN example somewhere, but I'll definitely keep that in mind for the future. About the error, I really don't know. I went through so many error messages today that they all just blurred together. I'll look at it again tomorrow when I get back to work. Thanks again.

Re^3: Scraping Rendered Text that is not in Source Code
by kcott (Archbishop) on Oct 31, 2010 at 10:39 UTC

    I had a look at the example page you gave. Both the HTML and the JavaScript are buggy, and the HTML::TreeBuilder::XPath module I mentioned won't be of any use in this situation. I was able to get to the city element with '//span[@id="city"]'. Since id attributes are supposed to be unique, I'd recommend targeting them directly; that should hopefully get around the issues with the malformed markup. And it looks like I'm now starting to repeat what Corion has already posted below, so I'll shut up now. :-)

    -- Ken