Re^2: Scraping Rendered Text that is not in Source Code

Thanks for the reply.

I've never done anything with XPaths before, but I did look into the documentation a little. At this point I was going on hour 7 of what I thought would be an easy script...

The while loop was copied straight off of a CPAN example somewhere, but I'll definitely keep that in mind for the future. About the error, I really don't know. I went through so many error messages today that they all just blurred together. I'll look at it again tomorrow when I get back to work. Thanks again.

Replies are listed 'Best First'.

Re^3: Scraping Rendered Text that is not in Source Code
by kcott (Archbishop) on Oct 31, 2010 at 10:39 UTC

I had a look at the example page you gave. Both the HTML and JavaScript are buggy. The HTML::TreeBuilder::XPath I mentioned won't be of any use in this situation. I was able to get to the city element with '//span[@id="city"]'. The id attributes are supposed to be unique, so I'd recommend targeting them directly; that should hopefully get around issues with malformed markup. And it looks like I'm now starting to repeat what Corion already has below, so I'll shut up now. :-)

-- Ken
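
For anyone following along, here is a minimal sketch of the "target the id directly" idea. It assumes the text you want is actually present in the HTML handed to the parser. As noted above, that is not the case for the page in this thread, where the value is filled in by JavaScript, so there the same XPath would have to be run against the rendered DOM instead. The markup and the city value below are made up for illustration; only the XPath expression comes from the post.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical, deliberately sloppy markup: the <p> and <b> tags are never closed.
my $html = <<'HTML';
<html><body>
<div id="weather">
<p>Forecast for <b><span id="city">Springfield</span>
<p>High: <span id="high">72</span>
</div>
</body></html>
HTML

# HTML::TreeBuilder repairs the tag soup as best it can while parsing.
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Select by id rather than by a path through the broken structure,
# so it does not matter how the parser nested the unclosed tags.
my $city = $tree->findvalue('//span[@id="city"]');
print "city: $city\n";

$tree->delete;    # free the parse tree when done
```

Because the expression depends only on the id, it keeps working however the parser decides to nest the unclosed elements around the span.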