in reply to Unique spidering need
WWW::Mechanize, although good, won't do what you want, I think: it doesn't deal with the visual presentation of a page. I don't know whether XPCOM would; I guess what you want is some way to interface with the layout engine.
As a halfway house, and if you're on Windows, you might try something like Win32::OLE to automate an instance of Internet Explorer.
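```perl
use strict;
use warnings;
use Win32::OLE;

my $ie = Win32::OLE->new('InternetExplorer.Application')
    or die "Can't start IE: ", Win32::OLE->LastError;
$ie->{'visible'} = 1;
$ie->navigate("http://search.cpan.org/");

# Wait until the page has finished loading (ReadyState 4 = complete)
sleep 1 while $ie->{'Busy'} || $ie->{'ReadyState'} != 4;

# You can access the DOM
print $ie->{'document'}->{'links'}->{'length'}, "\n";   # JS : document.links.length
```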
I picked this up from seeing it done in a similar way in Ruby; hopefully you can get at whichever document.height-style properties you want from IE. OK, you'll still have to write a bit of JavaScript, but one way or another you're going to have to talk to a rendering engine, and it'll probably be less painful doing it from Perl and JavaScript than through the Gecko bindings. But horses for courses.
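As a rough illustration of pulling layout information back out of IE, here's a sketch of a hypothetical helper, `link_boxes`, that walks the document's links and records each one's rendered position and size using the DOM properties `offsetLeft`, `offsetTop`, `offsetWidth` and `offsetHeight`. It assumes `$doc` is the `document` property of an already-loaded IE object; it only relies on a `links` collection with a `length` property and an `item` method, so it's untested against every IE version and should be treated as a starting point.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::OLE): collect the rendered box
# of every link in a DOM document. With Win32::OLE, properties are read
# through hash access and methods (like item) are called as Perl methods.
# Note: offsetLeft/offsetTop are relative to the element's offsetParent,
# not necessarily to the top-left of the page.
sub link_boxes {
    my ($doc) = @_;
    my $links = $doc->{links};
    my @boxes;
    for my $i (0 .. $links->{length} - 1) {
        my $link = $links->item($i);
        push @boxes, {
            href   => $link->{href},
            left   => $link->{offsetLeft},
            top    => $link->{offsetTop},
            width  => $link->{offsetWidth},
            height => $link->{offsetHeight},
        };
    }
    return @boxes;
}
```

Usage would be something like `my @boxes = link_boxes($ie->{'document'});` once the page has loaded; the overall page height is typically available as `$ie->{'document'}->{'body'}->{'scrollHeight'}`.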
cheers
ViceRaid