
How to download content generated by Javascript

by lihao (Monk)
on Mar 19, 2008 at 19:50 UTC ( [id://675083]=perlquestion )

lihao has asked for the wisdom of the Perl Monks concerning the following question:

Hi, folks:

I am trying to parse some web pages whose content is generated by a JavaScript function. For example, the HTML source looks something like the following (just a sample, with jQuery as the JavaScript library):

<div id="content"></div>
<script type="text/javascript">
    var value = from_javascript_function(...);
    $("div#content").html(value);
</script>

How can I grab the content that the web browser displays in the <div> with id="content"? Are there any Perl modules or other tools for this? Many thanks for your suggestions.

Regards,
lihao

Replies are listed 'Best First'.
Re: How to download content generated by Javascript
by runrig (Abbot) on Mar 19, 2008 at 22:01 UTC
    You can follow the advice above and use something that handles JavaScript. Alternatively, install something like the Live HTTP headers plugin for Firefox, examine for yourself what actually gets sent in the HTTP requests, and then use something simple like LWP or WWW::Mechanize to fetch the content directly. If you can get it working this way, it will run faster than the JavaScript-enabled plugin methods, though those might be faster to develop.
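    A minimal sketch of the LWP route, assuming the header capture shows the page's script fetching its data from some URL on the server. If the content is computed entirely in the browser, there is no such request to replay and this approach does not apply. The sub names, the command-line arguments and the two header values are illustrative assumptions; copy the real URL and headers from what the capture shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Build a request that mimics the one the browser was seen sending.
# The headers set here are assumptions, not something the page is
# known to require; replace them with the ones from your capture.
sub build_request {
    my ($url, $referer) = @_;
    my $req = HTTP::Request->new(GET => $url);
    # jQuery adds this header to its Ajax requests, and some servers check it.
    $req->header('X-Requested-With' => 'XMLHttpRequest');
    $req->header('Referer' => $referer) if defined $referer;
    return $req;
}

# Send the request and return the decoded response body, or die on failure.
sub fetch_content {
    my ($url, $referer) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->request(build_request($url, $referer));
    die "Request failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Usage: perl fetch.pl DATA_URL [PAGE_URL]
if (@ARGV) {
    print fetch_content(@ARGV[0, 1]);
}
```

    What comes back is whatever the server sends (an HTML fragment, JSON, and so on), so it may still need parsing before it matches what the browser shows in the div.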
Re: How to download content generated by Javascript
by pc88mxer (Vicar) on Mar 19, 2008 at 21:49 UTC
    Unfortunately, I believe Perl is the wrong tool to use for web scraping these days. It's fine for Web 1.0 applications, but for 2.0 apps you are much better off using the browser itself. You really need a complete JavaScript/DOM environment to do it adequately.

    I'd investigate Firefox plugins like Greasemonkey or Selenium. Selenium is controllable via Perl, so there is still room for Perl, but all the heavy lifting is going to be done by Firefox and the Selenium plugin.
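    A rough sketch of driving the browser from Perl. It uses the CPAN module Selenium::Remote::Driver, which is an assumption: the reply does not name a specific Perl client. It also assumes a Selenium server is already listening on localhost:4444 with Firefox available. The sub name grab_div_text is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Selenium::Remote::Driver;

# Load the page in a real browser so its JavaScript runs, then
# return the text of the element with the given id.
sub grab_div_text {
    my ($driver, $url, $id) = @_;
    $driver->get($url);
    # find_element croaks if no element with this id is present.
    my $elem = $driver->find_element($id, 'id');
    return $elem->get_text;
}

# Usage: perl grab.pl PAGE_URL
if (@ARGV) {
    # Host, port and browser are assumptions about the local setup.
    my $driver = Selenium::Remote::Driver->new(
        remote_server_addr => 'localhost',
        port               => 4444,
        browser_name       => 'firefox',
    );
    my $text = eval { grab_div_text($driver, $ARGV[0], 'content') };
    my $err  = $@;
    $driver->quit;    # always close the browser session
    die $err unless defined $text;
    print $text, "\n";
}
```

    If the script fills the div asynchronously, the text may be read before it arrives, so a wait or a short retry loop around the lookup may be needed.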
