PerlMonks  

web scraping with javascript support?

by gulden (Monk)
on Aug 03, 2010 at 16:34 UTC (id://852699)

gulden has asked for the wisdom of the Perl Monks concerning the following question:

I need to scrape web pages where parts of the HTML are generated by executing JavaScript functions. For instance, I need to scrape the inner table of this site: http://tennis.7m.cn/default_en.aspx

Is there a library that does this? If not, what workaround can I use, given that I want to do this for several pages/sites?

WWW::Mechanize is not the solution: «That's because WWW::Mechanize doesn't operate on the JavaScript. It only understands the HTML parts of the page.»

$ wget doesn't support it either.

Any workaround/solution?
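One common workaround, sketched here in Python under stated assumptions: a table that JavaScript renders is usually built from data the browser has already received, either embedded in an inline script or fetched from a separate URL (visible in the browser's network tools). If you can locate that data, you can parse it directly without executing any JavaScript. The HTML below, the variable name `matchData`, and the array layout are hypothetical and are not the real 7m.cn format; the sketch also only works when the array literal happens to be valid JSON.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page: the empty <div> is what a non-JavaScript client
# (WWW::Mechanize, wget) sees; the rows only exist inside the script.
SAMPLE_HTML = """<html><body>
<div id="live"></div>
<script type="text/javascript">
var matchData = [
  ["Player A", "Player B", "6-4 6-3"],
  ["Player C", "Player D", "7-6 4-6 6-2"]
];
renderTable(matchData);
</script>
</body></html>"""


class ScriptCollector(HTMLParser):
    """Collect the text of every inline <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks; concatenate them.
        if self._in_script:
            self.scripts[-1] += data


def find_js_array(html, var_name):
    """Return the array literal assigned by `var <var_name> = [...];`
    in any inline script, parsed as JSON. Fails if the literal is not
    valid JSON (single quotes, trailing commas) or contains '];'."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]);", re.DOTALL
    )
    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            return json.loads(match.group(1))
    raise ValueError(f"no array assigned to {var_name!r} found")


if __name__ == "__main__":
    for home, away, score in find_js_array(SAMPLE_HTML, "matchData"):
        print(f"{home} vs {away}: {score}")
```

Run as a script, this prints `Player A vs Player B: 6-4 6-3` and `Player C vs Player D: 7-6 4-6 6-2`. The same idea carries over to Perl with HTML::Parser and the JSON module. If the data instead comes from a separate request, fetching that URL directly with WWW::Mechanize or wget is usually simpler. Only when a page truly needs its JavaScript executed do you need a tool that drives a real browser.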


«A contentious debate is always associated with a lack of valid arguments.»

Re: web scrapping with javascript support?
by runrig (Abbot) on Aug 03, 2010 at 16:42 UTC
Re: web scrapping with javascript support?
by mojotoad (Monsignor) on Aug 03, 2010 at 20:18 UTC
Re: web scrapping with javascript support?
by merlyn (Sage) on Aug 03, 2010 at 18:33 UTC
    I've often wanted to "scrap" the web. Start over. Try something new. :)

    I presume you mean "scrape", as in "web scraping".

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

      tks
Re: web scrapping with javascript support?
by superfrink (Curate) on Aug 03, 2010 at 20:01 UTC

Node Type: perlquestion [id://852699]
Approved by Old_Gray_Bear