advait has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
I am trying to write a simple script to fetch a webpage using LWP::Simple. The content I receive shows this:
<noscript> <div class="ae_error_area">ArrayExpress uses JavaScript for better data handling and enhanced representation. Please enable JavaScript if you want to continue browsing ArrayExpress.</div> </noscript>

my script is
use strict;
use warnings;
# Import LWP::Simple's own user agent object so the settings below
# actually apply to get(); a separate LWP::UserAgent->new object would be
# ignored by get().
use LWP::Simple qw($ua get);
use HTTP::Cookies;

my $url = 'http://www.ebi.ac.uk/arrayexpress/q-aer/leukemia%20Homo%20sapiens';

# Define user agent type
$ua->agent('Mozilla/8.0');

# Cookies
$ua->cookie_jar(
    HTTP::Cookies->new(
        file     => 'mycookies.txt',
        autosave => 1,
    )
);

my $content = get $url;
die "Couldn't get $url" unless defined $content;
print $content;
Can you please suggest how I can overcome this problem?
Thank you

Re: How to enable java script in automated webpage retrieval
by moritz (Cardinal) on Jul 03, 2008 at 17:40 UTC
    Perl doesn't have a JavaScript interpreter built in, so you can't just "enable" it. You can search CPAN for a JavaScript interpreter that you can use from Perl.

    Or you can use something like Selenium (presumably via WWW::Selenium) to automate a browser that has JavaScript built in.

    Btw, please don't write "JavaScript" as "Java script"; Java is a totally unrelated programming language.

Re: How to enable java script in automated webpage retrieval
by pjotrik (Friar) on Jul 03, 2008 at 18:30 UTC

    Put more effort into faking the User-Agent; for example, $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15'); works fine. The authors of the page are obviously trying to be too clever: they decide which version of the page you get based on the User-Agent header.

    That seems kinda weird, given that they serve just a <noscript> message, and <noscript> was designed to let you present both the script-enabled and the simple version of a document at once...
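A minimal sketch that puts the User-Agent suggestion together with the original script, assuming the site still picks the page version from the User-Agent header (that is not verified here, and the site may have changed since 2008). It fetches through an LWP::UserAgent object instead of LWP::Simple's get(), because get() uses its own internal user agent and would otherwise ignore the header and cookie jar configured on a separately created $ua. The helper name make_agent and the "URL as command-line argument" handling are illustrative choices, not part of either post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Browser-like User-Agent string from the reply above. Assumption: the
# server serves the full page only to agents it recognises as a browser.
my $BROWSER_AGENT = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.8.1.15) '
                  . 'Gecko/20080623 Firefox/2.0.0.15';

# Hypothetical helper: build a user agent that sends the browser-like
# header and keeps cookies (in a file if one is given, else in memory).
sub make_agent {
    my ($cookie_file) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent($BROWSER_AGENT);
    my $jar = defined $cookie_file
        ? HTTP::Cookies->new(file => $cookie_file, autosave => 1)
        : HTTP::Cookies->new;
    $ua->cookie_jar($jar);
    return $ua;
}

# Only fetch when a URL is passed on the command line.
if (@ARGV) {
    my $url = $ARGV[0];
    my $ua  = make_agent('mycookies.txt');

    # Request through $ua itself so the agent string and cookies are used.
    my $res = $ua->get($url);
    die "Couldn't get $url: ", $res->status_line, "\n"
        unless $res->is_success;

    # decoded_content returns characters, so encode them on output.
    binmode STDOUT, ':encoding(UTF-8)';
    print $res->decoded_content;
}
```

Usage: perl fetch.pl 'http://www.ebi.ac.uk/arrayexpress/q-aer/leukemia%20Homo%20sapiens'. If the output still contains only the <noscript> message, the server is deciding on something other than the User-Agent, and a real browser (e.g. via Selenium, as suggested in the other reply) may be needed.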