sugu has asked for the wisdom of the Perl Monks concerning the following question:

I'm using LWP::UserAgent and WWW::Mechanize to fetch the source of web pages, but for this website (https://camelcamelcamel.com/) the code below returns a 500 error. I can't figure out where my mistake is; I don't know whether it is in the cookie handling or the user agent string. Can someone help me solve this using LWP::UserAgent itself? Thank you in advance.

use strict;
use LWP::UserAgent;
use HTTP::Cookies;

my $url = "https://camelcamelcamel.com/";
my $ua=LWP::UserAgent->new();
$ua->agent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0");
my $cookie = HTTP::Cookies->new(file=>$0."_cookie.txt",autosave=>1);
$ua->cookie_jar($cookie);
my $req = HTTP::Request->new(GET=>"$url");
$req->header("Content-Type"=> "application/x-www-form-urlencoded");
$req->header("Accept"=> "text/html,application/xhtml+xml,application/xml;q=0.8,*/*;q=0.7");
my $res = $ua->request($req);
$cookie->extract_cookies($res);
$cookie->save;
$cookie->add_cookie_header($req);
$res->header("Content-Type"=> "application/xml; charset=utf-8");
my $code=$res->code;
print "Code::$code\n";

Re: Can't able to get source page 500 error.
by 1nickt (Canon) on Jun 08, 2017 at 12:53 UTC

    Hi, I don't know what your code is all about: you mention Mech but the code does not show it. I also don't know about your 500 error: that might be LWP not supporting 405 which is the code the site returns.

    In any case, that page requires JavaScript to be enabled. WWW::Mechanize does not support JavaScript. Perhaps you can get around it by filling the CAPTCHA returned?

    Using something more modern and simple to see what's going on:

    use strict;
    use warnings;
    use feature 'say';
    use Path::Tiny;
    use HTTP::Tiny;
    use HTTP::CookieJar;

    my $jar_file = Path::Tiny->tempfile;
    $jar_file->touch;
    my $jar = HTTP::CookieJar->new->load_cookies( $jar_file->lines );
    my $ua = HTTP::Tiny->new( cookie_jar => $jar );
    my $url = 'https://camelcamelcamel.com';
    my $res = $ua->get( $url );
    say $res->{'status'};
    say $res->{'content'};
    __END__
    Output:
    405
    [ snip ]
    <p> As you were browsing <strong>camelcamelcamel.com</strong> something about your browser made us think you were a bot. There are a few reasons this might happen: </p>
    <ul>
    <li>You're a power user moving through this website with super-human speed.</li>
    <li>You've disabled JavaScript in your web browser.</li>
    <li>A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this <a title='Third party browser plugins that block javascript' href='http://ds.tl/help-third-party-plugins' target='_blank'>support article</a>.</li>
    </ul>
    <p>After completing the CAPTCHA below, you will immediately regain access to camelcamelcamel.com.</p>
    [ snip ]
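
When reading a result like the one above, it can help to separate three cases: a failure inside the client, a normal success, and a bot-check page such as the 405 shown here. The following is only a sketch under stated assumptions: `describe_response` is a hypothetical helper (not part of HTTP::Tiny), and matching the word "CAPTCHA" in the body is a guess based on the output above, not a documented marker. It relies on HTTP::Tiny's documented behaviour of reporting its own errors as status 599 with the error text in `content`. The responses are hand-built hashes so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize an HTTP::Tiny response hash reference.
sub describe_response {
    my ($res) = @_;

    # HTTP::Tiny reports its own failures (DNS, TLS, timeouts) as status 599
    # and puts the error message in the content field.
    if ( $res->{status} == 599 ) {
        my $reason = $res->{content} // '';
        $reason =~ s/\s+\z//;    # drop trailing newline, if any
        return "client-side failure: $reason";
    }

    return "ok: $res->{status}" if $res->{success};

    # Assumption: the block page can be recognised by the word CAPTCHA.
    if ( ( $res->{content} // '' ) =~ /captcha/i ) {
        return "blocked by bot check: $res->{status}";
    }

    return "HTTP error: $res->{status}";
}

# Hand-built responses shaped like the hashes HTTP::Tiny returns.
my @samples = (
    { success => 1, status => 200, content => '<html>...</html>' },
    { success => 0, status => 405, content => '<p>After completing the CAPTCHA below ...</p>' },
    { success => 0, status => 599, content => "Could not connect\n" },
);

print describe_response($_), "\n" for @samples;
```

With a real request, the same helper could be called as `describe_response( HTTP::Tiny->new->get($url) )`.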

    Hope this helps!


    The way forward always starts with a minimal test.

      I am saying that when I run this code it shows a 500 error. I didn't use Mechanize in this script, but even when I use Mechanize I can't get the source page. Could someone try fetching the source page of this site? I need to know how to get it.

        Er, did you actually read my reply?

        1. The page requires JavaScript.
        2. Neither LWP::UserAgent nor WWW::Mechanize supports JavaScript.
        3. Thus, your approach cannot work.
        If you want to try to accomplish your task in Perl, you could look at Selenium::Remote::Driver, but that would require installing a headless browser on your system. Or, you could choose to accomplish your task without Perl using PhantomJS. Either way, you have a lot of work and learning ahead of you.


Re: Can't able to get source page 500 error.
by mrguy123 (Hermit) on Jun 08, 2017 at 12:44 UTC
    I think your main problem is that you are trying to fetch an HTTPS page.
    If you change the URL to http://www.imdb.com (for example) you get code=200.
    Look here for more info: http://www.perlmonks.org/?node_id=888422.

    Good luck!
    Mr Guy
Re: Can't able to get source page 500 error.
by amitsq (Beadle) on Jun 08, 2017 at 13:39 UTC
    My guess: try installing this module so that you can access secure web pages: http://search.cpan.org/~gaas/LWP-Protocol-https-6.04/lib/LWP/Protocol/https.pm Hope it helps.
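
To check whether an error status really came from the site or was produced by LWP itself (for example when the connection or the HTTPS handler fails), the response headers can be inspected. LWP::UserAgent documents that responses it generates internally carry a `Client-Warning` header with the value `Internal response`. The sketch below assumes that behaviour; `is_internal_response` is a hypothetical helper name, and the two responses are built by hand so the example runs offline.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true when LWP itself generated the response
# instead of receiving it from the remote server.
sub is_internal_response {
    my ($res) = @_;
    my $warning = $res->header('Client-Warning') // '';
    return $warning eq 'Internal response';
}

# Hand-built stand-ins for what $ua->request($req) would return.
my $from_lwp = HTTP::Response->new(
    500, "Can't connect to example.invalid:443",
    [ 'Client-Warning' => 'Internal response' ],
);
my $from_server = HTTP::Response->new( 500, 'Internal Server Error' );

for my $res ( $from_lwp, $from_server ) {
    printf "%s => %s\n", $res->status_line,
        is_internal_response($res) ? 'generated by LWP' : 'sent by the server';
}
```

In the original script, printing `$res->status_line` instead of only `$res->code` would also show the reason text that accompanies the 500.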