nick2253 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to pull URLs off of a website, but accessing this website with WWW::Mechanize is extremely slow - on the order of 3-5 mins(!). However, with my browser, the page is accessed virtually instantaneously.

Once I get the page contents, the rest of my script (not reproduced) quickly executes, so I know the problem is somewhere in the get.

Here is a sample script that simply prints the response object (after the substantial delay):

use strict;
use warnings;
use WWW::Mechanize;

my $request = WWW::Mechanize->new;
$request->agent_alias('Windows Mozilla');
my $response = $request->get('https://ai.fmcsa.dot.gov/SMS/Tools/Downloads.aspx');
print "Got Website:\n\n$response\n";

As you can see from the code, I've tried different agent aliases, but from experimenting, that doesn't change anything.

Originally, I thought it was a problem with ASPX pages, but I can't find any pattern to the behavior. For example, https://www.google.com is quickly "got". However, https://demos.devexpress.com/ASPxNavigationAndLayoutDemos/TabControl/Templates.aspx, a random ASPX site I found on Google, is also accessed slowly. But then https://msdn.microsoft.com/en-us/library/2wawkw1c.aspx is accessed about as fast as google.com.

Is there any kind of verbose/debug output I can set on WWW::Mechanize to get a sense of what's causing it to be so slow? I'm guessing it's hanging on something, and only completing after timing out, but I have no idea what that could be, or how to fix it--if it's fixable--from my end.

Replies are listed 'Best First'.
Re: Webpage "get" slow with WWW::Mechanize
by noxxi (Pilgrim) on Nov 16, 2015 at 22:16 UTC

    I cannot reproduce your problem, i.e., the test script is not slow for me. But since your script runs without errors (apart from being slow) even though the certificate of this site is invalid (missing chain), I guess that you are using fairly old versions of WWW::Mechanize, LWP::UserAgent and IO::Socket::SSL, which do not yet check certificates properly.

    I therefore recommend upgrading to current versions first to see whether the problem goes away. If it doesn't, please post the exact versions of the modules you are using.
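
    To collect those version numbers, something like the following minimal sketch should do. The list of modules is only my assumption about which parts of the HTTPS stack matter here, and module_version is just an illustrative helper name:

```perl
use strict;
use warnings;

# Return the installed version of a module, or 'not installed'.
# (module_version is an illustrative helper, not part of any library.)
sub module_version {
    my ($module) = @_;
    # String eval so that a missing module is reported instead of being fatal.
    my $loaded = eval "require $module; 1";
    return 'not installed' unless $loaded;
    return $module->VERSION // 'unknown';
}

# Assumed list of modules involved in an HTTPS request via WWW::Mechanize.
my @modules = qw(
    WWW::Mechanize
    LWP::UserAgent
    LWP::Protocol::https
    IO::Socket::SSL
    Net::SSLeay
);

printf "%-22s %s\n", $_, module_version($_) for @modules;
```

    Posting that output along with the perl version (perl -v) makes it much easier to tell whether an outdated module is involved.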

      Updating IO::Socket::SSL did the trick. Sometimes it's the obvious things, right? :)

      I had already run into the certificate issue and fixed that, but I forgot to mention it.

Re: Webpage "get" slow with WWW::Mechanize
by hippo (Archbishop) on Nov 16, 2015 at 20:56 UTC
    I'm guessing it's hanging on something, and only completing after timing out, but I have no idea what that could be, or how to fix it--if it's fixable--from my end.

    Sounds highly plausible. Have you tried reducing the timeout in your WWW::Mechanize object?

      Honestly, I'm not sure how. There's nothing obvious (to me) in the documentation to adjust it. However, I'm pretty infamous for overlooking really obvious things :/

        The page linked in my previous post says:

        WWW::Mechanize is a proper subclass of LWP::UserAgent and you can also use any of LWP::UserAgent's methods.

        So, you can use LWP::UserAgent's timeout method, e.g.:

        my $ua = WWW::Mechanize->new;
        $ua->timeout(5);
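
        To see whether a shorter timeout actually changes anything, you could time the request. This is only a sketch: timed is a made-up helper, and autocheck => 0 is my choice so that a failed or timed-out request comes back as a response object instead of dying. Keep in mind that, as far as I understand the LWP::UserAgent documentation, the timeout applies to inactivity on the connection, so the whole request can still take longer than the value you set.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use WWW::Mechanize;

# autocheck => 0: errors are returned as responses rather than thrown.
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->timeout(5);    # seconds; method inherited from LWP::UserAgent

# Run a code ref and return (elapsed seconds, its result).
# (timed is an illustrative helper, not a library function.)
sub timed {
    my ($code) = @_;
    my $start  = time;
    my $result = $code->();
    return ( time - $start, $result );
}

my $url = 'https://ai.fmcsa.dot.gov/SMS/Tools/Downloads.aspx';
my ( $elapsed, $response ) = timed( sub { $mech->get($url) } );

printf "%s after %.2fs\n", $response->status_line, $elapsed;
```

        If the elapsed time drops to roughly the timeout value, that would support the "hanging until something times out" theory.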
Re: Webpage "get" slow with WWW::Mechanize
by mr_ron (Deacon) on Nov 17, 2015 at 01:31 UTC

    The page doesn't come down slowly at all for me. I had some trouble with 500 can't connect at first and had to add a parameter to the Mechanize constructor.

    my $request = WWW::Mechanize->new(ssl_opts => { verify_hostname => 0 });
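
    Be aware that verify_hostname => 0 switches the hostname check off and, as far as I know, certificate verification along with it. If the real problem is a missing intermediate certificate, an alternative sketch is to leave verification on and point LWP at a CA bundle that contains it. The path below is a made-up placeholder, not a real file:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical path: a PEM bundle holding the root plus the missing intermediate.
my $ca_file = '/path/to/ca-with-intermediate.pem';

my $mech = WWW::Mechanize->new(
    ssl_opts => {
        verify_hostname => 1,           # keep the checks switched on
        SSL_ca_file     => $ca_file,    # option handed on to IO::Socket::SSL
    },
);

# ssl_opts with a single key reads the current setting back.
print "verify_hostname is ", $mech->ssl_opts('verify_hostname'), "\n";
```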

    It might be helpful for you to try a TCP packet analyzer/sniffer to see which packets or traffic are not getting answered, or are otherwise getting held up or hanging.
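
    Short of a full packet capture, you can get some debug output from inside the script. The sketch below is an assumption about what would be useful here, not a definitive recipe: it uses LWP::UserAgent's add_handler to timestamp each request and response (so redirects and long waits show up), and imports debug3 from IO::Socket::SSL to trace the TLS handshake on STDERR. The note helper and the @events list are names I made up.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use IO::Socket::SSL qw(debug3);    # verbose TLS tracing on STDERR
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->timeout(30);

my $t0 = time;
my @events;

# Record a message together with the seconds elapsed since $t0.
sub note {
    my ($msg) = @_;
    push @events, sprintf( '%7.2fs  %s', time - $t0, $msg );
    return;
}

# request_send fires just before each request goes out (including redirects).
# The handler must return nothing, otherwise LWP treats it as the response.
$mech->add_handler(
    request_send => sub {
        my ($request) = @_;
        note( 'send ' . $request->method . ' ' . $request->uri );
        return;
    }
);

# response_done fires once each response has been fully received.
$mech->add_handler(
    response_done => sub {
        my ($response) = @_;
        note( 'done ' . $response->status_line );
        return;
    }
);

$mech->get('https://ai.fmcsa.dot.gov/SMS/Tools/Downloads.aspx');
print "$_\n" for @events;
```

    A long gap between a "send" line and its "done" line, combined with where the IO::Socket::SSL trace stalls, should narrow down whether the delay is in the connect, the handshake, or the transfer.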

    Ron