
Why this file fetch fails with WWW::Mechanize?

by ZJ.Mike.2009 (Scribe)
on Aug 04, 2010 at 09:58 UTC (#852837=perlquestion)

ZJ.Mike.2009 has asked for the wisdom of the Perl Monks concerning the following question:

The following script fails:
use WWW::Mechanize;
use strict;
use warnings;

my $browser = WWW::Mechanize->new();
my $url = '…167.217.206/%d3%e9%c0%d6%b0%d9%b7%d6%b0%d9-100803-%d0%a1%d6%ed%bd%dc%c2%d7%b7%d6%d7%e9%d0%e3%c7%f2%bc%bc.mp4/segno=0%26&rid=A8F1F5DFEB1B11F1D90B40AD1BB75D69&filelength=21293994&blocksize=2097152&blocknum=11&blockmd5=E210862B3F92935D0883E00AA2A38F08@D793599727C6DA4ACDB1CBF2235004AC@D5E9C9245C9A1BB63BC5EDA862A32604@51B5FDF91356B2B4E943EF72648EB0AD@6F2400488B04EBF66A60336B795EA142@8E51B8DCF87A7A02B84A2CAA5FFCA3CF@89080D683268481694DBA6D1E22A2EFF@8F56225C76854A434385A09C319BF9C3@9AB0A3F199183F479F8887D1C3341B1B@845FE0D711086CC2D086546CD26B35C1@9D93A9BE1D2EDE216AA9EBF26BF414BE';
$browser->get($url);
The same script using Win32::IE::Mechanize, on the other hand, fetches the file successfully:
use Win32::IE::Mechanize;
use strict;
use warnings;

my $browser = Win32::IE::Mechanize->new(visible => 0);
my $url = '…167.217.206/%d3%e9%c0%d6%b0%d9%b7%d6%b0%d9-100803-%d0%a1%d6%ed%bd%dc%c2%d7%b7%d6%d7%e9%d0%e3%c7%f2%bc%bc.mp4/segno=0%26&rid=A8F1F5DFEB1B11F1D90B40AD1BB75D69&filelength=21293994&blocksize=2097152&blocknum=11&blockmd5=E210862B3F92935D0883E00AA2A38F08@D793599727C6DA4ACDB1CBF2235004AC@D5E9C9245C9A1BB63BC5EDA862A32604@51B5FDF91356B2B4E943EF72648EB0AD@6F2400488B04EBF66A60336B795EA142@8E51B8DCF87A7A02B84A2CAA5FFCA3CF@89080D683268481694DBA6D1E22A2EFF@8F56225C76854A434385A09C319BF9C3@9AB0A3F199183F479F8887D1C3341B1B@845FE0D711086CC2D086546CD26B35C1@9D93A9BE1D2EDE216AA9EBF26BF414BE';
$browser->get($url);

The fetch also works as expected in IE and Firefox with JavaScript disabled.

Any idea why WWW::Mechanize fails to fetch the file?

Thanks in advance :)

Re: Why this file fetch fails with WWW::Mechanize?
by Corion (Patriarch) on Aug 04, 2010 at 13:15 UTC

    Maybe the server sends something different when it does (not) detect Firefox or Internet Explorer. You don't tell us how it fails, and you also don't show us how you inspect the retrieved content. I recommend looking at the headers that go over the wire (using Wireshark, for example) and eliminating the differences one by one.
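Part of this inspection can also be done from inside the script. The following is a minimal sketch, assuming a libwww-perl recent enough to provide `add_handler` (WWW::Mechanize inherits it from LWP::UserAgent). `make_logging_browser` and the command-line usage are hypothetical and not part of either module. With `autocheck => 0`, a 500 comes back as a response instead of a die, so its status line, headers and body can be printed. Headers that the protocol layer adds at the last moment (Host, for example) may not show up in this dump, so a sniffer such as Wireshark remains the authority on what actually goes over the wire.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: returns a Mechanize object that records every
# request it sends and every response it receives in @$log.
sub make_logging_browser {
    my ($log) = @_;

    # autocheck => 0 keeps get() from dying on a 500, so the failed
    # response can still be inspected afterwards.
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # LWP::UserAgent hook that runs just before the request is handed to
    # the protocol layer.
    $mech->add_handler(
        request_send => sub {
            my ($request) = @_;
            push @$log, ">>> " . $request->as_string;
            return;    # return nothing, or LWP would treat the value as the response
        },
    );

    # LWP::UserAgent hook that runs once the whole response has arrived.
    $mech->add_handler(
        response_done => sub {
            my ($response) = @_;
            push @$log, "<<< " . $response->status_line . "\n"
              . $response->headers->as_string;
            return;
        },
    );

    return $mech;
}

# Usage (file name is only an example): perl dump_headers.pl 'URL'
if (@ARGV) {
    my @log;
    my $mech = make_logging_browser( \@log );
    $mech->get( $ARGV[0] );
    print "$_\n" for @log;
    print "Status: ", $mech->status, "\n";

    # The body of an error page sometimes says what the server objected to.
    unless ( $mech->success ) {
        my $body = $mech->response->decoded_content // '';
        print substr( $body, 0, 500 ), "\n";
    }
}
```

The dumped request can then be laid next to the Live HTTP Headers or Wireshark capture of the browser's request.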

      @Corion, thanks for the suggestion. I've now recorded the headers exchanged between Firefox and the server using Live HTTP Headers. They look something like this (request first, then response):
      Host:
      User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1) Gecko/20100701 Firefox/3.5.11
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
      Accept-Language: zh-cn,zh;q=0.5
      Accept-Encoding: gzip,deflate
      Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
      Keep-Alive: 300
      Connection: keep-alive

      HTTP/1.0 200 OK
      Content-Type: video/x-flv
      Content-Length: 21293994
      Connection: close
      As you suggested, I've also tried sending the same headers from Mechanize:
      use WWW::Mechanize;
      use HTTP::Cookies;    # HTTP::Cookies->new is called below
      use strict;
      use warnings;

      my $browser = WWW::Mechanize->new();
      $browser->cookie_jar(HTTP::Cookies->new());
      $browser->add_header('User-Agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv: Gecko/20100701 Firefox/3.5.11');
      $browser->add_header('Accept' => 'text/xml,application/xml,application/xhtml+xml;q=0.9,*/*;q=0.8');
      $browser->add_header('Accept-Language' => 'zh-cn,zh;q=0.5');
      $browser->add_header('Accept-Encoding' => 'gzip,deflate');
      $browser->add_header('Accept-Charset' => 'GB2312,utf-8;q=0.7,*;q=0.7');
      $browser->add_header('Keep-Alive' => 300);
      $browser->add_header('Connection' => 'keep-alive');

      my $url = '…167.217.206/%d3%e9%c0%d6%b0%d9%b7%d6%b0%d9-100803-%d0%a1%d6%ed%bd%dc%c2%d7%b7%d6%d7%e9%d0%e3%c7%f2%bc%bc.mp4/segno=0%26&rid=A8F1F5DFEB1B11F1D90B40AD1BB75D69&filelength=21293994&blocksize=2097152&blocknum=11&blockmd5=E210862B3F92935D0883E00AA2A38F08@D793599727C6DA4ACDB1CBF2235004AC@D5E9C9245C9A1BB63BC5EDA862A32604@51B5FDF91356B2B4E943EF72648EB0AD@6F2400488B04EBF66A60336B795EA142@8E51B8DCF87A7A02B84A2CAA5FFCA3CF@89080D683268481694DBA6D1E22A2EFF@8F56225C76854A434385A09C319BF9C3@9AB0A3F199183F479F8887D1C3341B1B@845FE0D711086CC2D086546CD26B35C1@9D93A9BE1D2EDE216AA9EBF26BF414BE';
      $browser->get($url);
      But I get the same error:
      Error GETing …9.167.217.206/%d3%e9%c0%d6%b0%d9%b7%d6%b0%d9-100803-%d0%a1%d6%ed%bd%dc%c2%d7%b7%d6%d7%e9%d0%e3%c7%f2%bc%bc.mp4/segno=0%26&rid=A8F1F5DFEB1B11F1D90B40AD1BB75D69&filelength=21293994&blocksize=2097152&blocknum=11&blockmd5=E210862B3F92935D0883E00AA2A38F08@D793599727C6DA4ACDB1CBF2235004AC@D5E9C9245C9A1BB63BC5EDA862A32604@51B5FDF91356B2B4E943EF72648EB0AD@6F2400488B04EBF66A60336B795EA142@8E51B8DCF87A7A02B84A2CAA5FFCA3CF@89080D683268481694DBA6D1E22A2EFF@8F56225C76854A434385A09C319BF9C3@9AB0A3F199183F479F8887D1C3341B1B@845FE0D711086CC2D086546CD26B35C1@9D93A9BE1D2EDE216AA9EBF26BF414BE: Internal Server Error at E:\ line 17
      Is there anything else I can try? Thanks :)
        Internal Server Error

        This means something goes wrong on the server side.

        Really do use a network sniffer, so that you don't just think you're sending the same data but know for certain that you are.

        The server cannot tell that you are not using a browser unless you do something differently from how a browser behaves. You just need to find out where the differences lie.
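One hedged way to find where the differences lie is to paste the browser's request headers (from Live HTTP Headers) and the script's request headers (from a sniffer or a request dump) into two strings and let Perl list every mismatch. `parse_headers` and `diff_headers` are hypothetical helpers written for this sketch. Header names are compared case-insensitively and values exactly; repeated headers are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Name: value" lines into a hash keyed by the
# lower-cased header name. Lines without a "Name:" prefix, such as the
# request line "GET / HTTP/1.1", are skipped.
sub parse_headers {
    my ($text) = @_;
    my %headers;
    for my $line ( split /\r?\n/, $text ) {
        next unless $line =~ /^([^:\s]+):\s*(.*?)\s*$/;
        $headers{ lc $1 } = $2;
    }
    return \%headers;
}

# Hypothetical helper: one message per header that only one side sends,
# or whose value differs between the two sides.
sub diff_headers {
    my ( $browser, $script ) = @_;
    my %names;
    @names{ keys %$browser, keys %$script } = ();
    my @diffs;
    for my $name ( sort keys %names ) {
        my ( $in_browser, $in_script ) = ( $browser->{$name}, $script->{$name} );
        if ( !defined $in_script ) {
            push @diffs, "only browser sends $name: $in_browser";
        }
        elsif ( !defined $in_browser ) {
            push @diffs, "only script sends $name: $in_script";
        }
        elsif ( $in_browser ne $in_script ) {
            push @diffs, "$name differs: browser '$in_browser' vs script '$in_script'";
        }
    }
    return @diffs;
}

# Example: the Accept headers quoted earlier in this thread.
my $firefox = parse_headers(<<'END');
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
END
my $mechanize = parse_headers(<<'END');
Accept: text/xml,application/xml,application/xhtml+xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
END
print "$_\n" for diff_headers( $firefox, $mechanize );
```

On these two excerpts it reports a single difference: the Accept value Firefox sent starts with text/html, while the one set in the Mechanize script starts with text/xml. Whether this server cares about that particular header is not known; the point is to get a complete list of differences that can then be eliminated one by one.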

Re: Why this file fetch fails with WWW::Mechanize?
by talexb (Chancellor) on Aug 04, 2010 at 21:14 UTC

    If the file fetch works when JavaScript is disabled, then it seems to me you have your answer: fetch the file with JavaScript disabled. (Also, Andy Lester notes in WWW::Mechanize's POD that the module doesn't support JavaScript.)

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      @talexb, thanks for the input :)

      I know that Mechanize does not support JavaScript. The thing is, the fetch works in the browser whether JavaScript is enabled or disabled, which makes me think JavaScript plays no part in my problem.

      If the file-hosting server rejected Mechanize because it doesn't behave like a normal browser, sending the browser's headers should have solved the problem, but it hasn't. That is what puzzles me.

Re: Why this file fetch fails with WWW::Mechanize?
by Anonymous Monk on Aug 05, 2010 at 17:02 UTC
    Sorry, but I tried Firefox and Chrome on Linux and Firefox and IE on Windows, and got a 500 error every time.

      @Anonymous Monk, thanks for the input.

      If I copy and paste the URL into the browser's address bar (FF3.5 or IE8) and press Enter, a Save File dialog pops up. But I've also noticed something odd. The sniffer program URL Snooper has a Manually Scan a URL box; if I paste the URL there and click either the Open in Browser button or the Download button, I get a 500 Internal Server Error, just as you described, and refreshing doesn't help. If I then copy and paste the URL again and press Enter, the Save File dialog pops up again.

Node Type: perlquestion [id://852837]
Front-paged by Arunbear