JoeJohnston has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have on many occasions found help from you Monks, but this is the first time I am stumped and can't find anything similar to help. I am trying to download files using File::Fetch. The program I was using had worked without issue several times in the past, including up to the day before yesterday. Now, when I try to download a file I get error messages like: Fetch failed! HTTP response: 400 Bad Request [400 Bad Request]  at ...

I am wondering if it is my setup or possibly the web server (this is the SEC's XBRL RSS feed). I have tried this code on two different machines (home and work) and I still get the error most of the time (not all the time). The weird thing is that the file does indeed download (I have yet to check whether the file is 100% accurate). This happens with the RSS feeds as well as other documents from the same server, and the links work fine in a browser. File::Fetch seems to work OK with other sites, so I am guessing it's the SEC's server, but I'd like to be sure that is the case and, if possible, know why the bad response is being given. I was hoping someone could shed some light on my situation. Here is the relevant code that produces the error.

use warnings;
use strict;
use File::Fetch;

unlink '/tmp/xbrltest/xbrlrss-2015-09.xml';
unlink '/tmp/xbrltest/xbrlrss-2015-10.xml';
unlink '/tmp/xbrltest/xbrlrss-2015-11.xml';

if (-e '/tmp/xbrltest/xbrlrss-2015-09.xml') { print "File 09 Exists\n"; }
else                                        { print "Need to Download 09\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-10.xml') { print "File 10 Exists\n"; }
else                                        { print "Need to Download 10\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-11.xml') { print "File 11 Exists\n"; }
else                                        { print "Need to Download 11\n"; }

{
    print "Fetching File with URL in double quotes\n";
    my $url = "http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml";
    my $ff  = File::Fetch->new(uri => $url)
        or die("Something went wrong in fetching file: $!");
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
}

{
    print "Fetching File with URL in single quotes\n";
    my $url = 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-10.xml';
    my $ff  = File::Fetch->new(uri => $url)
        or die("Something went wrong in Fetching RSS Feed: $!");
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
}

{
    print "Fetching File with URL built from parts\n";
    my $year    = '2015';
    my $month   = '11';
    my $site    = 'http://www.sec.gov/Archives/edgar/monthly/';
    my $rssFile = "xbrlrss-" . $year . "-" . $month . ".xml";
    my $url     = $site . $rssFile;
    my $ff      = File::Fetch->new(uri => $url)
        or die("Something went wrong in Fetching RSS Feed: $!");
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
}

if (-e '/tmp/xbrltest/xbrlrss-2015-09.xml') { print "Download 09 Successful\n"; }
else                                        { print "Download 09 Failed\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-10.xml') { print "Download 10 Successful\n"; }
else                                        { print "Download 10 Failed\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-11.xml') { print "Download 11 Successful\n"; }
else                                        { print "Download 11 Failed\n"; }

print "Done!\n";

Thanks,

Joe

Replies are listed 'Best First'.
Re: HTTP response: 400 Bad Request
by Athanasius (Archbishop) on Feb 01, 2016 at 03:41 UTC

    Hello JoeJohnston,

    I get similar results when I run your code. But note that the three calls to fetch do succeed — the die clauses are never entered — and the Fetch failed! ... message is actually just a warning generated during the fetch call. You can turn those warnings off as follows:

    use File::Fetch;
    $File::Fetch::WARN = 0;
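
    For instance, a minimal sketch that silences the warning and shows that fetch still returns the path of the saved file (I'm reusing the first of the three URLs from your post):

    use warnings;
    use strict;
    use File::Fetch;

    $File::Fetch::WARN = 0;    # suppress the spurious "Fetch failed!" warning

    my $ff    = File::Fetch->new(uri => 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml');
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
    print "Fetched to $where\n";    # reached: the fetch succeeded despite the warning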

    Sorry, I can’t explain why the File::Fetch module thinks (incorrectly) that the fetch operation has failed. But I can confirm that the three files in question are downloaded correctly: in each case the file downloaded by fetch is identical to the corresponding file downloaded in my browser (Google Chrome).

    Hope that helps,

    Athanasius <°(((>< contra mundum

      Sorry, I can’t explain why the File::Fetch module thinks (incorrectly) that the fetch operation has failed.

      It's because _lwp_fetch (one of the http fetch methods) failed, presumably due to the User Agent. You can verify this by adding $File::Fetch::DEBUG = 1; before the calls to fetch.

      JoeJohnston, you can replace $File::Fetch::WARN = 0; with $File::Fetch::BLACKLIST = ['lwp'];.
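
      A minimal sketch combining both suggestions (reusing the OP's first URL), so you can watch the method selection in the debug trace while the LWP method is skipped:

      use warnings;
      use strict;
      use File::Fetch;

      $File::Fetch::DEBUG     = 1;          # trace which fetch method is being tried
      $File::Fetch::BLACKLIST = ['lwp'];    # never try _lwp_fetch; fall back to wget/curl/etc.

      my $ff    = File::Fetch->new(uri => 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml');
      my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
      print "Fetched to $where\n";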

        After digging into this with tcpdump, the issue is as follows. The _lwp_fetch method produces a request such as the one below:

        GET /path/to/file.ext HTTP/1.1
        TE: deflate,gzip;q=0.3
        Connection: TE, close
        Authorization: Basic YW5vbnltb3VzOkZpbGUtRmV0Y2hAZXhhbXBsZS5jb20=
        From: File-Fetch@example.com
        Host: my.domain.com
        If-Modified-Since: Wed, 07 Jun 2017 03:40:39 GMT
        User-Agent: File::Fetch/0.48

        While some of those headers are questionable for an HTTP request (e.g. the From), the one that causes this issue is the Authorization. Decoding that Basic Auth from Base64 reveals that it is:

        $ perl -MMIME::Base64 -E 'say decode_base64("YW5vbnltb3VzOkZpbGUtRmV0Y2hAZXhhbXBsZS5jb20=")'
        anonymous:File-Fetch@example.com

        Looking into the File::Fetch source, we find this:

        if ($self->userinfo) {
            $uri->userinfo($self->userinfo);          # credentials supplied by the caller
        }
        elsif ($self->scheme ne 'file') {
            $uri->userinfo("anonymous:$FROM_EMAIL");  # otherwise inject default Basic Auth
        }

        (lines 593-597 of File::Fetch)

        When a server receives Basic Auth credentials it wasn't expecting, at least for a subset of servers, it returns a 400 Bad Request error. I'm not sure what the intention was of putting a default Basic Auth into _lwp_fetch when none was specified, given that no other request method does this, but this is why it doesn't work, and why you need to either blacklist lwp or turn off warnings. Hope this helps those who come along after.

      Thanks for confirming and for the info on turning off warnings.

      Best,

      Joe

Re: HTTP response: 400 Bad Request
by Anonymous Monk on Feb 01, 2016 at 03:29 UTC
    lots of stupid websites return 400 or 500 based on silly headers they don't like, like the user agent, etc ... examine the headers
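
    One way to examine them is a sketch with LWP::UserAgent (reusing the first URL from the OP) that dumps the request actually sent alongside the server's response headers:

    use warnings;
    use strict;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get('http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml');

    print $res->request->as_string;    # the request headers actually sent
    print $res->status_line, "\n";     # e.g. "400 Bad Request"
    print $res->headers_as_string;     # the headers the server sent back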

      Thanks for your input. The odd thing was that it started all of a sudden. Maybe a change in the server configuration. As long as I get the files, I guess that's all that matters.


      Best,

      Joe

        Sorry for reviving a 5-year-old thread, but I am getting a 400 Bad Request from the SEC site now, testing downloading this file using File::Fetch and LWP: https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx

        The SEC does not allow botnets or automated tools to crawl the site. Any request that has been identified as part of a botnet or an automated tool outside of the acceptable policy will be managed to ensure fair access for all users. Please declare your user agent in request headers:

        Sample Declared Bot Request Headers:
        User-Agent: Sample Company Name AdminContact@<sample company domain>.com
        Accept-Encoding: gzip, deflate
        Host: www.sec.gov
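
        A sketch of a compliant fetch with LWP::UserAgent follows. The company name and contact address are placeholders (as in the SEC's sample), so substitute your own before using it:

        use warnings;
        use strict;
        use LWP::UserAgent;

        # Declared user agent per the SEC fair-access policy; the name and
        # address below are placeholders -- replace them with your own.
        my $ua = LWP::UserAgent->new(
            agent => 'Sample Company Name AdminContact@example.com',
        );
        $ua->default_header('Accept-Encoding' => 'gzip, deflate');

        my $url = 'https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx';
        my $res = $ua->get($url);
        die 'Fetch failed: ' . $res->status_line unless $res->is_success;

        open my $fh, '>', '/tmp/form.20230712.idx' or die "open: $!";
        print {$fh} $res->decoded_content;    # decoded_content un-gzips if needed
        close $fh;
        print "Saved form.20230712.idx\n";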