JoeJohnston has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have on many occasions found help from you Monks, but this is the first time I am stumped and can't find anything similar to help. I am trying to download files using File::Fetch. The program I was using had worked without issue several times in the past, including up to the day before yesterday. Now, when I try to download a file I get error messages like: Fetch failed! HTTP response: 400 Bad Request [400 Bad Request]  at ...

I am wondering if it is my setup or possibly the web server (this is the SEC's XBRL RSS feed). I have tried this code on two different machines (home and work) and I still get the error most of the time (not all the time). The weird thing is that the file does indeed download (I have yet to check whether the file is 100% accurate). This happens with the RSS feeds as well as other documents from the same server, and the links work fine in a browser. File::Fetch seems to work OK with other sites, so I am guessing it's the SEC's server, but I'd like to be sure that is the case and, if possible, know why the bad response is being given. I was hoping someone could shed some light on my situation. Here is the relevant code that produces the error.

use warnings;
use strict;
use File::Fetch;

unlink '/tmp/xbrltest/xbrlrss-2015-09.xml';
unlink '/tmp/xbrltest/xbrlrss-2015-10.xml';
unlink '/tmp/xbrltest/xbrlrss-2015-11.xml';

if (-e '/tmp/xbrltest/xbrlrss-2015-09.xml') { print "File 09 Exists\n"; }
else                                        { print "Need to Download 09\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-10.xml') { print "File 10 Exists\n"; }
else                                        { print "Need to Download 10\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-11.xml') { print "File 11 Exists\n"; }
else                                        { print "Need to Download 11\n"; }

{
    print "Fetching File with URL in double quotes\n";
    my $url = "http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml";
    my $ff  = File::Fetch->new(uri => $url)
        or die("Something went wrong in fetching file: $!");
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
}

{
    print "Fetching File with URL in single quotes\n";
    my $url = 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-10.xml';
    my $ff  = File::Fetch->new(uri => $url)
        or die("Something went wrong in Fetching RSS Feed: $!");
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
}

{
    print "Fetching File with URL built from parts\n";
    my $year    = '2015';
    my $month   = '11';
    my $site    = 'http://www.sec.gov/Archives/edgar/monthly/';
    my $rssFile = "xbrlrss-" . $year . "-" . $month . ".xml";
    my $url     = $site . $rssFile;
    my $ff      = File::Fetch->new(uri => $url)
        or die("Something went wrong in Fetching RSS Feed: $!");
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
}

if (-e '/tmp/xbrltest/xbrlrss-2015-09.xml') { print "Download 09 Successful\n"; }
else                                        { print "Download 09 Failed\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-10.xml') { print "Download 10 Successful\n"; }
else                                        { print "Download 10 Failed\n"; }
if (-e '/tmp/xbrltest/xbrlrss-2015-11.xml') { print "Download 11 Successful\n"; }
else                                        { print "Download 11 Failed\n"; }

print "Done!\n";

Thanks,

Joe

Replies are listed 'Best First'.
Re: HTTP response: 400 Bad Request
by Athanasius (Archbishop) on Feb 01, 2016 at 03:41 UTC

    Hello JoeJohnston,

    I get similar results when I run your code. But note that the three calls to fetch do succeed — the die clauses are never entered — and the Fetch failed! ... message is actually just a warning generated during the fetch call. You can turn those warnings off as follows:

    use File::Fetch;
    $File::Fetch::WARN = 0;
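
    For instance, a minimal sketch that silences the warning and shows that fetch still returns the path of the saved file (I'm reusing the first of the three URLs from your post):

    use warnings;
    use strict;
    use File::Fetch;

    $File::Fetch::WARN = 0;    # suppress the spurious "Fetch failed!" warning

    my $ff    = File::Fetch->new(uri => 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml');
    my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
    print "Fetched to $where\n";    # reached: the fetch succeeded despite the warning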

    Sorry, I can’t explain why the File::Fetch module thinks (incorrectly) that the fetch operation has failed. But I can confirm that the three files in question are downloaded correctly: in each case the file downloaded by fetch is identical to the corresponding file downloaded in my browser (Google Chrome).

    Hope that helps,

    Athanasius <°(((>< contra mundum

      Sorry, I can’t explain why the File::Fetch module thinks (incorrectly) that the fetch operation has failed.

      It's because _lwp_fetch (one of the http fetch methods) failed, presumably due to the User Agent. You can verify this by adding $File::Fetch::DEBUG = 1; before the calls to fetch.

      JoeJohnston, you can replace $File::Fetch::WARN = 0; with $File::Fetch::BLACKLIST = ['lwp'];.
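
      A minimal sketch combining both suggestions (reusing the OP's first URL), so you can watch the method selection in the debug trace while the LWP method is skipped:

      use warnings;
      use strict;
      use File::Fetch;

      $File::Fetch::DEBUG     = 1;          # trace which fetch method is being tried
      $File::Fetch::BLACKLIST = ['lwp'];    # never try _lwp_fetch; fall back to wget/curl/etc.

      my $ff    = File::Fetch->new(uri => 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml');
      my $where = $ff->fetch(to => '/tmp/xbrltest/') or die $ff->error;
      print "Fetched to $where\n";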

        After digging into this with tcpdump, the issue is as follows. The _lwp_fetch method produces a request such as the one below:

        GET /path/to/file.ext HTTP/1.1
        TE: deflate,gzip;q=0.3
        Connection: TE, close
        Authorization: Basic YW5vbnltb3VzOkZpbGUtRmV0Y2hAZXhhbXBsZS5jb20=
        From: File-Fetch@example.com
        Host: my.domain.com
        If-Modified-Since: Wed, 07 Jun 2017 03:40:39 GMT
        User-Agent: File::Fetch/0.48

        While some of those headers are questionable for an HTTP request (e.g. the From), the one that causes this issue is the Authorization. Decoding that Basic Auth from Base64 reveals that it is:

        $ perl -MMIME::Base64 -E 'say decode_base64("YW5vbnltb3VzOkZpbGUtRmV0Y2hAZXhhbXBsZS5jb20=")'
        anonymous:File-Fetch@example.com

        Looking into the File::Fetch source, we find this:

        if ($self->userinfo) {
            $uri->userinfo($self->userinfo);          # credentials supplied by the caller
        }
        elsif ($self->scheme ne 'file') {
            $uri->userinfo("anonymous:$FROM_EMAIL");  # otherwise inject default Basic Auth
        }

        (lines 593-597 of File::Fetch)

        When a server receives Basic Auth credentials it wasn't expecting, at least for a subset of servers, it returns a 400 Bad Request error. I'm not sure what the intention was of putting a default Basic Auth into _lwp_fetch when none was specified, given that no other request method does this, but this is why it doesn't work, and why you need to either blacklist lwp or turn off warnings. Hope this helps those who come along after.

      Thanks for confirming and for the info on turning off warnings.

      Best,

      Joe

Re: HTTP response: 400 Bad Request
by Anonymous Monk on Feb 01, 2016 at 03:29 UTC
    lots of stupid websites return 400 or 500 based on silly headers they don't like, like the user agent, etc ... examine the headers
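
    One way to examine them is a sketch with LWP::UserAgent (reusing the first URL from the OP) that dumps the request actually sent alongside the server's response headers:

    use warnings;
    use strict;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get('http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2015-09.xml');

    print $res->request->as_string;    # the request headers actually sent
    print $res->status_line, "\n";     # e.g. "400 Bad Request"
    print $res->headers_as_string;     # the headers the server sent back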

      Thanks for your input. The odd thing was that it started all of a sudden. Maybe a change in the server configuration. As long as I get the files, I guess that's all that matters.


      Best,

      Joe

        Sorry for reviving a 5-year-old thread, but I am getting a 400 Bad Request from the SEC site now, testing downloading this file using File::Fetch and LWP: https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx

        The SEC does not allow botnets or automated tools to crawl the site. Any request that has been identified as part of a botnet or an automated tool outside of the acceptable policy will be managed to ensure fair access for all users. Please declare your user agent in request headers:

        Sample Declared Bot Request Headers:
        User-Agent: Sample Company Name AdminContact@<sample company domain>.com
        Accept-Encoding: gzip, deflate
        Host: www.sec.gov
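
        A sketch of a compliant fetch with LWP::UserAgent follows. The company name and contact address are placeholders (as in the SEC's sample), so substitute your own before using it:

        use warnings;
        use strict;
        use LWP::UserAgent;

        # Declared user agent per the SEC fair-access policy; the name and
        # address below are placeholders -- replace them with your own.
        my $ua = LWP::UserAgent->new(
            agent => 'Sample Company Name AdminContact@example.com',
        );
        $ua->default_header('Accept-Encoding' => 'gzip, deflate');

        my $url = 'https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx';
        my $res = $ua->get($url);
        die 'Fetch failed: ' . $res->status_line unless $res->is_success;

        open my $fh, '>', '/tmp/form.20230712.idx' or die "open: $!";
        print {$fh} $res->decoded_content;    # decoded_content un-gzips if needed
        close $fh;
        print "Saved form.20230712.idx\n";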