"406 not acceptable" errors with LWP::UserAgent::get

by jimhenry (Acolyte)
on Aug 17, 2020 at 23:31 UTC [id://11120857]

jimhenry has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a podcatcher and ran into a problem: one particular podcast (whose RSS feed downloads fine in Firefox or wget) fails to download when I use the default HTTP headers that work for all the other podcasts I've tested. It gives a "406 Not Acceptable" error, which research suggested is caused by bad Accept headers. I ran wget -d to see what headers it was using and copied them into my script (or rather the config file for my script), and got the same error with this podcast. Below is a simplified version of the code that reproduces the problem.

#! /usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');
$ua->show_progress( 1 );

my %headers = (
    'Accept'          => '*/*',
    'Accept-Encoding' => 'identity',
    'Connection'      => 'Keep-Alive',
#    'Host'            => 'feed.podbean.com' );
    'Host'            => 'makeoursmarvel.com' );

#my $url = 'https://feed.podbean.com/betheserpent/feed.xml';
my $url = 'http://makeoursmarvel.com/feed/podcast/';
my $response = $ua->get( $url, %headers );
if ( $response->is_success ) {
    print "Okay!\n";
} else {
    my $status = $response->status_line;
    print "failed to download $url: $status\n";
}

If I use the commented-out lines instead for the Host and $url values (checking a different podcast's RSS feed), everything works fine. I also tried using the default Firefox Accept: header, based on https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation/List_of_default_Accept_values, and got the same 406 error. The same podcast also gives me a 406 error when I try to get an individual mp3 file.

Any ideas how to narrow the problem down further, if not fix it?

I'm using Perl v5.26.1 and LWP::UserAgent version 6.31.

Replies are listed 'Best First'.
Re: "406 not acceptable" errors with LWP::UserAgent::get
by Your Mother (Archbishop) on Aug 17, 2020 at 23:44 UTC

    Seems the main thing it doesn’t like is the agent. Your code gave me the same error, even after I tweaked and pruned it down, until I swapped the agent for something that apparently isn’t blacklisted.

    use 5.10.0;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    $ua->agent('DomoArigato/3.0');
    my $url = 'http://makeoursmarvel.com/feed/podcast/';
    my $response = $ua->get($url);
    say $response->is_success ? "OK!" : $response->as_string;
    __END__
    OK!

      Usually, the easiest way to get around these kinds of blocks is to set the user agent string to the same one your browser uses.

      -Thomas
      "Excuse me for butting in, but I'm interrupt-driven..."

        Disagree in the main. :P People blacklist agents, not whitelist, and a one-off for an agent that is NEVER abusive to a service is less likely to get crapcanned than one that shares a (base) name with 30% of the traffic.

        Thanks for the replies. I tried setting the agent to the user-agent string used by my current version of Firefox and had no trouble. (I'm still not sure why I was getting a 406 error for this podcast instead of a 403 Forbidden error, which I was getting on a number of podcasts with the default agent of "libwww-perl".)
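
One way to narrow this kind of problem down is to hold everything else constant and vary only the agent string. The following is a minimal sketch, not anyone's actual podcatcher code: the helper name status_for_agent is made up for illustration, and the Firefox-style string is an assumed example rather than the one used in the thread. It prints the status line the server returns for each candidate agent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url with the given User-Agent string and return the status line,
# e.g. "200 OK" or "406 Not Acceptable". (Hypothetical helper name.)
sub status_for_agent {
    my ($ua, $url, $agent) = @_;
    $ua->agent($agent);
    return $ua->get($url)->status_line;
}

my $ua = LWP::UserAgent->new;

my @agents = (
    $ua->_agent,          # LWP's default, "libwww-perl/<version>"
    'Mozilla/5.0',        # bare prefix used in the original script
    'DomoArigato/3.0',    # made-up name from the first reply
    # Illustrative Firefox-style string (an assumption; copy your own
    # browser's string instead):
    'Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0',
);

# Probe every URL given on the command line with every candidate agent.
for my $url (@ARGV) {
    for my $agent (@agents) {
        printf "%-30.30s  %s\n", $agent, status_for_agent($ua, $url, $agent);
    }
}
```

Run it with the feed URL as an argument, for example perl probe_agents.pl http://makeoursmarvel.com/feed/podcast/ (the script name is arbitrary). What each agent gets back depends on the server's current configuration, so the results may differ from those reported in this thread.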
Re: "406 not acceptable" errors with LWP::UserAgent::get
by tobyink (Canon) on Aug 19, 2020 at 11:42 UTC

    As an aside, there's no need to set a Host header. Yes, it's required by HTTP (unless you're still using HTTP 1.0), but LWP::UserAgent will do that for you. As would HTTP::Tiny or any other HTTP client library worth its salt.

    The only benefit you get from specifying it manually is the wonderful opportunity to screw things up.
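
As a rough sketch of that advice, here are the headers from the original script with the Host entry dropped, so the same set can be reused for any feed URL without editing. The helper name feed_status is hypothetical, and the agent string is simply the one from the first reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Headers from the original script, minus 'Host': LWP supplies that
# from each URL, so one set of headers works for every feed.
my %headers = (
    'Accept'          => '*/*',
    'Accept-Encoding' => 'identity',
    'Connection'      => 'Keep-Alive',
);

# GET $url with the shared headers and return the status line.
# (Hypothetical helper name.)
sub feed_status {
    my ($ua, $url) = @_;
    return $ua->get($url, %headers)->status_line;
}

my $ua = LWP::UserAgent->new;
$ua->agent('DomoArigato/3.0');    # agent string from the first reply

# Check every feed URL given on the command line.
for my $url (@ARGV) {
    print "$url: ", feed_status($ua, $url), "\n";
}
```

Passing both feed URLs from the question on the command line checks them in one run, with no Host line to comment in and out between them.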
