Esteemed Monks,

I wonder whether it is expected behaviour for URI's url absolutisation to retain relative path components ('..') in the final url. For example (using Perl 5.22, URI v1.73, LWP::UserAgent v6.33):

#!/usr/bin/env perl

use strict;
use warnings;
use URI;

# a relative url:
my $rel_url = '../../../../../abc.html';
# the base url, where I stand now:
my @base_uris = ('http://server.com/123/xyz',
'http://server.com/1/2/3/4/5',
'http://server.com/1/2/3/4/5/');

# URI's absolute url:
foreach my $abase (@base_uris){
   my $uri = URI->new_abs( $rel_url, $abase );
   print "absolute for base: $abase is\n\t".$uri."\n";
}

yields:

absolute for base: http://server.com/123/xyz is
    http://server.com/../../../../abc.html
absolute for base: http://server.com/1/2/3/4/5 is
    http://server.com/../abc.html
absolute for base: http://server.com/1/2/3/4/5/ is
    http://server.com/abc.html

The last result is correct, but I wonder whether in the first two cases URI should have used some heuristic to remove the leftover '..' segments.
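To make the question concrete, here is a minimal sketch of one such heuristic, written in plain Perl with no modules. It follows the "remove dot segments" steps of RFC 3986 section 5.2.4, which drop any '..' that would climb above the root. The sub name remove_dot_segments is my own label for illustration and is not part of URI's API; the sketch works on a path string only, not on a full url.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI: collapse '.' and '..' segments
# in a path, following the steps of RFC 3986 section 5.2.4.
sub remove_dot_segments {
    my ($in) = @_;
    my $out = '';
    while (length $in) {
        # A: drop a leading "../" or "./"
        if    ($in =~ s{\A\.\.?/}{})         { }
        # B: "/./" or a trailing "/." becomes "/"
        elsif ($in =~ s{\A/\.(?:/|\z)}{/})   { }
        # C: "/../" or a trailing "/.." becomes "/" and the last
        #    segment already written to the output is removed
        elsif ($in =~ s{\A/\.\.(?:/|\z)}{/}) { $out =~ s{/?[^/]*\z}{} }
        # D: a lone "." or ".." is dropped
        elsif ($in =~ s{\A\.\.?\z}{})        { }
        # E: otherwise move the first segment to the output
        else  { $in =~ s{\A(/?[^/]*)}{}; $out .= $1 }
    }
    return $out;
}

# The two problematic paths from the output above:
for my $path ('/../../../../abc.html', '/../abc.html') {
    print "$path => ", remove_dot_segments($path), "\n";
}
```

Both paths come out as /abc.html. Assuming one wanted to apply this to a URI object, the result could be written back through its path accessor, e.g. $uri->path(remove_dot_segments($uri->path)).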

The reason I ask is that I recently had a brief encounter with LWP::UserAgent (UA) and, subsequently, URI (described in detail here http://perlmonks.org/?node_id=1210570):

In summary, on receiving a "302 Found" server response, UA by default follows the redirect by extracting the 'Location' field from the server's response headers. However, this was a twisted server: it sent the 'Location' to follow as a relative url, something similar to $rel_url in my example.

UA then proceeded to absolutise the received url (based on the initial request url) using URI, in order to follow it. Here is the relevant extract from LWP::UserAgent.pm (sub request()):

# Some servers erroneously return a relative URL for redirects,
# so make it absolute if it not already is.
local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
my $base = $response->base;
$referral_uri = "" unless defined $referral_uri;
$referral_uri
   = $HTTP::URI_CLASS->new($referral_uri, $base)->abs($base);

The result in $referral_uri is YET ANOTHER url with relative ('..') components, which UA then requests. An error ensues: the server responds with a 500 (relative url forbidden). This may puzzle someone and cause long debugging.

So, my questions/requests are:

Should URI be using heuristics to absolutise urls?

If not, should LWP::UserAgent be using its own heuristics once URI returns a pseudo-absolute url? Or maybe LWP::UserAgent should die before making the follow-up request.
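As a rough illustration of the "die before making the follow-up request" option, the check itself could be as small as the sketch below. The sub name has_dot_segments is hypothetical, and the regex is a simplification that only handles scheme://host/path style urls. Where to hook it in is an open question; one possibility, which I have not tried, would be an overridden redirect_ok in a subclass of LWP::UserAgent.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if the path of a scheme://host/path
# url still contains a '.' or '..' segment.
sub has_dot_segments {
    my ($url) = @_;
    # Capture what lies between the authority and any query or fragment.
    my ($path) = $url =~ m{\A[A-Za-z][A-Za-z0-9+.-]*://[^/?#]*([^?#]*)};
    return 0 unless defined $path;
    # Count the segments that are exactly '.' or '..'.
    my $count = grep { $_ eq '.' || $_ eq '..' } split m{/}, $path, -1;
    return $count ? 1 : 0;
}

for my $url ('http://server.com/../abc.html', 'http://server.com/abc.html') {
    print "$url: ",
        ( has_dot_segments($url) ? "refuse to follow" : "ok to follow" ), "\n";
}
```

A caller could then die with a clear message when the check returns true, instead of sending the request and getting a puzzling 500 back.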

bliako

In reply to URI: making absolute urls by bliako
