bliako has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

I wonder whether it is expected behaviour for URI's url absolutisation to retain relative path components ('..') in the final url. For example (using perl 5.22, URI v1.73, LWP::UserAgent v6.33):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use URI;

    # a relative url:
    my $rel_url = '../../../../../abc.html';

    # the base url, where I stand now:
    my @base_uris = (
        'http://server.com/123/xyz',
        'http://server.com/1/2/3/4/5',
        'http://server.com/1/2/3/4/5/',
    );

    # URI's absolute url:
    foreach my $abase (@base_uris){
        my $uri = URI->new_abs( $rel_url, $abase );
        print "absolute for base: $abase is\n\t".$uri."\n";
    }

yields:

    absolute for base: http://server.com/123/xyz is
            http://server.com/../../../../abc.html
    absolute for base: http://server.com/1/2/3/4/5 is
            http://server.com/../abc.html
    absolute for base: http://server.com/1/2/3/4/5/ is
            http://server.com/abc.html

The last result is correct, but I wonder whether, for the first two cases, URI should have used some heuristics to remove the leftover '..'.

The reason is that recently I had a brief encounter with LWP::UserAgent (UA) and, subsequently, URI (described in detail here http://perlmonks.org/?node_id=1210570):

In summary, on receiving a "302 Found" server response, UA would by default follow the redirect by extracting the 'Location' item from the server's response headers. However, it was a twisted server. As a result it sent the 'Location' to follow as a relative url. Something similar to '$rel_url' in my example.

UA then proceeded to absolutise the received url (based on initial request url) to follow, using URI. Here is the relevant extract from LWP::UserAgent.pm (sub request())

    # Some servers erroneously return a relative URL for redirects,
    # so make it absolute if it not already is.
    local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
    my $base = $response->base;
    $referral_uri = "" unless defined $referral_uri;
    $referral_uri = $HTTP::URI_CLASS->new($referral_uri, $base)->abs($base);

The result is YET ANOTHER relative url in $referral_uri, which UA then requests. An error ensues: the server answers with a 500 response (relative url forbidden). This may puzzle someone and cause a long debugging session.
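To make the failure easier to spot, a guard along the following lines could flag a supposedly absolute url that still carries '.' or '..' segments before it is requested. This is only a sketch: has_dot_segments is a hypothetical helper name (not part of URI or LWP::UserAgent), and it assumes the url has the plain scheme://host/path form used in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI or LWP): returns the number of
# '.' or '..' segments left in the path of a scheme://host/path url.
sub has_dot_segments {
    my ($url) = @_;
    # drop "scheme://authority" from the front
    (my $path = $url) =~ s{^[a-z][a-z0-9+.\-]*://[^/?#]*}{}i;
    # drop any query string or fragment from the end
    $path =~ s{[?#].*\z}{}s;
    # count the path segments that are exactly '.' or '..'
    return scalar grep { $_ eq '.' || $_ eq '..' } split m{/}, $path;
}

for my $url ('http://server.com/../abc.html', 'http://server.com/abc.html') {
    print "$url: ", (has_dot_segments($url) ? "still relative" : "ok"), "\n";
}
```

A caller could warn or die when the helper returns a true value, rather than sending the request and getting a confusing 500 back.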

So, my questions/request are:

Should URI be using heuristics to absolutise urls?

If not, should LWP::UserAgent be using its own heuristics once URI returns a pseudo-absolute url? Or maybe LWP::UserAgent should die before making the follow-up request.
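For reference, the kind of clean-up I have in mind need not be a heuristic at all: RFC 3986 (section 5.2.4) describes a "remove_dot_segments" procedure for paths. Below is a minimal sketch of my reading of that procedure in plain Perl. The function name is taken from the RFC; the implementation is my own assumption and is not what URI does internally.

```perl
use strict;
use warnings;

# Sketch of RFC 3986 section 5.2.4 (remove_dot_segments), working on
# the path component only. Not taken from URI.pm.
sub remove_dot_segments {
    my ($in) = @_;
    my $out = '';
    while (length $in) {
        # A: drop a leading "../" or "./"
        if    ($in =~ s{^\.\.?/}{}) { }
        # B: "/./" or a final "/." becomes "/"
        elsif ($in =~ s{^/\.(?:/|\z)}{/}) { }
        # C: "/../" or a final "/.." becomes "/", and the last
        #    segment already written to the output is removed
        elsif ($in =~ s{^/\.\.(?:/|\z)}{/}) { $out =~ s{/?[^/]*\z}{}; }
        # D: a lone "." or ".." is dropped
        elsif ($in =~ s{^\.\.?\z}{}) { }
        # E: otherwise move the first segment to the output
        else  { $in =~ s{^(/?[^/]*)}{}; $out .= $1; }
    }
    return $out;
}

# The three paths produced in my example above:
for my $path ('/../../../../abc.html', '/../abc.html', '/abc.html') {
    print "$path -> ", remove_dot_segments($path), "\n";
}
```

All three paths come out as /abc.html. With a URI object, the cleaned path could presumably be written back through the path accessor, e.g. $uri->path( remove_dot_segments( $uri->path ) ).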

bliako

Re: URI: making absolute urls
by choroba (Cardinal) on Mar 29, 2018 at 13:59 UTC
    Wow, has it really been 7 years? Relative URI
      Sorry, I got a bit lost in the thread you posted: what's the verdict?
Re: URI: making absolute urls
by ikegami (Patriarch) on Mar 29, 2018 at 12:52 UTC

    Why don't you file a ticket about this violation of section 5.4.2 of RFC 3986?

      I just did, thanks.