Esteemed Monks,

I wonder whether it is expected behaviour for URI's URL absolutisation to retain relative path components ('..') in the final URL. For example (using perl 5.22, URI v1.73, LWP::UserAgent v6.33):

#!/usr/bin/env perl
use strict;
use warnings;
use URI;

# a relative url:
my $rel_url = '../../../../../abc.html';
# the base url, where I stand now:
my @base_uris = ('http://server.com/123/xyz', 'http://server.com/1/2/3/4/5', 'http://server.com/1/2/3/4/5/');
# URI's absolute url:
foreach my $abase (@base_uris){
    my $uri = URI->new_abs( $rel_url, $abase );
    print "absolute for base: $abase is\n\t".$uri."\n";
}

yields:

absolute for base: http://server.com/123/xyz is
	http://server.com/../../../../abc.html
absolute for base: http://server.com/1/2/3/4/5 is
	http://server.com/../abc.html
absolute for base: http://server.com/1/2/3/4/5/ is
	http://server.com/abc.html

The last result is correct, but I wonder whether, for the first two cases, URI should have used some heuristic to remove the leftover '..' segments.
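To make concrete what I mean by a heuristic, here is a minimal sketch in plain Perl. It follows the dot-segment removal described in RFC 3986, section 5.2.4, where a '..' that would climb above the root is simply dropped. The name remove_dot_segments is my own, a hypothetical helper and not part of URI, and it only handles the path part of a URL.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of URI): collapse "." and ".." segments
# in an absolute path, in the spirit of RFC 3986 section 5.2.4.
# A ".." that would climb above the root is silently dropped.
sub remove_dot_segments {
    my ($path) = @_;
    my @out;
    # limit -1 keeps a trailing empty field, so "/a/b/" keeps its final slash
    my @segments = split m{/}, $path, -1;
    shift @segments if @segments && $segments[0] eq '';   # drop field before leading "/"
    for my $i (0 .. $#segments) {
        my $seg     = $segments[$i];
        my $is_last = ($i == $#segments);
        if ($seg eq '.') {
            push @out, '' if $is_last;    # "/a/."    becomes "/a/"
        }
        elsif ($seg eq '..') {
            pop @out;                     # no-op when already at the root
            push @out, '' if $is_last;    # "/a/b/.." becomes "/a/"
        }
        else {
            push @out, $seg;
        }
    }
    return '/' . join('/', @out);
}

# The three paths from the output above:
for my $path ('/../../../../abc.html', '/../abc.html', '/abc.html') {
    print "$path -> ", remove_dot_segments($path), "\n";
}
```

With this sketch all three paths collapse to /abc.html, which is what I would have expected from the first two cases.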

The reason is that recently I had a brief encounter with LWP::UserAgent (UA) and, subsequently, URI (described in detail here http://perlmonks.org/?node_id=1210570):

In summary, on receiving a "302 Found" server response, UA would by default follow the redirect by extracting the 'Location' item from the server's response headers. However, this was a twisted server: it sent the 'Location' to follow as a relative URL, something similar to '$rel_url' in my example.

UA then proceeded to absolutise the received URL (based on the initial request URL), using URI, in order to follow it. Here is the relevant extract from LWP::UserAgent.pm (sub request()):

# Some servers erroneously return a relative URL for redirects,
# so make it absolute if it not already is.
local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
my $base = $response->base;
$referral_uri = "" unless defined $referral_uri;
$referral_uri = $HTTP::URI_CLASS->new($referral_uri, $base)->abs($base);

The result is YET ANOTHER relative URL in $referral_uri, which UA then requests, and an error ensues with a server 500 response (relative URL forbidden). This may puzzle someone and cause long debugging.

So, my questions/request are:

Should URI be using heuristics to absolutise urls?

If not, should LWP::UserAgent be using its own heuristics once URI returns a pseudo-absolute url? Or maybe LWP::UserAgent should die before making the follow-up request.
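As an illustration of the second option, here is a rough sketch of the kind of check that could run before the follow-up request. The helper has_dot_segments is hypothetical (it exists in neither LWP::UserAgent nor URI), and it extracts the path with the generic URI-splitting regex from RFC 3986, Appendix B, rather than with URI itself.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of LWP::UserAgent or URI): returns true
# if the path of a URL still contains a "." or ".." segment.
sub has_dot_segments {
    my ($url) = @_;
    # Based on the generic regex in RFC 3986, Appendix B: skip an optional
    # scheme and authority, then capture the path up to any "?" or "#".
    my ($path) = $url =~ m{^(?:[^:/?#]+:)?(?://[^/?#]*)?([^?#]*)};
    my $count = grep { $_ eq '.' || $_ eq '..' } split m{/}, $path, -1;
    return $count > 0;
}

# A caller could warn or die here instead of sending the request.
for my $url ('http://server.com/../abc.html', 'http://server.com/abc.html') {
    print "$url: ",
        (has_dot_segments($url) ? "still has dot segments" : "looks clean"),
        "\n";
}
```

Only whole segments are matched, so a file name such as 'a..b' or a '..' inside the query string is not flagged.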

bliako


In reply to URI: making absolute urls by bliako
