in reply to How encode space into %20 in URI

Firstly, I can reproduce what you're seeing.

There have been a number of changes to the "URI distribution" recently. These include merging URI::QueryParam methods into URI as well as some changes to uri_escape() — see the Changes file.

The latest version of URI is 5.17; I've used this in the code below. I don't know if the '%20' vs. '+' issue has always existed, or if it's something that was inadvertently introduced with some change. It would make sense that the escaping mechanism was consistent across modules in the same distribution: I recommend that you raise a bug report.

The annotated code below shows a reproduction of '%20' vs. '+', two unsuccessful workaround attempts, and two successful workaround attempts.

#!/usr/bin/env perl

use strict;
use warnings;

use URI 5.17;
use URI::Escape 5.17;

print "OS[$^O] perl[$^V]\n";

# uri_escape() gives %20
my $str = '2022-12-18 12:19:57';
print "\$str[$str]\n";
my $esc_str = uri_escape($str);
print "\$esc_str[$esc_str]\n";

# query_param() gives +
{
    my $u = URI->new('/api/Data', 'http');
    print 'URI init=[', $u->as_string(), "]\n";
    $u->query_param(t1 => '2022-12-18 12:19:57');
    print 'URI raw_param=[', $u->as_string(), "]\n";
}

# If escaped string is used, %20 -> %2520
{
    my $u = URI->new('/api/Data', 'http');
    $u->query_param(t1 => $esc_str);
    print 'URI esc_param=[', $u->as_string(), "]\n";
}

# If string escaped in situ, still %20 -> %2520
{
    my $u = URI->new('/api/Data', 'http');
    $u->query_param(t1 => uri_escape('2022-12-18 12:19:57'));
    print 'URI in_situ_esc_param=[', $u->as_string(), "]\n";
}

# You can modify the query string to change + to %20
{
    my $u = URI->new('/api/Data', 'http');
    $u->query_param(t1 => '2022-12-18 12:19:57');
    my $query = $u->query();
    $query =~ s/\+/%20/g;
    $u->query($query);
    print 'URI long_re_sub_param=[', $u->as_string(), "]\n";
}

# Perl 5.14 and later has the /r modifier:
# use for a more succinct version of the above
# with identical output.
{
    my $u = URI->new('/api/Data', 'http');
    $u->query_param(t1 => '2022-12-18 12:19:57');
    $u->query($u->query() =~ s/\+/%20/gr);
    print 'URI rmod_re_sub_param=[', $u->as_string(), "]\n";
}

Output:

OS[cygwin] perl[v5.36.0]
$str[2022-12-18 12:19:57]
$esc_str[2022-12-18%2012%3A19%3A57]
URI init=[/api/Data]
URI raw_param=[/api/Data?t1=2022-12-18+12%3A19%3A57]
URI esc_param=[/api/Data?t1=2022-12-18%252012%253A19%253A57]
URI in_situ_esc_param=[/api/Data?t1=2022-12-18%252012%253A19%253A57]
URI long_re_sub_param=[/api/Data?t1=2022-12-18%2012%3A19%3A57]
URI rmod_re_sub_param=[/api/Data?t1=2022-12-18%2012%3A19%3A57]
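A further option, sketched here under the assumption that you only need flat key/value pairs: build the query string yourself with uri_escape() and pass it to $u->query(). Because query() leaves existing %XX sequences alone, nothing gets double-escaped, and no + is produced in the first place. The helper set_query_pct20() is a hypothetical name used only for illustration; it is not part of URI.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;
use URI::Escape qw(uri_escape);

# Hypothetical helper (not part of URI): set the query from key/value
# pairs, escaping each key and value with uri_escape() so that spaces
# become %20 rather than +. Assumes an even-length list of pairs.
sub set_query_pct20 {
    my ($uri, @pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape($key) . '=' . uri_escape($value);
    }
    # query() only escapes characters that are not legal in a URI;
    # '%' is legal, so the %XX sequences above are kept as-is.
    $uri->query(join '&', @parts);
    return $uri;
}

my $u = URI->new('/api/Data', 'http');
set_query_pct20($u, t1 => '2022-12-18 12:19:57');
print 'URI pct20_param=[', $u->as_string(), "]\n";
```

The trade-off compared with the s/\+/%20/g approach is that you manage the pairs yourself instead of going through query_param().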

— Ken

Re^2: How encode space into %20 in URI
by haukex (Archbishop) on Dec 22, 2022 at 09:45 UTC
    I don't know if the '%20' vs. '+' issue has always existed, or if it's something that was inadvertently introduced with some change.

    I personally wouldn't call it an "issue" because AFAIK it's optional whether spaces should be escaped as %20 or +, and it's been that way for a long time. From URI's Changes file:

       2001-01-10   Gisle Aas <gisle@ActiveState.com>
     
       Release 1.10
     
       The $u->query_form method will now escape spaces in
       form keys or values as '+' (instead of '%20').  This also
       affect the $mailto_uri->header() method.  This is actually
       the wrong thing to do, but this practise is now even
       documented in official places like
       http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
       so we might as well follow the stream.
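    To illustrate that the choice is interchangeable on the decoding side, here is a minimal sketch, assuming a reasonably recent URI is installed: query_form() turns '+' into a space before percent-decoding, so both raw query strings below should decode to [2022-12-18 12:19:57]. The decode_t1() helper and the example.com URL are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;

# Illustrative helper: return the decoded value of the t1 parameter
# for a given raw (already escaped) query string.
sub decode_t1 {
    my ($raw_query) = @_;
    my $u = URI->new('http://example.com/api/Data');
    # The raw query contains only characters legal in a URI, so it is stored as-is.
    $u->query($raw_query);
    # In list context query_form() returns the decoded key/value pairs;
    # it converts '+' to a space and then percent-decodes.
    my %form = $u->query_form();
    return $form{t1};
}

for my $raw ('t1=2022-12-18+12%3A19%3A57', 't1=2022-12-18%2012%3A19%3A57') {
    print "$raw -> [", decode_t1($raw), "]\n";
}
```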
    
      "I personally wouldn't call it an "issue" because AFAIK it's optional whether spaces should be escaped as %20 or +, ..."

      If you look in the paragraph where I used the word "issue", you'll see:

      "It would make sense that the escaping mechanism was consistent across modules in the same distribution"

      I consider the inconsistency to be an issue. I made no reference to %20 or + being better, preferred, more correct, or anything else of that ilk.

      "... and it's been that way for a long time."

      Yes, I know. I was coding such escapes more than two decades ago.

      "From URI's Changes file:"

      I would question the relevance of including that entry from almost 22 years ago; it even references HTML4 which certainly isn't current. All that I'm getting from it is: "We changed %20 to + in some places, even though we believed that to be wrong, and left everything else in an inconsistent state".

      — Ken

        I consider the inconsistency to be an issue.

        I see your point, and URI could offer a configuration option to encode spaces in the query as %20 instead of +. OTOH, I find the OP's reason for asking this question strange: either the OP doesn't know that + is a valid encoding for a space, or there is some other misunderstanding of URIs, or perhaps they're building a workaround for a server that doesn't handle +. That would also be strange, and in that case I'd suggest (also) looking into the server-side problem.

        I would question the relevance of including that entry from almost 22 years ago

        It was a reply to your "I don't know if the '%20' vs. '+' issue has always existed, or if it's something that was inadvertently introduced with some change." (Update: Emphasis mine.) A git blame shows that the relevant s/ /+/g; lines have been unchanged for 22 years.

        it even references HTML4 which certainly isn't current

        As I linked to in my other node, in this instance the HTML5 spec is backwards compatible with HTML4.