in reply to CPAN's URI.pm versus Japanese as Unicode?

I see two problems here: first, your source file is not declared as UTF-8 with use utf8;, which means that my $href="https://マリウス.com/"; is actually giving the string "https://\343\203\236\343\203\252\343\202\246\343\202\271.com/". Second, URI is encoding that with Punycode, which IMHO is one correct approach, as the URI documentation states that it works with URIs as per RFC 2396 and RFC 2732, which I think only support US-ASCII.
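To make the first point concrete, here is a minimal sketch using only the core Encode module. The variable names ($bytes, $chars) are mine, not from the original question. It assumes the source file was saved as UTF-8, so without use utf8; Perl sees the literal as the octets written out below.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What Perl sees for the host part when the source lacks "use utf8;":
# one string element per UTF-8 octet.
my $bytes = "\343\203\236\343\203\252\343\202\246\343\202\271.com";

# Decoding the octets yields the character string that "use utf8;"
# would have produced from the same literal.
my $chars = decode('UTF-8', $bytes);

printf "%d octets vs. %d characters\n", length($bytes), length($chars);
printf "first character: U+%04X\n", ord($chars);
```

The four katakana take three octets each, so the undecoded string is twice as long as the decoded one, and anything downstream that expects characters gets fed octets instead.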

If you add the use utf8;, you get the output =xn--gckvb8fzb.com, which is the correct Punycode domain name of "マリウス.com" ("\x{30de}\x{30ea}\x{30a6}\x{30b9}.com").

What is unclear to me is what your goal is. Why do you (think you) need a URI object with Unicode characters in it?

Re^2: CPAN's URI.pm versus Japanese as Unicode?
by mldvx4 (Hermit) on Dec 11, 2022 at 12:21 UTC

    Thanks, though adding use utf8 does not affect the result. Perhaps I need to convert from Punycode. Is there a module for converting from Punycode to Unicode? Working with the host names as Punycode is not really an option, as far as I can tell, because the host name needs to remain human-readable.

    The goal is to extract the host name from the URI and the host name happens to be Japanese as Unicode, as is wont to happen.

      Thanks, though adding use utf8 does not affect the result

      Yes, it does.

      ... the host name needs to remain human-readable. The goal is to extract the host name from the URI and the host name happens to be Japanese as Unicode, ...

      Corion already pointed you to Net::IDN::Encode as one possibility.

      use warnings;
      use strict;
      use utf8;
      use open qw/:std :encoding(UTF-8)/;
      use URI;
      use Net::IDN::Encode qw/domain_to_unicode/;
      
      my $href="https://マリウス.com/";
      my $uri = URI->new($href);
      my $domain = domain_to_unicode($uri->host);
      print $domain,"\n";  # prints "マリウス.com"