Re^3: Matching/replacing a unicode character only works after decode()

It would only be a backwards compatibility issue if you accept that UTF-8 is the ONLY encoding used in computing.

In the year 2014, UTF-8 is a more useful default than Latin-1, I'd say. BUT the real problem is implicit upgrading from / downgrading to Latin-1. This is very similar to what Perl does with numbers / numeric-looking strings. The difference is that not all strings look like numbers, but absolutely any binary string looks like Latin-1 (and some Unicode strings can be downgraded to Latin-1 without warnings).
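A minimal sketch of both directions, assuming a Perl 5 with the core `Encode` module; the variable names are illustrative only. The two UTF-8 bytes of U+03BC are concatenated with a wide character, and Perl silently reinterprets them as the Latin-1 characters U+00CE and U+00BC. Going the other way, `utf8::downgrade` succeeds quietly whenever every character happens to fit in Latin-1.

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 bytes for U+03BC (GREEK SMALL LETTER MU): 0xCE 0xBC
my $bytes = encode('UTF-8', "\x{03bc}");

# Concatenating with a wide character forces $bytes to be upgraded;
# Perl assumes the bytes are Latin-1, so each byte becomes one character.
my $joined = $bytes . "\x{263a}";

printf "%d %d %vX\n", length($bytes), length($joined), $joined;

# Downgrading is just as silent when every character fits in Latin-1.
my $cafe = "caf\x{e9}";
utf8::upgrade($cafe);                       # force internal UTF-8 storage
my $down_ok = utf8::downgrade($cafe, 1);    # true: U+00E9 fits in one byte
my $smiley  = "\x{263a}";
my $down_no = utf8::downgrade($smiley, 1);  # false: U+263A has no Latin-1 form

printf "%d %d\n", ($down_ok ? 1 : 0), ($down_no ? 1 : 0);
```

This should print `2 3 CE.BC.263A` and then `1 0`: the upgrade turned two bytes into two unrelated characters without any warning.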

Consider this:

perl -MDevel::Peek -wE 'my $r = qr/\x{03bc}/; Dump $r'
...
FLAGS = (OBJECT,FAKE,UTF8)
PV = 0x10eff20 "(?^u:\\x{03bc})" [UTF8 "(?^u:\\x{03bc})"]

Now, what happens when a UTF-8 regex meets a binary string? My guess is that the string gets upgraded to (Perl's internal) UTF-8... FROM (what Perl thinks is) Latin-1, as happens in other situations. That is the wrong thing to do.
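The observable effect can be checked with a short script, assuming a reasonably recent Perl 5 and the core `Encode` module (names are illustrative). It does not prove the internal mechanism, but the result is consistent with the Latin-1 guess: the undecoded bytes behave like U+00CE U+00BC and do not match `qr/\x{03bc}/`, while the decoded string does.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $re    = qr/\x{03bc}/;             # matches the single character U+03BC
my $bytes = "\xCE\xBC";               # UTF-8 encoding of U+03BC, not decoded
my $chars = decode('UTF-8', $bytes);  # now one character: U+03BC

# The byte string is seen as the two characters U+00CE U+00BC,
# so the regex finds no U+03BC in it.
my $raw_match     = $bytes =~ $re ? 1 : 0;
my $decoded_match = $chars =~ $re ? 1 : 0;

printf "raw: %d, decoded: %d\n", $raw_match, $decoded_match;  # raw: 0, decoded: 1
```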

Ruby’s Unicode support was terrible

It's still terribad. But at least Ruby defaults to UTF-8 for its source code, for example.

There is a big difference between excellent Unicode support (which Perl has, of course) and convenient Unicode support. You know, something that is not a pain in the ass. For example, what can go wrong with this:

open my $file, '<', '/bogus_file' or die "Can't open: $!\n";
