
> when I go to get the first character (...) I suddenly need to specify the encoding

UTF-8 is a multi-byte encoding: some characters, Æ among them, are encoded by more than one byte (here two bytes, 0xC3 0x86). If a string starts with such a character but Perl hasn't been told the encoding, it treats the data as Latin-1, a single-byte encoding in which every byte is one character. The first character is then just the first byte, 0xC3. On its own that byte is an incomplete UTF-8 sequence, so anything that reads the output as UTF-8 displays it as �, the replacement character.
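A minimal sketch of the difference, assuming the input arrives as raw UTF-8 bytes (the sample string "Æon" and the variable names are mine, not from your code). It uses the core Encode module to compare the undecoded and decoded views of the same data:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The raw bytes of "Æon" as they would arrive from a UTF-8 file or terminal.
my $bytes = "\xC3\x86on";

# Without decoding, Perl treats every byte as one character.
my $first_byte = substr($bytes, 0, 1);
printf "undecoded: length %d, first is 0x%02X\n",
    length($bytes), ord($first_byte);

# After decoding, the two bytes 0xC3 0x86 become the single character U+00C6.
my $chars      = decode('UTF-8', $bytes);
my $first_char = substr($chars, 0, 1);
printf "decoded:   length %d, first is U+%04X\n",
    length($chars), ord($first_char);

# Encode again just before output, so the terminal gets valid UTF-8.
print encode('UTF-8', $first_char), "\n";
```

The undecoded string has length 4 and its first "character" is the lone byte 0xC3; the decoded string has length 3 and its first character is Æ.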

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^4: Special character not being captured
by Lady_Aleena (Priest) on Jun 23, 2019 at 17:47 UTC

    One last thing, I've been trying to figure out how to add utf8 to first_alpha, which I posted earlier. I am not having any success with it. So, how should I add it to that subroutine?

    No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
    Lady Aleena
      It doesn't belong there. Always decode input as soon as possible, and likewise encode output immediately before sending it out. first_alpha should receive an already decoded string.
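
      A rough sketch of that layout, assuming the data is UTF-8. I don't have your first_alpha in front of me, so first_alpha_sketch below is a hypothetical stand-in that only takes the first character, and the in-memory filehandle stands in for whatever file you really read:

      ```perl
      use strict;
      use warnings;

      # Encode on the way out: characters printed to STDOUT become UTF-8 bytes.
      binmode(STDOUT, ':encoding(UTF-8)');

      # Hypothetical stand-in for first_alpha. It knows nothing about
      # encodings; it simply expects a string of already decoded characters.
      sub first_alpha_sketch {
          my ($string) = @_;
          return substr($string, 0, 1);
      }

      # Decode on the way in: the :encoding layer turns bytes into characters
      # as each line is read. The sample data here is made up.
      my $raw = "\xC3\x86on Flux\n\xC3\x89clair\n";
      open(my $in, '<:encoding(UTF-8)', \$raw) or die "Cannot open: $!";

      my @firsts;
      while (my $line = <$in>) {
          chomp $line;
          push @firsts, first_alpha_sketch($line);
      }
      close $in;

      print "$_\n" for @firsts;
      ```

      The subroutine stays free of any encoding logic; only the two ends of the program (the open and the binmode) mention UTF-8.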

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]