in reply to Matching  & € type characters with a regex

It seems you are displaying UTF-8 as ISO-8859-1 (Latin-1) or similar. "Â" is not a character in your data; "Â®" is the UTF-8 encoding of one character, "®", shown byte by byte. Decode your encoded strings on input. Appropriately encode your decoded strings on output.
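A minimal sketch of that decode-on-input, encode-on-output pattern, using the core Encode module. The sample bytes and variable names are made up for illustration, and it assumes the data really is UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: raw UTF-8 bytes as read from a file or socket.
# C2 AE is the registered sign, E2 82 AC is the euro sign.
my $bytes = "\xC2\xAE 5 \xE2\x82\xAC";

# Decode on input: the byte string becomes a character string.
my $text = decode('UTF-8', $bytes);

# Work on characters: count the non-ASCII characters.
my $count = () = $text =~ /[^\x00-\x7F]/g;

# Encode on output: the character string becomes bytes again.
my $out = encode('UTF-8', $text);
```

When reading from files, opening them with a layer such as `open(my $fh, '<:encoding(UTF-8)', $path)` does the decoding step for you.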

Re^2: Matching  & € type characters with a regex
by Rodster001 (Pilgrim) on Feb 12, 2009 at 18:39 UTC
    Ok. Sooooo... what do I do? I can go through and find all the characters and do s/Â//gsi for each character. Or, is there an easier way to match these types of characters?
      First you have to understand what character encodings are, and how they are handled in Perl.

      I've written this article to explain that, and there's also a lot of other useful information: perluniintro, Encode, perlunicode.

      If you decode the input as I suggested, you won't have any "Â" or even "Â®", just the single character those bytes represent. There isn't anything to search and replace.
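To illustrate the point, here is a small sketch, assuming the two bytes in question are C2 AE (the UTF-8 encoding of the registered sign). The variable names are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that a Latin-1 display shows as two separate glyphs.
my $bytes = "\xC2\xAE";

# After decoding there is one character, U+00AE REGISTERED SIGN.
my $char = decode('UTF-8', $bytes);

my $byte_len = length $bytes;                 # length in bytes before decoding
my $char_len = length $char;                  # length in characters after decoding
my $is_reg   = $char eq "\x{AE}" ? 1 : 0;     # compares characters, not bytes
```

Once the string is decoded, a regex sees one character here, so there is no stray leading byte left to strip.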
        This isn't the result of some processing I have done. This is what I have, a bunch of files that fell into my lap that already look like this. I have to clean them up.