Re: Matching Ā & € type characters with a regex

It seems you are displaying UTF-8 as iso-latin-1 or similar. "Ā" is not a character, "Ā®" is the encoding of one character. Decode your encoded strings on input. Appropriately encode your decoded strings on output.
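
A minimal sketch of the advice above, assuming the input really is UTF-8. The helper names `bytes_to_text` and `text_to_bytes` are hypothetical and only there for illustration; `decode` and `encode` come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw UTF-8 bytes into a string of characters.
sub bytes_to_text {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Hypothetical helper: turn a string of characters back into UTF-8 bytes.
sub text_to_bytes {
    my ($text) = @_;
    return encode('UTF-8', $text);
}

# 0xC2 0xAE is the UTF-8 encoding of U+00AE (the registered sign).
# Viewed as iso-latin-1, those two bytes show up as two separate glyphs.
my $bytes = "\xC2\xAE";
my $text  = bytes_to_text($bytes);

printf "%d bytes in, %d character after decoding\n",
    length($bytes), length($text);

# Encode again before writing the string anywhere.
my $out = text_to_bytes($text);
print $out eq $bytes ? "round trip ok\n" : "round trip mismatch\n";
```

When reading from files, the same decode and encode steps can be done by opening the handles with an `:encoding(UTF-8)` layer instead of calling Encode by hand.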

Re^2: Matching Ā & € type characters with a regex by Rodster001 (Pilgrim) on Feb 12, 2009 at 18:39 UTC
Ok. Sooooo... what do I do? I can go through and find all the characters and do s/Ā//gsi for each character. Or is there an easier way to match these types of characters?
Re^3: Matching Ā & € type characters with a regex by moritz (Cardinal) on Feb 12, 2009 at 18:43 UTC
First you have to understand what character encodings are, and how they are handled in Perl. I've written this article to explain that, and there's also a lot of other useful information in perluniintro, Encode, and perlunicode.
Re^3: Matching Ā & € type characters with a regex by ikegami (Patriarch) on Feb 12, 2009 at 18:52 UTC
If you decode the input as I suggested, you won't have any "Ā" or even "Ā®", just the single character those bytes represent. There isn't anything to search and replace.
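
To illustrate the point, here is a rough sketch under the assumption that the data is UTF-8. The sample string and the helper name `count_registered_signs` are made up for the example. Once the bytes are decoded, a regex can name the real character directly, and the stray lead byte is no longer there to match.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: count U+00AE characters in raw UTF-8 bytes
# by decoding first and matching on characters, not bytes.
sub count_registered_signs {
    my ($raw) = @_;
    my $text  = decode('UTF-8', $raw);
    my $count = () = $text =~ /\x{AE}/g;
    return $count;
}

# Made-up sample: "Acme" followed by the UTF-8 bytes for U+00AE.
my $raw  = "Acme\xC2\xAE";
my $text = decode('UTF-8', $raw);

print "matches for U+00AE: ", count_registered_signs($raw), "\n";

# The lead byte 0xC2 exists only in the undecoded bytes.
print "raw bytes contain \\xC2\n"        if $raw  =~ /\xC2/;
print "decoded text contains no \\xC2\n" if $text !~ /\xC2/;
```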
Re^4: Matching Ā & € type characters with a regex by Rodster001 (Pilgrim) on Feb 12, 2009 at 18:57 UTC
This isn't the result of some processing I have done. This is what I have, a bunch of files that fell into my lap that already look like this. I have to clean them up.
Re^5: Matching Ā & € type characters with a regex by ikegami (Patriarch) on Feb 12, 2009 at 19:52 UTC