in reply to Regular Expressions on Unicode
    use open IN => ':utf8';
    binmode STDIN, ':utf8';

The first line makes sure that all files in @ARGV get opened with the intended encoding layer, and the second line covers STDIN. (I also typically add OUT => ':utf8' to the first line, plus a third line for STDOUT.)
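For what it's worth, here is a minimal sketch of the three-line variant (IN plus OUT, STDIN, STDOUT) in a script that can be run as-is. The temp file, the sample word and the read_lines helper are made up for the demo. It writes the raw utf8 bytes for "café" to a scratch file, reads it back through <>, and reports the length, which should be counted in characters rather than bytes once the layer is in effect.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Default layers for handles opened in this lexical scope (including the
# implicit opens done by <>), plus explicit layers for STDIN and STDOUT.
use open IN => ':utf8', OUT => ':utf8';
binmode STDIN,  ':utf8';
binmode STDOUT, ':utf8';

# Demo data (an assumption for this sketch): the raw utf8 bytes for
# "café", where the accented e is the two bytes C3 A9.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                          # write raw bytes, no encoding layer
print {$tmp} "caf\xC3\xA9\n";
close $tmp or die "close failed: $!";

# Hypothetical helper: slurp every line from the files named in @ARGV.
sub read_lines {
    my @lines;
    while (my $line = <>) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

@ARGV = ($path);
my @lines = read_lines();

# length() counts characters here because the input was decoded on read.
printf "%s has %d characters\n", $_, length $_ for @lines;
```

Without the "use open" line the same file would come back as five bytes, and a regex like /\w+/ would not treat the accented letter as a single word character.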
The difference between ":encoding(utf8)" and just plain ":utf8" is, I think, simply a matter of how much you want to trust your input. The plain ":utf8" layer does not validate anything: it just marks the incoming bytes as utf8. If there are encoding errors (sequences of non-ASCII bytes that do not form valid utf8 characters), the malformed data ends up in your strings and may produce "Malformed UTF-8 character" warnings or fatal errors later on. ":encoding(utf8)", by contrast, decodes the input as it is read: it gives a detailed warning message, supplies a replacement string that makes the problem easy to spot, and keeps running.
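A small sketch of the ":encoding(utf8)" side of this, assuming the default fallback settings of PerlIO::encoding. The sample string is made up: it is "café au lait" as Latin-1 bytes, so the lone \xE9 is not valid utf8. An in-memory filehandle stands in for a real file, and the warning is captured so it can be printed next to the decoded line.

```perl
use strict;
use warnings;

# Made-up sample: Latin-1 bytes, so \xE9 on its own is malformed utf8.
my $bytes = "caf\xE9 au lait\n";

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Read the bytes through the decoding layer via an in-memory filehandle.
open my $fh, '<:encoding(utf8)', \$bytes or die "open failed: $!";
my $line = <$fh>;
close $fh;

chomp $line;
print "decoded: $line\n";
print "warning: $_" for @warnings;
```

On the perls I have tried, the bad byte shows up in the decoded line as the literal four characters \xE9, and the warning says that the byte does not map to Unicode. The exact wording of the warning may differ between versions.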
(updated code snippet to normalize quotes)