C:\>chcp 1252
Active code page: 1252

C:\>type Windows-1252.txt
Everyone seems to have lept to the assumption that your "text file
with some weird characters" in it—an em dash, which is not so weird
really—is in the Unicode coded character set. It may be, or it may be
in the Windows-1252 character encoding. The former is a multi-byte
encoding and the latter is a single-byte encoding. The difference is
fundamental. So, first, you need to know whether your text file is
in some encoding form of Unicode (e.g., UTF-8) or in the Windows-1252
character encoding—or even possibly in some other legacy encoding.

C:\>perl -ne "print if m/—/" Windows-1252.txt
with some weird characters" in it—an em dash, which is not so weird
really—is in the Unicode coded character set. It may be, or it may be
character encoding—or even possibly in some other legacy encoding.

C:\>perl -ne "print if m/\x97/" Windows-1252.txt
with some weird characters" in it—an em dash, which is not so weird
really—is in the Unicode coded character set. It may be, or it may be
character encoding—or even possibly in some other legacy encoding.

C:\>perl -ne "print if /\x{2014}/" Windows-1252.txt

C:\>perl -ne "print if /\N{U+2014}/" Windows-1252.txt

C:\>perl -mcharnames=:full -ne "print if /\N{EM DASH}/" Windows-1252.txt

C:\>perl -ne "print if m/—/" UTF-8.txt

C:\>perl -ne "print if m/\x97/" UTF-8.txt

C:\>chcp 65001
Active code page: 65001

C:\>type UTF-8.txt
Everyone seems to have lept to the assumption that your "text file
with some weird characters" in it—an em dash, which is not so weird
really—is in the Unicode coded character set. It may be, or it may be
in the Windows-1252 character encoding. The former is a multi-byte
encoding and the latter is a single-byte encoding. The difference is
fundamental. So, first, you need to know whether your text file is
in some encoding form of Unicode (e.g., UTF-8) or in the Windows-1252
character encoding—or even possibly in some other legacy encoding.
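
The reason `\x97` matches the Windows-1252 file but not the UTF-8 file comes down to the bytes on disk. As a rough illustration (not part of the session above), the following sketch uses the core Encode module to show how U+2014 EM DASH is serialized in each encoding. The helper name `em_dash_hex` is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of U+2014 EM DASH in the given
# encoding as an uppercase hex string.
sub em_dash_hex {
    my ($encoding) = @_;
    return uc(unpack('H*', encode($encoding, "\x{2014}")));
}

print "cp1252: ", em_dash_hex('cp1252'), "\n";   # prints "cp1252: 97"
print "UTF-8:  ", em_dash_hex('UTF-8'),  "\n";   # prints "UTF-8:  E28094"
```

In Windows-1252 the em dash is the single byte 0x97. In UTF-8 it is the three-byte sequence E2 80 94, in which no byte is 0x97, so a byte-level `m/\x97/` cannot match there.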
C:\>perl -CiO -ne "print if m/—/" UTF-8.txt

C:\>perl -CiO -ne "print if m/\x97/" UTF-8.txt

C:\>perl -CiO -ne "use utf8; print if m/—/" UTF-8.txt
Malformed UTF-8 character (unexpected continuation byte 0x97, with no preceding start byte) at -e line 1.

C:\>perl -CiO -ne "print if m/\x{2014}/" UTF-8.txt
with some weird characters" in it—an em dash, which is not so weird
really—is in the Unicode coded character set. It may be, or it may be
character encoding—or even possibly in some other legacy encoding.

C:\>perl -CiO -ne "print if m/\N{U+2014}/" UTF-8.txt

C:\>perl -mcharnames=:full -CiO -ne "print if m/\N{EM DASH}/" UTF-8.txt
with some weird characters" in it—an em dash, which is not so weird
really—is in the Unicode coded character set. It may be, or it may be
character encoding—or even possibly in some other legacy encoding.

C:\>
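
If you do not know in advance which encoding a file uses, one possible approach is to decode the bytes explicitly and then match on the code point U+2014, instead of relying on the console code page and `-C` switches. The sketch below is only that, a sketch: it assumes the data is either well-formed UTF-8 or Windows-1252, which is a heuristic and cannot distinguish other legacy encodings. The helper names `decode_guess` and `has_em_dash` are invented for this example.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: try strict UTF-8 first; if the bytes are not
# well-formed UTF-8, ASSUME Windows-1252 (cp1252). Returns the decoded
# text and the name of the encoding that was used.
sub decode_guess {
    my ($bytes) = @_;
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ($text, 'UTF-8') if defined($text);
    return (decode('cp1252', $bytes), 'cp1252');
}

# Hypothetical helper: true if the decoded text contains U+2014 EM DASH.
sub has_em_dash {
    my ($bytes) = @_;
    my ($text) = decode_guess($bytes);
    return $text =~ /\x{2014}/ ? 1 : 0;
}

print has_em_dash("it\x97an em dash"), "\n";          # cp1252 byte 0x97: prints 1
print has_em_dash("it\xE2\x80\x94an em dash"), "\n";  # UTF-8 bytes E2 80 94: prints 1
print has_em_dash("no dash here"), "\n";              # plain ASCII: prints 0
```

Once the text is decoded, the same pattern `/\x{2014}/` works for both files, because the match happens on characters and not on encoding-specific bytes.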