in reply to regexing for non-standard characters...
Assuming you properly decoded your input,
how does one find out what this stupid thing is
    printf("chr(%d)\n", ord($ch));                           # chr(8212)
    printf("chr(0x%04X)\n", ord($ch));                       # chr(0x2014)
    printf("\"\\x{%04X}\"\n", ord($ch));                     # "\x{2014}"
    printf("\"\\N{U+%04X}\"\n", ord($ch));                   # "\N{U+2014}"
    use charnames ();
    printf("\"\\N{%s}\"\n", charnames::viacode(ord($ch)));   # "\N{EM DASH}"
how to regex for it?
    $word =~ /\x{2014}/
    $word =~ /\N{U+2014}/
    use charnames ':full'; $word =~ /\N{EM DASH}/
    use utf8; $word =~ /—/   # Encoded as UTF-8 in the source
Update: Added crashtest's solution.
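Since everything above hinges on the input being decoded, here is a minimal sketch of what that step might look like. The sample bytes and variable names are made up for illustration, and it assumes the data arrives as UTF-8; your actual source and encoding may differ.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes for "foo", EM DASH, "bar",
# as they might arrive from a file or socket read without a decoding layer.
my $bytes = "foo\xE2\x80\x94bar";

# Against the raw bytes, U+2014 is three separate bytes, so this cannot match.
my $raw_matches = $bytes =~ /\x{2014}/ ? 1 : 0;

# Decode the bytes into Perl characters (assumes the input really is UTF-8).
my $word = decode('UTF-8', $bytes);

# After decoding, the em dash is a single character and the regex matches.
my $decoded_matches = $word =~ /\x{2014}/ ? 1 : 0;

printf "length: raw=%d decoded=%d\n", length($bytes), length($word);
printf "match:  raw=%d decoded=%d\n", $raw_matches, $decoded_matches;
```

When reading from a file, opening the handle with an encoding layer such as '<:encoding(UTF-8)' should do the same decoding for you.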
Re^2: regexing for non-standard characters...
by crashtest (Curate) on Apr 15, 2010 at 23:06 UTC