in reply to match substitution
I read somewhere that I can use UTF-8 by specifying it at the beginning of the document: use utf8;
use utf8; only specifies that the source is UTF-8. If you're reading data from a file, for example, you'll still need to decode that.
open(my $fh, '<:encoding(UTF-8)', $qfn) or die("Can't open file \"$qfn\": $!\n");
Don't forget to encode your output.
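A minimal sketch of two common ways to encode output, assuming UTF-8 is the encoding you want. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Option 1: add an encoding layer to the handle.
# Everything printed to STDOUT afterwards is encoded as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "123\x{2014}456\n";

# Option 2: encode a string explicitly, e.g. before handing it
# to something that expects bytes rather than characters.
my $em_dash_bytes = encode('UTF-8', "\x{2014}");
```

U+2014 is one character but three bytes in UTF-8 (E2 80 94), which is why printing it to a handle without an encoding layer produces a "Wide character" warning.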
s/-/\x{2014}/g; This should turn a hyphen into an em dash correct?
Yes.
\x{2014} works even without use utf8;. It refers to character U+2014, no matter which encoding was used for the source.
The problem is, I only want to do the substitutions on the hyphens which are surrounded by 3 digits on both sides.
The approach you are taking requires captures:
s/([0-9]{3})-([0-9]{3})/$1\x{2014}$2/g
But captures aren't needed here. Lookarounds check the surrounding digits without consuming them:
s/(?<=[0-9]{3})-(?=[0-9]{3})/\x{2014}/g
(\d matches some pretty funky stuff in addition to 0-9, namely any Unicode decimal digit, unless /a is in effect.)
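The latter snippet has the advantage of properly handling 123-456-789. The capture version consumes 456 as part of the first match, so the second hyphen never gets three digits in front of it and is left alone.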
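A quick way to see the difference between the two substitutions, as a sketch. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $input = '123-456-789';

# Capture version: each match swallows the digits on both sides,
# so the search for the next match resumes after "456".
(my $with_captures = $input) =~ s/([0-9]{3})-([0-9]{3})/$1\x{2014}$2/g;

# Lookaround version: only the hyphen itself is matched,
# so "456" can serve as context for both hyphens.
(my $with_lookarounds = $input) =~ s/(?<=[0-9]{3})-(?=[0-9]{3})/\x{2014}/g;

binmode(STDOUT, ':encoding(UTF-8)');
print "captures:    $with_captures\n";
print "lookarounds: $with_lookarounds\n";
```

Only the first hyphen is replaced in $with_captures, while both are replaced in $with_lookarounds.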