in reply to Re^2: Hacker News titles using U+2013 EN DASH
This is a response to what you have here plus other posts throughout this thread.
Rather than a whole bank of individual s///g, each of which needs to be run for every string, I'd be more inclined to use a lookup table and a single s///g, which only needs to be run once for every string. Something along these lines:
$ perl -Mutf8 -C -E '
my %ent_for_char = (
"\x{2013}" => "&ndash;",
"\x{2018}" => "&lsquo;",
"\x{2019}" => "&rsquo;",
"\x{201c}" => "&ldquo;",
"\x{201d}" => "&rdquo;",
);
my $test_str = "“fancy double” – ‘fancy single’ – fancy’apostrophe";
say $test_str;
$test_str =~ s/(.)/exists $ent_for_char{$1} ? $ent_for_char{$1} : $1/eg;
say $test_str;
'
“fancy double” – ‘fancy single’ – fancy’apostrophe
&ldquo;fancy double&rdquo; &ndash; &lsquo;fancy single&rsquo; &ndash; fancy&rsquo;apostrophe
You can modify the table (e.g. add "\x{2014}" => "&mdash;",) without requiring any changes to the code doing the processing.
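One possible variation on the same lookup-table idea, offered only as a sketch: rather than matching every character with (.) and testing the hash each time, the pattern can be built from the table's keys so that only listed characters are matched at all. The helper name encode_chars is hypothetical, and the named HTML entities used as values are an assumption about what the table is meant to hold.

```perl
use strict;
use warnings;
use utf8;

# Lookup table: single character => replacement text.
# Assumption: the values are named HTML entities.
my %ent_for_char = (
    "\x{2013}" => '&ndash;',
    "\x{2018}" => '&lsquo;',
    "\x{2019}" => '&rsquo;',
    "\x{201c}" => '&ldquo;',
    "\x{201d}" => '&rdquo;',
);

# Build one character class from the keys, so the regex engine
# only stops at characters that actually have a replacement.
my $class = join '', map { quotemeta } keys %ent_for_char;
my $re    = qr/([$class])/;

# Hypothetical helper: returns a copy of the string with every
# table character replaced by its entity.
sub encode_chars {
    my ($str) = @_;
    $str =~ s/$re/$ent_for_char{$1}/g;
    return $str;
}

my $encoded = encode_chars("\x{201c}fancy double\x{201d} \x{2013} it\x{2019}s");
binmode STDOUT, ':encoding(UTF-8)';
print "$encoded\n";    # &ldquo;fancy double&rdquo; &ndash; it&rsquo;s
```

Adding a row to the table still needs no change to the processing code, because the character class is derived from the keys. This only works while every key is a single character; multi-character keys would need an alternation instead of a class.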
— Ken
Re^4: Hacker News titles using U+2013 EN DASH
by jdporter (Paladin) on Jan 11, 2024 at 20:19 UTC