in reply to A problem with dash typography

Hm, the correct numeric entity for the em dash is &#8212; (with the trailing semicolon) ... otherwise use the named entity, &mdash;

It works for me with the correct character encoding in and out:

[12:04][nick:~/monks]$ perl -Mstrict -Mutf8 -MHTML::Entities -E '
> binmode STDOUT,":utf8";
> say encode_entities("Chimney—sweeper");
> say encode_entities("Chimney–sweeper");
> say decode_entities("Chimney&mdash;sweeper");
> say decode_entities("Chimney&ndash;sweeper");
> '
Chimney&mdash;sweeper
Chimney&ndash;sweeper
Chimney—sweeper
Chimney–sweeper

Hope this helps!

Edit: Decoded characters may not display properly here ...
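
As a cross-check on the numeric versus the named entity, here is a minimal script version of the same test. It is only a sketch, assuming HTML::Entities is installed; the variable names are mine, and the dashes are written as \x{2014} and \x{2013} escapes so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Encode wide characters as UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';

my $em_dash = "\x{2014}";    # EM DASH
my $en_dash = "\x{2013}";    # EN DASH

# Characters to entities.
my $encoded_em = encode_entities("Chimney${em_dash}sweeper");
my $encoded_en = encode_entities("Chimney${en_dash}sweeper");

# Entities to characters: the numeric and named forms should agree.
my $from_numeric = decode_entities('Chimney&#8212;sweeper');
my $from_named   = decode_entities('Chimney&mdash;sweeper');

print "$encoded_em\n";
print "$encoded_en\n";
print "$from_numeric\n";
print $from_numeric eq $from_named
    ? "numeric and named forms decode to the same string\n"
    : "numeric and named forms differ\n";
```

If the two decoded strings compare equal but still look wrong on screen, the problem is most likely in the output layer or the terminal rather than in the entity handling.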

The way forward always starts with a minimal test.

Re^2: A problem with dash typography
by hsmyers (Canon) on Sep 09, 2015 at 15:03 UTC
    Sorry about the typos. That aside, I believe you have nailed the necessary magic with the ':utf8' layer, except that in this case it is needed before I read the file. I will see what happens, thanks!

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."

      If you put:

      use utf8;

      at the top of the script, this tells Perl that your source code contains UTF-8-encoded Unicode characters.

      If you want to read and write UTF-8 on the standard streams, do this at the top of the script:

      binmode STDIN, ':utf8';
      binmode STDOUT, ':utf8';

      Hope this helps!

      The way forward always starts with a minimal test.
        use utf8; is now in my new file template! And the change to STDIN is good, but what I need is a change that covers reading I/O in general, not just STDIN...

        --hsm

        "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
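
For I/O beyond STDIN and STDOUT, one common approach is to put the encoding layer on each file handle when it is opened. The sketch below assumes that is what is wanted here; read_utf8 and write_utf8 are hypothetical helper names, not something from this thread, and it uses the stricter :encoding(UTF-8) layer rather than :utf8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a character string to $path as UTF-8.
sub write_utf8 {
    my ($path, $text) = @_;
    open my $out, '>:encoding(UTF-8)', $path
        or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read $path as UTF-8 and return a character string.
sub read_utf8 {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path
        or die "Cannot read $path: $!";
    local $/;                 # slurp the whole file
    my $text = <$in>;
    close $in;
    return $text;
}

# Round trip through a temporary file (removed at exit).
my (undef, $path) = tempfile(UNLINK => 1);
write_utf8($path, "Chimney\x{2014}sweeper\n");
my $round_trip = read_utf8($path);

binmode STDOUT, ':encoding(UTF-8)';
print $round_trip;
printf "%d characters, %d bytes on disk\n", length $round_trip, -s $path;
```

If every handle in the script should default to UTF-8, the core open pragma, use open qw(:std :encoding(UTF-8));, is another option worth trying in a minimal test first.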