Dear fellow monks!

I'm a bit lost with UTF-8 conversion. For a FictionBook 2 eReader conversion script, I need "translations" from some UTF-8 characters to the appropriate eReader characters.

For this I took part of the table found at eReader.com and stored it as a UTF-8 file:

¡ ¡ ¡ \a161 Inverted exclamation
¢ ¢ ¢ \a162 Cent sign
£ £ £ \a163 Pound sign
: : skipped :
œ œ œ \a156 Small combined oe
Ÿ Ÿ Ÿ \a159 Large Y with diaeresis

Next I wanted to prefix each line with the four-digit hexadecimal Unicode code point of its first character, using a one-liner (split here for better readability):

perl -i.bak -pe '
    binmode STDIN,":utf8";
    binmode STDOUT,":utf8";
    if (/^([^[:ascii:]])/) {
        $_= sprintf("%04x",ord $1).$_
    }' pml.txt

Unfortunately I seem to miss something. I get data like this:
00c2¡ ¡ ¡ \a161 Inverted exclamation
00c2¢ ¢ ¢ \a162 Cent sign
00c2£ £ £ \a163 Pound sign

Three times 00c2 can't be right.
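
A small sketch of where the repeated 00c2 can come from, assuming the lines reach the regex as undecoded bytes. Under -p the input is read through the ARGV handle rather than STDIN, and with -i the output goes to ARGVOUT, so the binmode calls on STDIN and STDOUT would likely not touch the data at all. The byte string below is an illustration, not taken from pml.txt:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of U+00A1 (inverted exclamation mark) is the two bytes C2 A1.
# U+00A2 and U+00A3 are encoded as C2 A2 and C2 A3, so all three start with C2.
my $bytes = "\xC2\xA1";

# Without decoding, Perl sees two one-byte characters; ord() reports the first byte.
my $raw_code = sprintf "%04x", ord $bytes;

# After decoding, the string holds the single character U+00A1.
my $chars        = decode('UTF-8', $bytes);
my $decoded_code = sprintf "%04x", ord $chars;

print "undecoded: $raw_code\n";
print "decoded:   $decoded_code\n";
```

If that assumption holds, the first value printed is the lead byte of the encoding and only the second is the code point.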

Do you see my mistake?


Update: Experimenting and reading perldoc perlrun, especially the part about -C, led me to this version, which seems to work quite well:

perl -i.bar -CDS -pe '
    if (/^([^[:ascii:]])/) {
        $_= sprintf("%04x",ord $1).$_
    }' pml.txt

Update 2: No... It still doesn't work.
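
One way to sidestep the question of which I/O layers are active is to decode and encode explicitly with the Encode module. This is only a sketch under that assumption, not a diagnosis of the -C behaviour; prefix_codepoint is a hypothetical helper name, and the sample line is written out as raw bytes:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes one line of raw UTF-8 bytes and returns it as
# UTF-8 bytes again, prefixed with the four-digit hex code point of its first
# character when that character is not ASCII.
sub prefix_codepoint {
    my ($raw_line) = @_;
    my $line = decode('UTF-8', $raw_line);    # bytes -> characters
    if ($line =~ /^([^[:ascii:]])/) {
        $line = sprintf("%04x", ord $1) . $line;
    }
    return encode('UTF-8', $line);            # characters -> bytes
}

# Sample input: the first table row, with the three "¡" given as bytes C2 A1.
print prefix_codepoint("\xC2\xA1 \xC2\xA1 \xC2\xA1 \\a161 Inverted exclamation\n");
```

Inside a -p loop the same idea would be $_ = prefix_codepoint($_), since the lines are then handled as bytes on the way in and on the way out.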


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

In reply to get UTF-8 character codes by Skeeve
