Since the codes for Latin-1 are the same as Unicode for the first 256 values, that should work (you need to re-encode the values but don't need to translate them through a table). That is, if "use utf8" is not in scope when the regex is compiled. I don't know about Perl 5.8, which reportedly doesn't need the utf8 pragma; you might need some other way to refer to those characters on the input.

Anyway, you can use the same lightweight trick to convert back: s/([\x{80}-\x{ff}])/pack('C',ord($1))/eg, compiled with utf8 in effect (note the curlies on the \x codes, which indicate UTF-8 encoded characters; ord is needed because pack's C template takes a number, not the matched character). Use pack instead of chr so you can specify bytes (chr does too much DWIMmery, and the persuasion thing is not as transparent as one would hope when dealing with I/O, though I think its behavior in 5.6 would work in this case).
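
For what it's worth, on Perl 5.8 and later the same round trip can probably be done with the core Encode module instead of the regex trick. This is only a minimal sketch, assuming the input is valid UTF-8 whose characters all fit in Latin-1; the helper names utf8_to_latin1 and latin1_to_utf8 are made up for illustration and are not from any module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (name is illustrative): UTF-8 bytes -> Latin-1 bytes.
# Assumes every character in the input has a code point <= 0xFF.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);    # bytes -> Perl characters
    return encode('ISO-8859-1', $chars);         # characters -> one byte each
}

# Hypothetical helper (name is illustrative): Latin-1 bytes -> UTF-8 bytes.
sub latin1_to_utf8 {
    my ($latin1_bytes) = @_;
    my $chars = decode('ISO-8859-1', $latin1_bytes);
    return encode('UTF-8', $chars);
}

my $utf8   = "caf\xC3\xA9";                      # "cafe" with e-acute, as UTF-8 bytes
my $latin1 = utf8_to_latin1($utf8);

print unpack('H*', $latin1), "\n";                  # prints 636166e9
print unpack('H*', latin1_to_utf8($latin1)), "\n";  # prints 636166c3a9
```

The hex dumps show the two-byte UTF-8 sequence C3 A9 collapsing to the single Latin-1 byte E9 and expanding again, which is the same re-encoding without a translation table described above.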

—John


In reply to Re: Re: Re: XML Simple Charset Q? by John M. Dlugosz
in thread XML Simple Charset Q? by dingus
