Re^3: BUG: code blocks don't retain literal formatting -- could they?

in reply to Re^2: BUG: code blocks don't retain literal formatting -- could they?
in thread BUG: code blocks don't retain literal formatting -- could they?

Update: Corrected spelling and capitalization mistakes.

As best I can tell, without use utf8; in your Perl5 program, the Perl5 compiler expects the source code to be 8-bit ANSI characters.¹ With use utf8; in effect, you may have UTF-8 encoded characters in your source code.

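Quoted strings, by default, are treated as streams of 8-bit bytes. As far as I can tell, it is use utf8; that lets you include UTF-8 encoded characters in quoted strings; use feature 'unicode_strings'; instead makes string operations apply Unicode rules to characters in the range 0x80 .. 0xFF.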
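A quick way to check which pragma changes how a literal is read is to compare the length of the same literal under each one. This is only a sketch: it assumes the file is saved as UTF-8, and literal_lengths is a name made up for illustration.

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8, so the literal below
# occupies two bytes on disk (0xC3 0xA9).
sub literal_lengths {
    my ($plain, $feature_only, $with_utf8);
    {
        $plain = length "é";            # no pragma: read as raw bytes
    }
    {
        use feature 'unicode_strings';  # changes string semantics, not decoding
        $feature_only = length "é";
    }
    {
        use utf8;                       # source is decoded as UTF-8
        $with_utf8 = length "é";
    }
    return ($plain, $feature_only, $with_utf8);
}

printf "default: %d, unicode_strings: %d, utf8: %d\n", literal_lengths();
# prints: default: 2, unicode_strings: 2, utf8: 1
```

Both pragmas are lexically scoped, which is why the three blocks can sit side by side in one file.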

If PM could store the characters/bytes within code tags as-is, and apply HTML encoding only when generating HTML output, I think that would achieve the desired result. (The download link could supply the "raw" bytes with Content-type: application/octet-stream.)
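To make the idea concrete, here is a minimal sketch of "store raw, encode on output". Nothing here reflects how PM actually works; render_code_html and download_code are hypothetical names, and only the four characters that matter inside HTML text are escaped.

```perl
use strict;
use warnings;

# Hypothetical: the stored node text is kept exactly as submitted.
my %escape = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Escape only at the moment the HTML page is generated.
sub render_code_html {
    my ($raw) = @_;
    (my $html = $raw) =~ s/([&<>"])/$escape{$1}/g;
    return "<pre>$html</pre>";
}

# The download link hands back the stored bytes untouched.
sub download_code {
    my ($raw) = @_;
    return $raw;
}

my $stored = qq{print "a < b\\n" if \$a < \$b;};
print render_code_html($stored), "\n";
print download_code($stored), "\n";
```

Because the stored copy is never rewritten, viewing and downloading cannot drift apart.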

If that can't be done, maybe instead of HTML encoding, do \x encoding. Either way, non-7-bit-ANSI source code gets messed up, but at least double-quoted strings might still be correctly interpreted by the Perl compiler.²
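A rough sketch of what \x encoding could look like, assuming the input is a byte string. x_escape is a hypothetical helper, not anything PM provides, and the result is only meaningful where Perl interpolates escapes, i.e. inside double-quoted strings.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every byte outside printable 7-bit ASCII
# (keeping newline and tab) with a \xHH escape.
sub x_escape {
    my ($bytes) = @_;
    my $fmt = '\x%02X';    # single quotes: a literal backslash followed by x
    (my $out = $bytes) =~ s/([^\x20-\x7E\n\t])/sprintf($fmt, ord($1))/ge;
    return $out;
}

my $raw = "caf\xC3\xA9";       # "café" as UTF-8 bytes
print x_escape($raw), "\n";    # prints: caf\xC3\xA9
```

Pasted back between double quotes, the escaped text yields the same bytes as the original; anywhere else in the source it is just noise.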

---

¹ I haven't tried using characters in the range 0x80 .. 0xFF in identifiers in Perl5, but Perl5 keywords all use characters < 0x80.

² The open question is, when use feature 'unicode_strings'; is in effect, would "\x80\x77" be interpreted as 2 characters ("\x80" "\x77") or 1 ("\N{U+8077}")?
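This is easy to test. As documented in perlop, \x without braces consumes at most two hex digits, so I would expect two characters regardless of the feature; the sketch below checks that.

```perl
use strict;
use warnings;
use feature 'unicode_strings';

my $two = "\x80\x77";    # two separate escapes
my $one = "\x{8077}";    # braces are required for more than two hex digits

printf "%d %d\n", length($two), length($one);            # prints: 2 1
printf "U+%04X U+%04X\n", map { ord } split //, $two;    # prints: U+0080 U+0077
```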

In Section Perl Monks Discussion