BUG: code blocks don't retain literal formatting -- could they?

Edit: Just noticed that below my input window it said that I don't need to use HTML entities in code blocks and could use the literal characters. That's obviously not the case.

Could you change the code blocks to retain the user's original input? The worst offender is Unicode. For example, if I include the Greek letter pi (π) in regular text like this, it displays properly at render time even though it was converted to an HTML entity (like &#960;).

However, in a code block, something still modifies my input and converts the character to an entity. To compound the problem, because it is a code block, the entity is not turned back into a UTF-8 character at display time and remains a literal HTML entity, like:

This pi (&#960;) doesn't display correctly.

I would argue that, because it is a code block, the character should >not< have been turned into an HTML entity in the first place; then it would display properly at page-render time. It would be acceptable if the conversion were at least 'round-trip safe' and displayed correctly at render time, but it seems more correct, or more 'ideal', not to touch the user's input in a code block at all.

Since it displays correctly in regular text, it should at least be possible to get it to display right in a code block, but no one would know that the author used the correct character to begin with.
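The behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `to_entities` and `escape_for_code` are hypothetical names, not the site's actual code, and the page charset is simplified to the Latin-1 range.

```perl
use strict;
use warnings;

# Hypothetical stage 1 (assumed): on submission, every character outside
# the Latin-1 range is replaced with a numeric HTML entity.
sub to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Hypothetical stage 2 (assumed): inside a code block, markup characters
# are escaped so the browser shows the stored text literally.
sub escape_for_code {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $input  = "This pi (\x{3C0})";     # the user typed a literal pi
my $stored = to_entities($input);     # This pi (&#960;)

# Regular text: the stored entity is sent as-is and the browser decodes it.
print "regular text sends: $stored\n";

# Code block: the ampersand is escaped again, so the browser shows the
# entity itself instead of the character.
print "code block sends:   ", escape_for_code($stored), "\n";
```

If the first stage were skipped for code blocks, the second stage would have nothing to double-escape, which is the first option proposed above.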

Thanks...


Re: BUG: code blocks don't retain literal formatting -- could they? by kcott (Archbishop) on Sep 15, 2016 at 07:38 UTC
G'day perl-diddler,

Regarding what it says below the input window, maybe changing "... put *the* characters ..." to "... put *these* characters ..." would clarify which characters this statement references.

The reason why things are the way they are is to allow code like this: `sub amp { ... } my $coderef = \&amp;`

Unfortunately, your request would render that as: `sub amp { ... } my $coderef = \&`

The workaround is to use 'pre' tags instead of 'code' tags for blocks (and 'tt' tags for inline text):

This pi (π) using π does display correctly.
This pi (π) using π does display correctly.
This pi (π) using π does display correctly.
This pi (π) using literal pi character does display correctly.

[Note: On previewing, I noticed that the literal pi character that I pasted into that last example now appears as &#960; in the textarea.]

Use this workaround sparingly, as you don't get a download link. Also, line wrapping (or the absence thereof) can be problematic, so aim to keep lines short (I think <= 72 characters is optimal, inasmuch as it doesn't mess up normal page layout).

-- Ken
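To make the `\&amp;` point concrete, here is a rough sketch. Both helpers are hypothetical: `browser_decode` is a tiny stand-in for what a browser does with three named entities, and `escape_for_code` is an assumed version of the escaping applied to code blocks.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a browser's entity decoding; it only knows the
# three named entities needed for this example.
sub browser_decode {
    my ($html) = @_;
    my %named = (amp => '&', lt => '<', gt => '>');
    $html =~ s/&(amp|lt|gt);/$named{$1}/g;
    return $html;
}

# Assumed escaping for code blocks: every markup character becomes an entity.
sub escape_for_code {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Perl source that takes a reference to a sub named "amp".
my $source = 'my $coderef = \&amp;';

# Escaped before sending: the reader sees exactly what was typed.
print "escaped first: ", browser_decode(escape_for_code($source)), "\n";

# Sent untouched: the browser treats "&amp;" as an entity and eats it.
print "sent as-is:    ", browser_decode($source), "\n";
```

The second line loses the sub name, which is why text inside code blocks has to have its ampersands escaped before it reaches the browser.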
Re^2: BUG: code blocks don't retain literal formatting -- could they? by perl-diddler (Chaplain) on Sep 15, 2016 at 08:37 UTC
Regarding the rendering problem: my 1st solution (don't change user-input characters into HTML entities) wouldn't affect the ability to say 'amp' after an ampersand. That would only happen if we went with the 2nd option of preserving round-trip integrity. I preferred the 1st, which leaves the user input unchanged, so it wouldn't need to be changed a 2nd time, which is what causes the problem you mentioned. I used a literal pi, which was changed into the &#960; form in the edit buffer but then wasn't changed back on display.

FWIW, all of the different renderings of pi you tried display as pi on my system. Maybe it's a matter of browser configuration? I have my browser's fallback character encoding, for 'legacy content' (sic) that fails to specify a character encoding, set to UTF-8. It rarely fails, which indicates that even new pages that fail to specify an encoding are usually UTF-8. I'd say <10% actually use Western as a default...
Re^3: BUG: code blocks don't retain literal formatting -- could they? by kcott (Archbishop) on Sep 15, 2016 at 09:05 UTC
"FWIW, all of the different renderings of pi you tried display as pi on my system." That's good (and what I also get). All four "This pi (π) ... does display correctly." lines are in a 'pre' block and were intended to contrast with your earlier `This pi (&#960;) doesn't display correctly.` in a 'code' block.

-- Ken
Re^3: BUG: code blocks don't retain literal formatting -- could they? by RonW (Parson) on Sep 15, 2016 at 23:19 UTC
Update: Corrected spelling and capitalization mistakes.

As best I can tell, without `use utf8;` in your Perl5 program, the Perl5 compiler expects the source code to be 8 bit ANSI characters.¹ With `use utf8;` in effect, you may have UTF8 encoded characters in your source code. Quoted strings, by default, are treated as streams of 8 bit bytes. With `use feature 'unicode_strings';` in effect, you can include UTF8 encoded characters in quoted strings.

If PM could store the characters/bytes within code tags as-is, then only apply HTML encoding when generating HTML output, I think that would achieve the desired result. (The download link could supply the "raw" bytes with `Content-type: application/octet-stream`.)

If that can't be done, maybe instead of HTML encoding, do `\x` encoding. Either way, non-7-bit-ANSI source code gets messed up, but at least double quoted strings might still be correctly interpreted by the Perl compiler.²

---
¹ I haven't tried using characters in the range 0x80 .. 0xFF in identifiers in Perl5, but Perl5 keywords all use characters < 0x80.
² The open question is, when `use feature 'unicode_strings';` is in effect, would `"\x80\x77"` be interpreted as 2 characters (`"\x80" "\x77"`) or 1 (`"\N{U+8077}"`)?
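On the open question in footnote 2, a small check may help. A `\x` escape without braces takes at most two hex digits, so `"\x80\x77"` is two characters with or without `unicode_strings`; a single code point above 0xFF needs the braced form. The sketch also shows that it is `use utf8` that decides how a literal non-ASCII character in the source is read. It assumes the file is saved as UTF-8, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                        # the source file is UTF-8; literals are decoded
use feature 'unicode_strings';

my $two_escapes = "\x80\x77";    # two characters: U+0080 and U+0077
my $one_braced  = "\x{8077}";    # braces are required for a single U+8077
my $literal_pi  = "π";           # one character because of "use utf8"

my $pi_as_bytes;
{
    no utf8;                     # same literal, now read as raw UTF-8 bytes
    $pi_as_bytes = "π";
}

# With the file saved as UTF-8 this prints: 2 1 1 2
printf "%d %d %d %d\n",
    length($two_escapes), length($one_braced),
    length($literal_pi),  length($pi_as_bytes);
```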
Re^4: BUG: code blocks don't retain literal formatting -- could they? by choroba (Cardinal) on Sep 16, 2016 at 07:55 UTC
Re^4: BUG: code blocks don't retain literal formatting -- could they? by perl-diddler (Chaplain) on Sep 16, 2016 at 08:15 UTC
Re^5: BUG: code blocks don't retain literal formatting -- could they? by RonW (Parson) on Sep 16, 2016 at 17:49 UTC
Re^3: BUG: code blocks don't retain literal formatting -- could they? by Anonymous Monk on Sep 16, 2016 at 01:38 UTC
"In regards to the rendering problem ... my 1st solution -- don't change user-input characters into HTML entities, wouldn't affect the ability to say 'amp' after a ampersand."

That's done by your browser. See Re: Strange letters ...
Re^2: BUG: code blocks don't retain literal formatting -- could they? by $h4X4_\|=73}{ (Monk) on Sep 17, 2016 at 12:05 UTC
~~The reason why things are the way they are, is to allow code like this: `sub amp { ... } my $coderef = \&amp;`~~

Unfortunately, your request would render that as: `sub amp { ... } my $coderef = \&`

This bug is caused by not encoding the semicolon and ampersand of the HTML entity. The ampersand and semicolon must be encoded in the same pass, so that already-converted HTML entities are not confused with new ones, and that pass must be the first HTML filter. Any filter for HTML code that does not encode the ampersand and semicolon will have this problem. This problem was addressed in a past version of my module, AUBBC v4.01 (11/08/2010); the new version is located at AUBBC2. The fix I use now looks like this: `s[(&|;)][$1 eq '&' ? '&amp;' : '&#59;']gex;`

Update: spelling

Update: My bad! Because PerlMonks mixes HTML entities with HTML names, there will always be a problem somewhere, and no one filter will work in every case. You have to type the HTML name or you're S.O.L. Hey, welcome to PerlMonks. The place where you need to learn HTML before you can post your Perl question. ッ
Re^3: BUG: code blocks don't retain literal formatting -- could they? by choroba (Cardinal) on Sep 17, 2016 at 19:55 UTC
> The place where you need to learn HTML before you can post your Perl question

Yeah, because knowing HTML is something absolutely pointless, while the knowledge of Markdown, or at least one of its dialects used at Stack Overflow, is something most employers need badly.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re^4: BUG: code blocks don't retain literal formatting -- could they? by LanX (Saint) on Sep 17, 2016 at 20:23 UTC
Re^4: BUG: code blocks don't retain literal formatting -- could they? by RonW (Parson) on Sep 20, 2016 at 19:25 UTC
Re^3: BUG: code blocks don't retain literal formatting -- could they? by Your Mother (Archbishop) on Sep 17, 2016 at 16:54 UTC
Why would mixing HTML named entities, hex, numeric, and whatever is a legal char in the document's charset cause any problems?
Re^3: BUG: code blocks don't retain literal formatting -- could they? by RonW (Parson) on Sep 20, 2016 at 19:28 UTC
"Hey, welcome to PerlMonks. The place where you need to learn HTML before you can post your Perl question."

PM is not the only (still existing) website that uses HTML for posting. See http://slashdot.org for example.
Re: BUG: code blocks don't retain literal formatting -- could they? by choroba (Cardinal) on Sep 15, 2016 at 07:35 UTC
It's annoying, but I got used to it. When I need to post some Unicode (more often data than code), I use `<pre>` instead of `<code>` or `<c>`.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: BUG: code blocks don't retain literal formatting -- could they? by Anonymous Monk on Sep 15, 2016 at 07:39 UTC
PerlMonks uses the windows-1252 (similar to Latin-1) encoding, and all characters that are not in that character set are HTML-escaped, which doesn't work inside <code>...</code> tags, because everything is interpreted literally there.
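The escaping described here can be reproduced with the core Encode module, whose `FB_HTMLCREF` fallback turns any character the target charset lacks into a numeric entity. This is a sketch of the idea only; `to_cp1252_html` is a hypothetical helper, and nothing here is claimed to be how the site itself does it.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode text for a windows-1252 page. Characters that the charset cannot
# represent are replaced with numeric HTML entities (FB_HTMLCREF fallback).
sub to_cp1252_html {
    my ($text) = @_;
    return encode('cp1252', $text, Encode::FB_HTMLCREF);
}

my $pi_line   = to_cp1252_html("pi: \x{3C0}");      # pi is not in cp1252
my $euro_line = to_cp1252_html("euro: \x{20AC}");   # the euro sign is

print "$pi_line\n";                                 # pi: &#960;
printf "euro is the single byte 0x%02X\n", ord(substr($euro_line, -1));
```

In regular text the browser decodes `&#960;` back into the character; in a code block the ampersand is itself escaped, so the entity is shown verbatim.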
Re^2: BUG: code blocks don't retain literal formatting -- could they? by perl-diddler (Chaplain) on Sep 15, 2016 at 08:42 UTC
Yeah, but 1252 doesn't work for most characters, especially in a Unicode Perl. As I mentioned, all of the different attempts to display pi by choroba that choroba said didn't work, displayed correctly as pi for me. But I have my fallback char encoding set to UTF-8, as it works more often (like here). Besides, how can one display pi in 1252? Pi doesn't occur in the 1252 charset AFAIK...
Re^3: BUG: code blocks don't retain literal formatting -- could they? by kcott (Archbishop) on Sep 15, 2016 at 09:41 UTC
"As I mentioned, all of the different attempts to display pi by choroba that choroba said didn't work, displayed correctly as pi for me."

I think there's two problems here. `:-)` I'm pretty sure you're referring to my post (with four attempts), not choroba's (with zero attempts). I did not say they "didn't work"; I said "*does* display correctly" against each attempt.

-- Ken
Re^4: BUG: code blocks don't retain literal formatting -- could they? by perl-diddler (Chaplain) on Sep 15, 2016 at 18:18 UTC
Re^5: BUG: code blocks don't retain literal formatting -- could they? by kcott (Archbishop) on Sep 15, 2016 at 20:02 UTC
Re^3: BUG: code blocks don't retain literal formatting -- could they? by Anonymous Monk on Sep 15, 2016 at 08:58 UTC
Yeah, but ... Heheh, there is no buts :) `\N{U+03C0}`
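For anyone wondering how that escape helps: the source stays pure ASCII, so nothing in it needs to be turned into an entity, yet the string still holds the character. A minimal sketch, assuming a terminal that accepts UTF-8 output:

```perl
use strict;
use warnings;

# ASCII-only source; the escape names the code point for Greek small pi.
my $pi = "\N{U+03C0}";

# Encode on output so the wide character is written as UTF-8 bytes.
binmode(STDOUT, ':encoding(UTF-8)');
printf "%s is U+%04X\n", $pi, ord($pi);
```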
