PerlMonks  

Re^10: BUG: code blocks don't retain literal formatting -- could they?

by perl-diddler (Chaplain)
on Sep 20, 2016 at 20:04 UTC ( [id://1172243]=note )


in reply to Re^9: BUG: code blocks don't retain literal formatting -- could they?
in thread BUG: code blocks don't retain literal formatting -- could they?

You'll forgive me for not taking someone else's word for it. That's not to say you don't know far more than I do about how difficult it is, but until I've looked at the issue and seen that it's not worth the effort, I am a dyed in the wool skeptic.

While the browser can likely convert html entities to binary-streams, I am pretty sure the opposite doesn't happen. Case in point -- here. Why would the browser, browsing a site that identifies itself as windows-1252 interpret user characters as Unicode and convert them into HTML-entities representing the unicode characters?

Second issue on that -- I've never seen any of my browsers do that on any other site, though they can convert the entities into a binary stream. But again -- why would the browser convert the html entities into UTF-8 encoded Unicode if the website's encoding was directing conversion?

My claim is that for entities above the ASCII range, those entities will be converted into UTF-8 to be displayed in the browser. Case in point -- pi. Its character code is not in windows-1252. The browser converts the entity to UTF-8 -- not windows-1252, which is why I believe the fix is relatively trivial.
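The pi example can be checked outside a browser. The following is a minimal sketch in Python, which uses the codec name `cp1252` for windows-1252. The helper `fits_in_cp1252` is a hypothetical name introduced for illustration, and `html.unescape` only stands in for the entity decoding a browser does when rendering.

```python
import html

PI = "\u03c0"  # GREEK SMALL LETTER PI


def fits_in_cp1252(ch: str) -> bool:
    """Return True if the character has a byte representation in windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_cp1252(PI))             # False: pi has no windows-1252 byte
print(fits_in_cp1252("\u20ac"))       # True: the euro sign is byte 0x80
print(html.unescape("&pi;") == PI)    # True: named entity decodes to pi
print(html.unescape("&#960;") == PI)  # True: so does the numeric form
```

This only shows that the entity names a Unicode code point regardless of the page's declared encoding. It says nothing about which encoding a browser uses internally.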

Re^11: BUG: code blocks don't retain literal formatting -- could they?
by RonW (Parson) on Sep 20, 2016 at 22:50 UTC
    until I've looked at the issue and seen that it's not worth the effort, I am a dyed in the wool skeptic.

    Understandable. I don't know how to get invited to pmdev, but maybe looking at the underlying engine will give you some insight. Do note that the engine is only a "foundation". A lot of the code that actually runs PM is contained in nodes (See Finding the code).

    Why would the browser, browsing a site that identifies itself as windows-1252 interpret user characters as Unicode and convert them into HTML-entities representing the unicode characters?

    If a character can't be represented in windows-1252 (or whatever character set the server says it's using), then an HTML entity is the representation called for in the W3C specifications. At the very least, the server can store the entity as part of the user-supplied text.
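    The fallback described above can be imitated in a few lines. This is a sketch under assumptions, not a capture of what any browser sends: Python's `xmlcharrefreplace` error handler stands in for the browser's behavior, and `encode_for_form` is a hypothetical helper name.

```python
def encode_for_form(text: str, charset: str = "cp1252") -> bytes:
    """Encode text in the given charset; characters the charset cannot
    represent become decimal character references such as &#960;."""
    return text.encode(charset, errors="xmlcharrefreplace")


# e-acute exists in windows-1252 (byte 0xE9); pi does not and becomes an entity.
print(encode_for_form("caf\u00e9 \u03c0"))       # b'caf\xe9 &#960;'

# With a Unicode charset nothing needs the fallback.
print(encode_for_form("\u03c0", "utf-8"))        # b'\xcf\x80'
```

    Under that assumption, the entity is already in the submitted text before the server does anything with it.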

    why would the browser convert the html entities into UTF-8 encoded Unicode if the website's encoding was directing conversion?

    The website isn't directing conversion. It's only telling the browser what it is sending. If the website tells the browser to expect windows-1252 characters, the browser will perform whatever conversion it needs to be able to display windows-1252 characters. If the server needs to send a character that isn't represented in windows-1252, it has to use an HTML-entity. It expects the browser to know what to do with the entity.

    If the server tells the browser to expect Unicode characters, then the only entities it would need to send would be for those characters that are also part of HTML mark up (so the browser knows those aren't part of the HTML mark up).

    Side note: A large percentage of the software changes I thought would be trivial weren't. Most of this was because of new things that the original designers had not even dreamed of, let alone thought of. PM is very old, but it keeps on working. If it were ever moved to a newer system, transferring the content might not be practical.

      Well, I'd love to try an experiment -- disabling the conversion of user-input characters in "Code" blocks on user-submission, into HTML entities. Seems like that shouldn't be "that" hard.

      BTW -- when I say the website is directing conversion, I mean that it claims to use a specific encoding. Win-1252 was certainly never a default standard (ISO-8859-1, maybe, but not Win-1252), so the website has to specify it as the encoding on each page. That's what I mean by "directing conversion". The fact that multiple people can read UTF-8 encoded Unicode characters here (basically ignoring the website's encoding directive) leads me to believe that most browsers would automatically work. The problem exists only because PM first converts user input into html-entities, and it is only in code blocks that PM won't convert html-entities, or inhibits browser conversion, back into their character equivalents.

      That 2nd conversion makes UTF-8 displayable in normal text or "pre" blocks, but it is disabled for "code" blocks. The best solution would be to not mangle user input in "code" blocks into html entities in the first place, because there they won't be converted back into displayable characters: it's a one-way conversion, and that is what causes the display bug. Since "code" blocks aren't supposed to be formatted anyway, having the website reformat user input in those blocks seems to be the root of the problem; any conversion done in that 1st stage is guaranteed to be one-way in code blocks.
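      The one-way conversion can be modelled roughly as follows. This is a sketch of the mechanism as described in this thread, not PM's actual code: it assumes the text arrives with the character already turned into an entity, and that code blocks escape `&` so markup shows literally. `render_as_text` and `render_as_code` are hypothetical names, and `html.unescape` stands in for the browser's rendering.

```python
import html

# Assumed state of the submission: pi has already become an entity.
submitted = "my $pi = '&#960;';"


def render_as_text(stored: str) -> str:
    """Normal text: the entity is sent as-is and the browser decodes it."""
    return html.unescape(stored)


def render_as_code(stored: str) -> str:
    """Code block: '&', '<' and '>' are escaped first, so the browser
    shows the entity's source text instead of the character."""
    escaped = html.escape(stored, quote=False)  # '&#960;' -> '&amp;#960;'
    return html.unescape(escaped)


print(render_as_text(submitted))  # my $pi = 'π';
print(render_as_code(submitted))  # my $pi = '&#960;';
```

      In this model the code block faithfully shows what was stored, so the character is lost at the first conversion and not at display time.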

        Well, I'd love to try an experiment -- disabling the conversion of user-input characters in "Code" blocks

        Supposedly, that conversion is happening in the browser (for the whole submission).

        It is possible to prove (or disprove) that by using Wireshark (or a similar program) to monitor what the browser is sending to PM.

        DANGER: The following idea might get you in trouble with the PM Gods

        Another possible experiment would be to manually create a form submission (see Form submission), including the proper values and attributes for Content-type and Content-Transfer-Encoding for the text area. Then use HTTP::Tiny to post it to PM.

        Important: If you do the above, be sure the value of the "op" field is "preview".

        Then you can examine the response content to see how badly PM choked on the submission.
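        Before posting anything, the request body can be built and inspected offline. The sketch below is in Python instead of Perl with HTTP::Tiny, and it only constructs a urlencoded body; it sends nothing. The "op" field and its "preview" value come from the note above. The field name `note_text` is a placeholder assumption, not a known PM field, and `build_preview_body` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_preview_body(text: str, charset: str) -> str:
    """Build an application/x-www-form-urlencoded body for a preview request.
    Characters the charset cannot represent fall back to character references."""
    fields = {
        "op": "preview",    # per the thread: preview only, never a real post
        "note_text": text,  # placeholder field name (assumption)
    }
    return urlencode(fields, encoding=charset, errors="xmlcharrefreplace")


# Declared as windows-1252: pi travels as the percent-encoded text "&#960;".
print(build_preview_body("\u03c0", "cp1252"))  # op=preview&note_text=%26%23960%3B

# Declared as UTF-8: pi travels as its two UTF-8 bytes.
print(build_preview_body("\u03c0", "utf-8"))   # op=preview&note_text=%CF%80
```

        Comparing these two bodies with a Wireshark capture of a real browser submission would show which of the two forms the browser actually sends.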
