Re^9: BUG: code blocks don't retain literal formatting -- could they?

by RonW (Parson)
on Sep 20, 2016 at 18:59 UTC ( [id://1172236]=note )


in reply to Re^8: BUG: code blocks don't retain literal formatting -- could they?
in thread BUG: code blocks don't retain literal formatting -- could they?

Too bad no one is interested in fixing this. I guess they went AWOL

PM's code base dates back to the 1990s. See http://everything2.com/title/Everything+Engine

Granted, some of the issues with code tag processing could have been dealt with in the early days of PM. However, the limitations of the windows-1252 character set did not become a problem until years later.

Unfortunately, getting the PM website to handle Unicode/UTF-8 is much more complicated than adding use feature 'unicode_strings'; statements to the code.


Replies are listed 'Best First'.
Re^10: BUG: code blocks don't retain literal formatting -- could they?
by perl-diddler (Chaplain) on Sep 20, 2016 at 20:04 UTC
    You'll forgive me for not taking someone else's word for it. That's not to say that you don't know far more than I do about how difficult it is, but until I've looked at the issue and seen that it's not worth the effort, I am a dyed in the wool skeptic.

    While the browser can likely convert html entities to binary-streams, I am pretty sure the opposite doesn't happen. Case in point -- here. Why would the browser, browsing a site that identifies itself as windows-1252 interpret user characters as Unicode and convert them into HTML-entities representing the unicode characters?

    Second issue on that -- I've never seen any of my browsers do that on any other site, though they can convert the entities into a binary stream. But again -- why would the browser convert the html entities into UTF-8 encoded Unicode if the website's encoding was directing conversion?

    My claim is that entities above the ASCII range will be converted into UTF-8 to be displayed in the browser. Case in point -- pi. Its character code is not in windows-1252. The browser converts the entity to UTF-8 -- not windows-1252 -- which is why I believe the fix is relatively trivial.

      until I've looked at the issue and seen that it's not worth the effort, I am a dyed in the wool skeptic.

      Understandable. I don't know how to get invited to pmdev, but maybe looking at the underlying engine will give you some insight. Do note that the engine is only a "foundation". A lot of the code that actually runs PM is contained in nodes (See Finding the code).

      Why would the browser, browsing a site that identifies itself as windows-1252 interpret user characters as Unicode and convert them into HTML-entities representing the unicode characters?

      If a character can't be represented in windows-1252 (or whatever character set the server says it's using), then an HTML entity is the representation called for in the W3 specifications. At the very least, the server can store the entity as part of the user supplied text.

      why would the browser convert the html entities into UTF-8 encoded Unicode if the website's encoding was directing conversion?

      The website isn't directing conversion. It's only telling the browser what it is sending. If the website tells the browser to expect windows-1252 characters, the browser will perform whatever conversion it needs to be able to display windows-1252 characters. If the server needs to send a character that isn't represented in windows-1252, it has to use an HTML-entity. It expects the browser to know what to do with the entity.

      If the server tells the browser to expect Unicode characters, then the only entities it would need to send would be for those characters that are also part of HTML markup (so the browser knows those aren't part of the HTML markup).
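To make the two cases above concrete, here is a minimal sketch in Python (not PM's actual Perl code). The function names are made up for illustration, and the escaping strategy is an assumption about how a server could behave, not a description of what PM does. It shows a page declared as windows-1252 falling back to numeric entities for characters that encoding can't hold, and a page declared as UTF-8 needing entities only for markup characters.

```python
import html


def encode_for_declared_charset(text: str, charset: str = "cp1252") -> bytes:
    """Hypothetical helper: prepare text for a page declared as `charset`.

    Markup characters (&, <, >) are escaped first. Any character the
    charset cannot represent is then replaced by a numeric character
    reference such as &#960; via the codec error handler.
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode(charset, errors="xmlcharrefreplace")


def encode_for_utf8_page(text: str) -> bytes:
    """Hypothetical helper: prepare text for a page declared as UTF-8.

    Only markup characters need entities; everything else is sent as
    ordinary UTF-8 bytes.
    """
    return html.escape(text, quote=False).encode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9 costs 3 \u20ac, and \u03c0 < 4"
    # The accented e and the euro sign exist in windows-1252; pi does not.
    print(encode_for_declared_charset(sample))
    print(encode_for_utf8_page(sample))
```

In the first function only pi ends up as an entity, because the accented e and the euro sign both have windows-1252 code points. In the second, nothing but the "<" is turned into an entity.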

      Side note: A large percentage of the software changes I thought would be trivial weren't. Most of this was because of new things that the original designers had not even dreamed of, let alone thought of. PM is very old, but it keeps on working. If it were ever moved to a newer system, transferring the content might not be practical.

        Well, I'd love to try an experiment -- disabling the conversion of user-input characters in "Code" blocks on user-submission, into HTML entities. Seems like that shouldn't be "that" hard.

        BTW -- when I say website directing conversion -- I mean by claiming it is using a specific encoding. Certainly Win-1252 was never a default standard -- ISO-8859-1, maybe, but not Win-1252 -- so that has to be specified as an encoding by the website on each page. That's what I mean by "directing conversion". The fact that multiple people can read UTF-8 encoded Unicode characters (basically ignoring the website's encoding directive) leads me to believe that most browsers would automatically work. The problem exists only because PM first converts user input into html-entities, and it is only in code blocks that PM won't convert html-entities, or inhibits browser conversion, back into their character equivalents.

        That 2nd conversion makes UTF-8 displayable in normal text or "pre" blocks, but is disabled for "code" blocks. The best solution there would be to not mangle user input in "code" blocks into html entities in the first place, because they won't be converted back into displayable characters. It's a one-way conversion that is causing the display bug. Since "code" blocks aren't supposed to be formatted anyway, having the website reformat user input in those blocks seems to be the root of the problem: any conversion done in that 1st stage is guaranteed to be one-way in code blocks.
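A toy model of the two stages described above, sketched in Python. All function names are hypothetical, and the pipeline is an assumption based on the behaviour described in this thread, not PM's real code. Stage 1 turns characters outside windows-1252 into numeric entities on submission. Stage 2 passes entities through in normal text but escapes the "&" in code blocks, so the entity shows up literally.

```python
import html


def submit_under_cp1252(user_input: str) -> str:
    """Stage 1 (assumed): characters windows-1252 cannot represent
    become numeric entities before the text is stored."""
    raw = user_input.encode("cp1252", errors="xmlcharrefreplace")
    return raw.decode("cp1252")


def render_as_text(stored: str) -> str:
    """Normal text: entities are sent as-is, so the browser decodes them."""
    return stored


def render_as_code(stored: str) -> str:
    """Code block: & is escaped so the content is shown literally."""
    return html.escape(stored, quote=False)


def browser_display(markup: str) -> str:
    """Stand-in for the browser turning entities back into characters."""
    return html.unescape(markup)


if __name__ == "__main__":
    stored = submit_under_cp1252("print '\u03c0'")
    print(stored)                                   # what gets stored
    print(browser_display(render_as_text(stored)))  # pi comes back
    print(browser_display(render_as_code(stored)))  # entity shown literally
```

Under this model the stored text for a typed pi is identical to the stored text for a literally typed "&#960;", which is why the conversion can't be undone reliably once it has happened.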
