Well, to sum all that up, it seems that PM was initially designed with HTML in mind, then patched several times, ending up supporting Latin-1 encoding on input/output but nothing else. Am I right?

Well, PM was designed with pseudo-HTML in mind and still uses it (but not in node titles). As for the contents of node titles, I find more evidence that the original design was not for them to be interpreted as HTML. I think that they were either designed to be text or that part of the design just wasn't fully specified or fully considered. There were similar parts that should have been escaped and that simply broke things in some cases, so I don't think I'm stretching to guess that the titles were not escaped for similar reasons (a very common mistake that I've made many times and that I've seen others make many times).

I suspect that PM's storage was created with the default table charset (which is Latin-1). Am I right again?

No, the storage of PM nodes is encoding-agnostic, AFAICT. It just stores byte strings without bothering with encodings. And I'm glad.

BTW, if you look at your node's title, you'll notice that your accented characters are no longer correct. This is due to what I mentioned above; your browser is sending UTF-8 text to PerlMonks. Luckily, this prompted me to realize that there is a simple way that we can detect this. Now I just need to write conversion code (and I think a regex will be easier than porting Encode to PerlMonks, but we'll see).
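To make the regex idea concrete, here is a minimal sketch of what such detection and conversion could look like. It is written in Python purely for illustration; it is not PerlMonks code, and the names `looks_like_utf8`, `utf8_to_latin1`, and `normalize_title` are hypothetical. It assumes that only characters in the Latin-1 range (U+0080 to U+00FF) need handling, which in UTF-8 are always a lead byte of 0xC2 or 0xC3 followed by one continuation byte.

```python
import re

# A two-byte UTF-8 sequence for U+0080..U+00FF:
# lead byte 0xC2 or 0xC3, then a continuation byte 0x80..0xBF.
_UTF8_LATIN1 = re.compile(rb"[\xC2\xC3][\x80-\xBF]")
_HIGH_BYTE = re.compile(rb"[\x80-\xFF]")


def looks_like_utf8(raw: bytes) -> bool:
    """Guess whether a byte string is UTF-8 (Latin-1 range only)."""
    if not _HIGH_BYTE.search(raw):
        return False  # pure ASCII: nothing to detect or convert
    # Remove every well-formed two-byte sequence; if any high byte
    # is left over, the input is probably already Latin-1.
    leftover = _UTF8_LATIN1.sub(b"", raw)
    return _HIGH_BYTE.search(leftover) is None


def utf8_to_latin1(raw: bytes) -> bytes:
    """Fold each two-byte UTF-8 sequence into a single Latin-1 byte."""
    def fold(match):
        lead, cont = match.group(0)  # two ints in Python 3
        return bytes([((lead & 0x03) << 6) | (cont & 0x3F)])
    return _UTF8_LATIN1.sub(fold, raw)


def normalize_title(raw: bytes) -> bytes:
    """Convert only when the input looks like UTF-8; otherwise pass through."""
    return utf8_to_latin1(raw) if looks_like_utf8(raw) else raw
```

This is only a heuristic: a genuine Latin-1 title whose high bytes all happen to form such pairs (for example a literal "Ã©") would be misdetected, and characters outside Latin-1 are deliberately left alone.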

In the meantime, if you are going to write French at PerlMonks, you'll need to use HTML entities for accented characters in the text and use a different browser to get accented characters in the titles (if this is a big hardship for you, maybe someone will volunteer to clean up your titles for you, though that work may have to be done every time you update a node).
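For the node text, the entity workaround can be automated before pasting. The following is a small Python sketch, not anything PerlMonks provides; the helper name `to_entities` is hypothetical. It relies on Python's standard `xmlcharrefreplace` error handler, which replaces every non-ASCII character with a decimal numeric entity.

```python
def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_entities("à la française"))  # prints "&#224; la fran&#231;aise"
```

Plain ASCII passes through unchanged, so characters such as `&` and `<` still need their own entities as usual.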

- tye        


In reply to Re^5: Special & Accented chars in nodes titles ==> [à la française] (design) by tye
in thread Special & Accented chars in nodes titles ==> [à la française] by dfaure
