Re^3: Special & Accented chars in nodes titles ==> [à la française] (!ents)
by Sidhekin (Priest) on Jun 28, 2004 at 17:11 UTC
Do you need any non-Latin-1 characters for French?
Hopefully, non-Latin-1 chars are not required for French.
Well, there is Œ / œ ... but there does not seem to be a consensus on whether they are truly required for French. ;-)
print "Just another Perl ${\(trickster and hacker)},"
The Sidhekin proves Sidhe did it!
| [reply] [d/l] |
there is Œ / œ ... but there does not seem to be a consensus on whether they are truly required for French
That depends on what your requirement is. If typography is a concern, then they are mandatory: œuf, cœur and œuvre spring to mind. This is actually a very good litmus test for seeing how your server and browser speak to each other. Sometimes you see little diamonds, sometimes nothing, sometimes an OE ligature. You can also use either ISO Latin-9 or the &oelig; entity, if those alternatives are available.
There's also the AE ligature, which appears in both English and French, but fortunately that's part of ISO Latin-1. Unfortunately it's a rarer beast in French, and probably now considered archaic in English, apart from ægis and præternatural. Encyclopædia seems pretty archaic these days.
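To make the Latin-1 versus Latin-9 point concrete, here is a minimal Python sketch (illustrative only, and nothing PerlMonks itself runs). It uses the standard codec names `latin-1` and `iso8859-15`; the helper names `fits` and `encode_or_entity` are made up for this example. It shows that œ has no Latin-1 byte but does have a Latin-9 one, that æ fits in Latin-1 directly, and that a numeric character reference is available as a fallback.

```python
def fits(text: str, charset: str) -> bool:
    """Return True if every character of text has a byte in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def encode_or_entity(text: str, charset: str = "latin-1") -> bytes:
    """Encode text, replacing unencodable characters with &#NNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    for word in ("œuvre", "ægis"):
        print(
            word,
            "latin-1:", fits(word, "latin-1"),
            "latin-9:", fits(word, "iso8859-15"),
            "fallback:", encode_or_entity(word),
        )
```

Whether the browser then shows a ligature, a diamond or nothing still depends on the charset the page declares, which this sketch does not cover.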
Then of course there is the problem of the correct use of space around French punctuation characters. Guillemets, question and exclamation marks, semi-colons and probably a few other glyphs should have a thin non-breaking space before them (or after them in the case of the left guillemet).
In the olde days this was rendered with the &thinsp; ISO entity ( but then your renderer needs to be programmed to deal with it ). Otherwise the modern alternatives appear to be the Unicode THIN SPACE ( U+2009 ) or NARROW NO-BREAK SPACE ( U+202F ) entities.
Note that the three different entities have been used to add spaces inside the three parentheses in the above paragraph (but no spacing here). What you see is what your browser gives you.
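For anyone who would rather not type those spaces by hand, the spacing rule can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function names `french_spacing` and `to_html` are invented here, only the marks named above (guillemets, question and exclamation marks, semi-colons) are handled, and no attempt is made to skip URLs, code or existing markup. It uses NARROW NO-BREAK SPACE (U+202F) throughout.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
# Spaces that may already sit next to the punctuation:
# ordinary space, NO-BREAK SPACE, THIN SPACE, NARROW NO-BREAK SPACE.
_SPACES = "[ \u00a0\u2009\u202f]*"


def french_spacing(text: str) -> str:
    """Put exactly one NNBSP before ; ! ? » and after «."""
    text = re.sub(_SPACES + "([;!?»])", NNBSP + r"\1", text)
    text = re.sub("(«)" + _SPACES, r"\1" + NNBSP, text)
    return text


def to_html(text: str) -> str:
    """Spell the NNBSP as a numeric character reference for ASCII-only output."""
    return text.replace(NNBSP, "&#8239;")


if __name__ == "__main__":
    print(to_html(french_spacing("« Bonjour ! » Quoi?")))
```

As with the entities above, how wide the space actually looks is up to the browser and the font.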
Did I say typography is fun?
- another intruder with the mooring of the heat of the Perl
| [reply] |
Re^3: Special & Accented chars in nodes titles ==> [à la française] (!ents)
by tye (Sage) on Jun 28, 2004 at 18:11 UTC
nice to have PerlMonks/Everything encoding behaviour
This is specific to PerlMonks. I don't know what other Everything installations use these days, but PerlMonks used to interpret titles as HTML until I fixed it because it was causing problems and had the potential for even more abuses.
| [reply] |
Well, to sum up all that stuff, it seems that PM was initially designed with HTML in mind, then patched several times, ending up supporting Latin-1 encoding on input/output but nothing else. Am I right?
I suspect PM's storage uses the default table charset (which is Latin-1). Am I right again?
____
HTH, Dominique
My two favorites:
If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
Bien faire, et le faire savoir...
| [reply] |
Well, to sum up all that stuff, it seems that PM was initially designed with HTML in mind, then patched several times, ending up supporting Latin-1 encoding on input/output but nothing else. Am I right?
Well, PM was designed with pseudo-HTML in mind and still uses it (but not in node titles). As for the contents of node titles, I find more evidence that the original design was not for them to be interpreted as HTML. I think that they were either designed to be text or that that part of the design just wasn't fully specified or fully considered. There were similar parts that should have been escaped and simply broke things in some cases, so I don't think I'm stretching to guess that the titles were not escaped for similar reasons (a very common mistake that I've made many times and I've seen others make many times).
I suspect PM's storage uses the default table charset (which is Latin-1). Am I right again?
No, the storage of PM nodes is encoding-agnostic, AFAICT. It just stores byte strings without bothering with encodings. And I'm glad.
BTW, if you look at your node's title, you'll notice that your accented characters are no longer correct. This is due to what I mentioned above; your browser is sending UTF-8 text to PerlMonks. Luckily, this prompted me to realize that there is a simple way that we can detect this. Now I just need to write conversion code (and I think a regex will be easier than porting Encode to PerlMonks, but we'll see).
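The post does not say what the detection is or what the regex would look like, so the following is only one plausible shape of the idea, sketched in Python rather than as PerlMonks code. The assumptions are mine: a title counts as UTF-8 if it contains non-ASCII bytes that all form valid UTF-8, and only characters in the Latin-1 range (U+0080 to U+00FF) get converted. All function names are hypothetical.

```python
import re

# UTF-8 encodes U+0080..U+00FF as a lead byte 0xC2 or 0xC3
# followed by one continuation byte in 0x80..0xBF.
UTF8_LATIN1_PAIR = re.compile(rb"[\xC2\xC3][\x80-\xBF]")


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: has non-ASCII bytes, and the whole string is valid UTF-8."""
    if all(b < 0x80 for b in data):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def utf8_pairs_to_latin1(data: bytes) -> bytes:
    """Fold each two-byte UTF-8 pair into the single Latin-1 byte it encodes."""
    def fold(match):
        lead, cont = match.group(0)
        return bytes([((lead & 0x03) << 6) | (cont & 0x3F)])
    return UTF8_LATIN1_PAIR.sub(fold, data)


def normalise_title(data: bytes) -> bytes:
    """Convert only when the bytes look like UTF-8; otherwise leave them alone."""
    return utf8_pairs_to_latin1(data) if looks_like_utf8(data) else data


if __name__ == "__main__":
    print(normalise_title("à la française".encode("utf-8")))
```

The heuristic can be fooled: a genuine Latin-1 title that happens to contain a sequence such as Ã followed by © is also valid UTF-8 and would be folded. Characters outside Latin-1, such as œ, are left as their UTF-8 bytes.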
In the meantime, if you are going to write French at PerlMonks, you'll need to use HTML entities for accented characters in the text and use a different browser to get accented characters in the titles (if this is a big hardship for you, maybe someone will volunteer to clean up your titles for you, though that work may have to be done every time you update a node).
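Typing the entities by hand gets tedious, so here is a small Python sketch of doing the substitution before pasting text into a node body. It relies on the standard `html.entities.codepoint2name` table; the function name `to_entities` is invented for this example. It deliberately leaves `&`, `<` and `>` alone, on the assumption that the text may already contain markup.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a named or numeric entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII stays as typed
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &eacute;
        else:
            out.append(f"&#{cp};")                 # no named entity: numeric form
    return "".join(out)


if __name__ == "__main__":
    print(to_entities("à la française, de tout cœur"))
```

The result is pure ASCII, so it survives whatever charset the browser decides to submit the form in.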
| [reply] |