Re^2: Save yourself, start all projects with UTF-8 encoding
by DrHyde (Prior) on Apr 07, 2011 at 10:31 UTC
|
Source code should be printable ASCII. Which is, of course, a subset of UTF-8.
It should be printable ASCII because not everyone who wants or needs to hack on it will have font support for weirdo characters. YOU might not have support for weirdo characters all the time - if, for example, it all goes wrong and you have to fix stuff using the crappy ssh client on your phone. If you need non-ASCII text in your application, then it should be corralled into language-specific ghettoes: templates or resource files.
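A rule like "source code should be printable ASCII" is easy to check mechanically. The following is only a rough sketch in Python, not anything from the original post: the function name `non_ascii_positions` and the sample text are made up for illustration, and the choice to tolerate tabs and carriage returns is an assumption.

```python
def non_ascii_positions(text):
    """Return (line_number, column, character) for every character that is
    not printable ASCII (space through tilde).

    Assumption: tabs and carriage returns are tolerated, since they are
    common in source files even though they are not "printable".
    """
    hits = []
    # Split on "\n" only, so exotic line separators are still reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in "\t\r":
                continue
            if not (" " <= ch <= "~"):
                hits.append((line_no, col, ch))
    return hits


# Hypothetical two-line source file; the second line contains U+00E9 twice.
sample = 'print("resume")\nprint("r\u00e9sum\u00e9")\n'

for line_no, col, ch in non_ascii_positions(sample):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X}")
# line 2, column 9: U+00E9
# line 2, column 13: U+00E9
```

Reporting code points rather than the characters themselves means the report stays readable even on a terminal without the right fonts.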
In fact, even if your application only needs to work in ASCII-compatible languages (the only two I can think of are English and Latin) then ideally text strings will still live in templates and resource files, for two reasons. First, separation of concerns. Second, it'll let you more easily add other languages later.
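To make the resource-file idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the catalog layout, the `greeting` key and the `translate` helper are invented for illustration, and in a real project the catalog would sit in its own file rather than in a string. Note that the non-ASCII character appears only as a JSON escape, so the source itself stays printable ASCII.

```python
import json

# Hypothetical message catalog. In practice this would be a separate
# resource file (for example one JSON file per language), not a literal.
CATALOG_JSON = r'''
{
  "en": {"greeting": "Hello, {name}!"},
  "es": {"greeting": "\u00a1Hola, {name}!"}
}
'''


def translate(catalog, lang, key, default_lang="en", **params):
    """Look up `key` for `lang`, falling back to `default_lang` when the
    language or the key is missing, then fill in the named parameters."""
    template = catalog.get(lang, {}).get(key)
    if template is None:
        template = catalog[default_lang][key]
    return template.format(**params)


catalog = json.loads(CATALOG_JSON)
print(translate(catalog, "en", "greeting", name="Ana"))
print(translate(catalog, "fr", "greeting", name="Ana"))  # falls back to "en"
```

Adding another language later then means adding one more entry to the catalog, with no change to the code that calls `translate`.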
| [reply] |
|
|
even if your application only needs to work in ASCII-compatible languages (the only two I can think of are English and Latin)
Neither English nor Latin is ASCII-compatible. You cannot write English correctly in ASCII. Period! You don’t have proper quotes “like these” — nor distinct em dashes like those. You can’t write ± a few minutes, or 5¢, or ℅ General Delivery, let alone πr². You can’t even write 5÷2, or ©2011. What about 5 o’clock? And don’t get me started with things like jalapeños or résumés.
As for Latin, consider this bit of Latin nested in English:
Etymology: < Latin domināt‑ participial stem of dominārī to bear rule, govern, lord it, < domin‐us lord, master: compare French dominer.
See what I mean? The only English you can write in ASCII looks like bumpkin‐English, a word which I’ve self‐censored.
| [reply] |
|
|
Neither English nor Latin *needs* any character outside of ASCII. Your examples in English are examples of pretty typography, not of the language itself. "Jalapeño" and "résumé" are still correct if written as "jalapeno" and "resume" - in the latter case it is disambiguated from "resume" by context - or by writing "CV" instead. See also "cafe", which lacks the accent that it would have in French. "π" is, of course, not English, but Greek.
Latin inscriptions don't use accent marks, and I'm not aware of them existing in Latin handwriting either (such as in the Vindolanda tablets). I would, of course, be delighted to be corrected in this, with examples from before the fall of the Empire. I presume that the ones you cite are there to indicate pronunciation to a reader unfamiliar with Latin. I've seen similar marks in textbooks for teaching English as a foreign language.
| [reply] |
|
|
Re^2: Save yourself, start all projects with UTF-8 encoding
by Lady_Aleena (Priest) on Apr 05, 2011 at 03:58 UTC
|
JavaFan... I mean that every file that can be encoded as UTF-8 should be encoded as UTF-8. That way you don't have to rewrite everything because of issues with conversion. I have so many non-UTF-8 files that converting them by hand won't fly. However, because UTF-8 files don't talk to my non-UTF-8 files well, I have to do it all at once. I also can't figure out why UTF-8 scripts can't find files on my computer. If I had initially written my code and data files in UTF-8, I would have found these problems ages ago. However, since I didn't, I am just finding them now, when my code runs very deep. Finding where the problems are is nearly impossible. I have tried to trace the problems, and I can't find them.
Have a cookie and a very nice day!
Lady Aleena
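For the "convert everything at once" problem described above, a bulk pass could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the post does not say which legacy encoding the files use, so `cp1252` is a guess that has to be replaced with the real one, and the helper names `to_utf8` and `convert_tree` and the `*.txt` pattern are made up. It does not address the separate problem of scripts failing to find files.

```python
from pathlib import Path


def to_utf8(raw, fallback="cp1252"):
    """Return (utf8_bytes, source_encoding) for the given bytes.

    Bytes that already decode as UTF-8 are returned unchanged. Otherwise
    they are decoded with `fallback` (an ASSUMED legacy encoding) and
    re-encoded. A UnicodeDecodeError from the fallback is not caught, so
    a wrong guess can fail loudly instead of silently corrupting data.
    """
    try:
        raw.decode("utf-8")
        return raw, "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8"), fallback


def convert_tree(root, pattern="*.txt", fallback="cp1252"):
    """Rewrite every matching file under `root` as UTF-8, in place.

    Returns the list of paths that were actually changed. Work on a copy
    or under version control, since files are overwritten.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        converted, _source = to_utf8(raw, fallback)
        if converted != raw:
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Because files that are already valid UTF-8 are left alone, the pass can be run again safely after a partial conversion. The returned list is also a starting point for tracking down which files were the non-UTF-8 ones.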
| [reply] |