Re: Save yourself, start all projets with UTF-8 encoding

Re^2: Save yourself, start all projets with UTF-8 encoding by DrHyde (Prior) on Apr 07, 2011 at 10:31 UTC
Source code should be printable ASCII. Which is, of course, a subset of UTF-8. It should be printable ASCII because not everyone who wants or needs to hack on it will have font support for weirdo characters. YOU might not have support for weirdo characters all the time - if, for example, it all goes wrong and you have to fix stuff using the crappy ssh client on your phone. If you need non-ASCII text in your application, then it should be corralled into language-specific ghettoes: templates or resource files. In fact, even if your application only needs to work in ASCII-compatible languages (the only two I can think of are English and Latin) then ideally text strings will still live in templates and resource files, for two reasons. First, separation of concerns. Second, it'll let you more easily add other languages later.
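The "printable ASCII only" rule above is easy to check mechanically. What follows is a minimal sketch in Python, not anything from the post itself: the function name `find_non_ascii` is hypothetical, and "printable ASCII" is assumed to mean the characters in Python's `string.printable` (letters, digits, punctuation, and ordinary whitespace).

```python
import string

# Assumption: "printable ASCII" means string.printable, which covers
# ASCII letters, digits, punctuation, and whitespace such as tab and newline.
ALLOWED = set(string.printable)


def find_non_ascii(text):
    """Return (line_number, column, character) for each character outside ALLOWED.

    Line numbers and columns are 1-based, as an editor would show them.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in ALLOWED:
                hits.append((line_no, col, ch))
    return hits
```

Run over the text of each source file, an empty result would mean the file meets the rule, while any hit points at a string that could move into a template or resource file.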
Re^3: Save yourself, start all projets with UTF-8 encoding by tchrist (Pilgrim) on Apr 10, 2011 at 03:11 UTC
“even if your application only needs to work in ASCII-compatible languages (the only two I can think of are English and Latin)”

Neither English nor Latin is ASCII-compatible. You cannot write English correctly in ASCII. Period! You don’t have proper quotes “like these” — nor distinct em dashes like those. You can’t write ± a few minutes, or 5¢, or ℅ General Delivery, let alone πr². You can’t even write 5÷2, or ©2011. What about 5 o’clock? And don’t get me started with things like jalapeños or résumés.

As for Latin, consider this bit of Latin nested in English:

Etymology: < Latin domināt‑ participial stem of dominārī to bear rule, govern, lord it, < domin‐us lord, master: compare French dominer.

See what I mean? The only English you can write in ASCII looks like bumpkin‐English, a word which I’ve self‐censored.
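The claim that these strings cannot be written in ASCII can be checked directly. The following is a small Python sketch, not part of the original post: the helper name `ascii_failures` is hypothetical, and the sample strings are taken from the examples above.

```python
def ascii_failures(samples):
    """Map each sample that cannot be encoded as ASCII to its first offending character."""
    failures = {}
    for sample in samples:
        try:
            sample.encode("ascii")
        except UnicodeEncodeError as err:
            # err.start is the index of the first character that ASCII cannot represent.
            failures[sample] = sample[err.start]
    return failures


# The first sample uses a plain typewriter apostrophe; the rest come from the post.
samples = ["5 o'clock", "5 o’clock", "jalapeños", "πr²", "©2011"]
failures = ascii_failures(samples)
```

Only the typewriter-apostrophe version of "5 o'clock" survives the ASCII encoder. All of the samples encode without error as UTF-8.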
Re^4: Save yourself, start all projets with UTF-8 encoding by educated_foo (Vicar) on Apr 10, 2011 at 04:23 UTC
“The only English you can write in ASCII looks like bumpkin‐English...”

Verily! True Gentleman's English requires effort and research. One cannot simply stab away at a key-board, relying upon the inscribed symbols to guide one's typing. Nay, one must consult tables and input numeric codes. (Quick quiz: How long does it take you to type the Unicode ligatures for "ff," "fi," and "ffi?" Doesn't doing so make you feel like much less of a bumpkin?)
Re^4: Save yourself, start all projets with UTF-8 encoding by DrHyde (Prior) on Apr 11, 2011 at 09:45 UTC
Neither English nor Latin needs any character outside of ASCII. Your examples in English are examples of pretty typography, not of the language itself. "Jalapeño" and "résumé" are still correct if written as "jalapeno" and "resume" - in the latter case it is disambiguated from "resume" by context - or by writing "CV" instead. See also "cafe", which lacks the accent that it would have in French. "π" is, of course, not English, but Greek. Latin inscriptions don't use accent marks, and I'm not aware of them existing in Latin handwriting either (such as in the Vindolanda tablets). I would, of course, be delighted to be corrected on this, with examples from before the fall of the Empire. I presume that the ones you cite are there to indicate pronunciation to a reader unfamiliar with Latin. I've seen similar marks in textbooks for teaching English as a foreign language.
Re^5: Save yourself, start all projets with UTF-8 encoding by jdporter (Paladin) on Apr 11, 2011 at 14:32 UTC
Re^6: Save yourself, start all projets with UTF-8 encoding by DrHyde (Prior) on Apr 12, 2011 at 10:28 UTC
Re^2: Save yourself, start all projets with UTF-8 encoding by Lady_Aleena (Priest) on Apr 05, 2011 at 03:58 UTC
JavaFan... I mean that every file that can be encoded as UTF-8 should be encoded as UTF-8. That way you don't have to rewrite everything because of issues with conversion. I have so many non-UTF-8 files that a manual, by-hand approach to conversion won't fly. However, because UTF-8 files don't talk to my non-UTF-8 files well, I have to convert them all at once. I also can't figure out why UTF-8 scripts can't find files on my computer. If I had initially written my code and data files in UTF-8, I would have found these problems ages ago. However, since I didn't, I am just finding them now, when my code runs very deep. Finding where the problems are is nearly impossible. I have tried to trace the problems, and I can't find them. *Have a cookie and a very nice day!* Lady Aleena
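For the "convert everything at once" situation described above, a batch pass is one option. The sketch below is in Python and rests on assumptions the post does not state: that every legacy file uses one known encoding (`cp1252` is only a placeholder guess), that files matching `*.txt` are the ones to convert, and that the hypothetical helpers `convert_to_utf8` and `convert_tree` run on a backed-up copy of the tree.

```python
from pathlib import Path


def convert_to_utf8(path, legacy_encoding="cp1252"):
    """Re-encode one file as UTF-8 in place. Return True if the file was rewritten."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (pure ASCII files land here too)
    except UnicodeDecodeError:
        pass
    # Assumption: every file that is not valid UTF-8 uses legacy_encoding.
    text = raw.decode(legacy_encoding)
    # Write bytes directly so line endings are left exactly as they were.
    Path(path).write_bytes(text.encode("utf-8"))
    return True


def convert_tree(root, pattern="*.txt", legacy_encoding="cp1252"):
    """Convert every matching file under root and return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if convert_to_utf8(path, legacy_encoding):
            changed.append(path)
    return changed
```

Because files that already decode as UTF-8 are skipped, running the pass a second time should change nothing. This only covers file contents; it does not address the separate problem of scripts failing to find files by name.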