in reply to Re^2: Save yourself, start all projets with UTF-8 encoding
in thread Save yourself, start all projets with UTF-8 encoding

even if your application only needs to work in ASCII-compatible languages (the only two I can think of are English and Latin)
Neither English nor Latin is ASCII-compatible. You cannot write English correctly in ASCII. Period!

You don’t have proper quotes “like these” — nor distinct em dashes like those. You can’t write ± a few minutes, or 5¢, or ℅ General Delivery, let alone πr². You can’t even write 5÷2, or ©2011. What about 5 o’clock? And don’t get me started with things like jalapeños or résumés.
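The claim above is easy to check mechanically. The following is a minimal sketch in Python, not part of the original post: the helper name `non_ascii_chars` and the sample strings are chosen here purely for illustration. It reports which characters in each sample fall outside the 7-bit ASCII range and would therefore need an encoding such as UTF-8.

```python
def non_ascii_chars(text: str) -> list:
    """Return the distinct non-ASCII characters in text, in order of first appearance.

    Hypothetical helper for illustration; ASCII covers code points 0..127 only.
    """
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return seen


# Sample strings taken from the examples in the post, plus one pure-ASCII control.
samples = ["5 o’clock", "jalapeños", "résumés", "πr²", "©2011", "5÷2", "plain ASCII text"]

for s in samples:
    extras = non_ascii_chars(s)
    if extras:
        # Encoding to ASCII would raise UnicodeEncodeError for these strings.
        status = "needs " + ", ".join(f"U+{ord(c):04X}" for c in extras)
    else:
        status = "fits in ASCII"
    print(f"{s!r}: {status}; UTF-8 length {len(s.encode('utf-8'))} bytes")
```

Every sample except the control contains at least one code point above U+007F, so calling `s.encode("ascii")` on it would fail, while `s.encode("utf-8")` succeeds for all of them.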

As for Latin, consider this bit of Latin nested in English:

Etymology: < Latin domināt‑ participial stem of dominārī to bear rule, govern, lord it, < domin‐us lord, master: compare French dominer.

See what I mean? The only English you can write in ASCII looks like bumpkin‐English, a word which I’ve self‐censored.

Re^4: Save yourself, start all projets with UTF-8 encoding
by educated_foo (Vicar) on Apr 10, 2011 at 04:23 UTC
    The only English you can write in ASCII looks like bumpkin‐English...
    Verily! True Gentleman's English requires effort and research. One cannot simply stab away at a key-board, relying upon the inscribed symbols to guide one's typing. Nay, one must consult tables and input numeric codes.

    (Quick quiz: How long does it take you to type the Unicode ligatures for "ff," "fi," and "ffi?" Doesn't doing so make you feel like much less of a bumpkin?)

Re^4: Save yourself, start all projets with UTF-8 encoding
by DrHyde (Prior) on Apr 11, 2011 at 09:45 UTC

    Neither English nor Latin *need* any character outside of ASCII. Your examples in English are examples of pretty typography, not of the language itself. "Jalapeño" and "résumé" are still correct if written as "jalapeno" and "resume" - in the latter case it is disambiguated from "resume" by context - or by writing "CV" instead. See also "cafe", which lacks the accent that it would have in French. "π" is, of course, not English, but Greek.

    Latin inscriptions don't use accent marks, and I'm not aware of them existing in Latin handwriting either (such as in the Vindolanda tablets). I would, of course, be delighted to be corrected on this, with examples from before the fall of the Empire. I presume that the ones you cite are there to indicate pronunciation to a reader unfamiliar with Latin. I've seen similar marks in textbooks for teaching English as a foreign language.

      Neither English nor Latin *need* any character outside of ASCII.
      ...
      "π" is, of course, not English, but Greek.

      It's true that English does not need such special characters. By that line of reasoning, we may as well say that English doesn't need any characters at all, because it is a spoken language. Now, if you will object and say that English is also a written language, then you must allow that English as a written language needs whatever characters are appropriate for expressing the author's intent. A reasonable definition of English as a written language is: whatever an author (an ideal author, let's say) who speaks no languages other than English is able to write. For example, a mathematical text in English contains symbols necessary for expressing math. Can you imagine telling a novelist, or a physicist, that she can only write using ASCII characters from now on? Sure, you could force such a restriction on people, but it's crippling and arbitrary.

      "Jalapeño" [is] still correct if written as "jalapeno"

      Um, actually it isn't. It's still comprehensible, through the miracle of the human mind's ability to do pattern-matching; but it is really an example of shoe-horning the proper form of a written language into an arbitrary subset of characters. "ñ" and "n" are different (though similar) letters. But I'm glad to see you use this example, because it's illustrative of the English language's inclusive and subsuming nature.

      How do you pronounce "aa"? Perhaps it would help to know that the proper written form is "ʻaʻā". Yes, it's English. And jamming it into ASCII is a really bad idea.

      I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.

        I hope you'll forgive me, but I'm not willing to take your word for it about "jalapeno" being incorrect. I've emailed the editors of the OED, so if they reply I'll summarise here. Nevertheless, it is undeniable that English loan-words from other languages often lose their accents, the obvious example being "cafe". The only point of dispute here is one of degree: my assertion is that they can *all* lose *all* their accents.

        I'm not sure what your point is supposed to be regarding mathematics, unless it was just stupid pedantry.