in reply to Re^2: OSCON Perl Unicode Slides
in thread OSCON Perl Unicode Slides

Thanks. I'm not sure I buy all of it, but it's still something to chew on.

Specifically, "Björn" is not English (but may need to be legitimately processed by software that is otherwise English-only). I can't remember the last time I handwrote ÷ instead of putting the numerator over the denominator (approximated by a slash, much like the slash in %). It must have been grade school, and even then, likely pretty early in grade school. I actually find ÷ to be weird. :-) And, of course, 1¢ is meaningless today :-) I see very few items that cost less than one dollar anymore, so actually seeing the cent symbol seems like an anachronism.

As for the Æ bit, well, "rendering foreign words" means "not English". The legitimate part is dealing with foreign brand names - again, it's not English, but may need to be legitimately processed by software that is otherwise English-only. The rest of your quoted text shows that English has largely moved away from using the ligature, and now usually uses ae instead. In my experience, even words that natively would have had accents and such on them usually lose them when misappropriated by English, such as your Jalapeno, or Jim's resume. (A single spelling of a word can have multiple pronunciations without even needing to have different meanings! Think 'po TAY toe'/'po TAH toe'.)

Again, thanks. I was too much in a box here, which was uncomfortable because I usually like thinking outside boxes. I think that if tchrist's mini-rant weren't so over the top, and instead focused on why that's the case, I might have been a bit less confused. In fact, that'd probably be my main critique here: spend less time describing how bad something is and more on why it is bad. Perhaps those who attend the talk will hear the reason why things are bad, but those of us who only get to see the slides may miss out :-)

Re^4: OSCON Perl Unicode Slides
by jdporter (Paladin) on Jul 25, 2011 at 20:32 UTC
    "Björn" is not English

    Names are universal. Anyway, "coöperate" is English (if a bit archaic). And anyway, we've already debated this at length. I'm thinking we shouldn't have to again.

    The bottom line for you, I think, is that it doesn't really matter which specific language any one document is written in. You will have to handle multi-lingual data, and that means Unicode.

    I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.