in reply to Unearthed Arcana
It tries 101 times, populating @_ in each round. With the final pop, that results in 100 random numbers.
So, do you have any challenging puzzles? ;-)
Silly ligatures
by educated_foo (Vicar) on May 13, 2011 at 10:30 UTC
by tchrist (Pilgrim) on May 13, 2011 at 13:42 UTC
Completely off-topic, your post demonstrates the profound stupidity of Unicode ligatures. Ligatures are a typographic trick to make certain sequences of letters like "fi" and "ffi" look pretty when displayed in some media. Comically, the Unicode ligatures not only make life a royal pain for regular expression matching, but they're also ugly as sin (compare the actual "fi" to the "ﬁ"-ligature here). They're even less useful than pages of emoji.

The reason Unicode has those particular ligatures is to preserve the originals when doing round‐trip conversions with legacy encodings that allowed such things to be specified with distinct, individual codes. In modern typesetting, such matters should be, and are, taken care of automatically.

¡Fontalicious!

On the matter of being ugly as sin, here is my emoji example where I actually use fi ligatures three times, just because that was a posting where I was being extreme in the font games. If you look closely at that example, they do look marginally better there than the unkerned alternatives, although not so much that you would normally even notice them. Which is just as it should be.

It certainly isn’t “ugly as sin”; it looks fine. Of course, if you’re using some brutish sans serif font as your default display and that font hasn’t made allowances for these legacy ligatures, so that you have to resort to some fallback font‐substitution glyph, then, well, that’s the price you pay for brutishness. 😜

On the other hand, in this sample in Adobe Caslon Pro, I use no ligatures at all; all that is figured out for me by the font itself. For a somewhat subtler effect, here’s that sample again, this time in Adobe Garamond Pro. But for real sophistication, there’s just nothing like that same sample rendered in Zapfino. All three of those samples are fine examples of good kerning rules that don’t make the user say how and what and where things are tied together, that is, ligated. (Hey, did you know that ligar con alguien is Spanish slang for “to hook up”, as in “to get laid”?) It all magically falls out of the OpenType rules built into each respective font.
NFKD($s) =~ /⋯/i

Now, regarding the regex matter. The legacy ligatures are actually doing people a service here, because they make it obvious that you cannot just do blind searches on unnormalized Unicode text. Regexes make no allowances for things like default ignorables, diacritic‐insensitive comparisons, decompositions, or collation‐strength equivalences. And you need all those things.

Now, it just so happens that Unicode does have case folds for the legacy ligatures, although these are the one‐to‐many full case folds that next to nobody but Perl even tries to handle. That means this works:
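A minimal sketch of a full case fold at work, assuming Perl 5.14 or later. The sample string and pattern are illustrative choices, not taken from the post itself:

```perl
use v5.14;

# Hypothetical sample text: U+FB01 LATIN SMALL LIGATURE FI followed by "nd".
my $text = "\x{FB01}nd";

# Under /i the ligature's full case fold is the two characters "fi",
# so the two-character pattern can match the single code point.
my $full_fold = $text =~ /fi/i ? 1 : 0;

say $full_fold ? "matched" : "no match";
```

Here the pattern covers the ligature's entire fold, which is what lets the match go through.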
However, because we don’t allow incomplete matches stranding part of a code point, this doesn’t:
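A sketch of the failing case, again assuming Perl 5.14 or later, with a made-up one-character string: a pattern that would account for only half of the ligature's fold is refused, while the complete fold still succeeds.

```perl
use v5.14;

# Hypothetical sample: U+FB01 LATIN SMALL LIGATURE FI on its own.
my $lig = "\x{FB01}";

# A lone "f" would have to consume only half of the ligature's
# two-character fold "fi", stranding the rest of the code point.
my $partial = $lig =~ /f/i ? 1 : 0;

# The complete fold is matched as a unit.
my $whole = $lig =~ /fi/i ? 1 : 0;

say "partial /f/i:  ", $partial ? "matched" : "no match";
say "whole   /fi/i: ", $whole   ? "matched" : "no match";
```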
That shows why you really want a compatibility decomposition for text searching:
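One way this might look, as a sketch using the core Unicode::Normalize module. The variable names and sample string are assumptions made for illustration:

```perl
use v5.14;
use Unicode::Normalize qw(NFKD);

# Hypothetical sample text: U+FB01 LATIN SMALL LIGATURE FI followed by "nd".
my $text = "\x{FB01}nd";

# Compatibility decomposition replaces the ligature with plain "f" + "i".
my $decomposed = NFKD($text);

# Searching the raw text for a lone "f" misses the ligature;
# searching the decomposed text finds it.
my $raw_hit  = $text       =~ /f/i ? 1 : 0;
my $nfkd_hit = $decomposed =~ /f/i ? 1 : 0;

say "raw text:  ", $raw_hit  ? "matched" : "no match";
say "NFKD text: ", $nfkd_hit ? "matched" : "no match";
```

Normalizing before the match means the pattern never has to know which compatibility characters might be hiding in the data.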
I’ll address collation‐strength equivalence, including but not limited to diacritic‐insensitive matching, some other day.