sub is_vowel {
    return $_[0] =~ /
        ^
        [\x{1F00}-\x{1FE3}\x{1FE6}-\x{1FFE}\x{0386}-\x{038F}\x{0390}\x{0391}\x{0395}\x{0397}\x{0399}\x{039F}\x{03A5}\x{03A9}\x{03AA}-\x{03B1}\x{03B5}\x{03B7}\x{03B9}\x{03BF}\x{03C5}\x{03C9}-\x{03CE}]
        \z
    /x;
}
There might be a better way of doing this, but I don't have time to research this right now.
Ask me! Ask me! :)
First off, I would never use literal magic numbers like that. Let’s look at what that string really is:
[ἀ-ΰῦ-῾Ά-ΏΐΑΕΗΙΟΥΩΪ-αεηιουω-ώ]
Ew, gross! See where that is leading? And if that’s not a big enough hint, here are those as named characters:
\N{GREEK SMALL LETTER ALPHA WITH PSILI}-
\N{GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA}
\N{GREEK SMALL LETTER UPSILON WITH PERISPOMENI}-
\N{GREEK DASIA}
\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}-
\N{GREEK CAPITAL LETTER OMEGA WITH TONOS}
\N{GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS}
\N{GREEK CAPITAL LETTER ALPHA}
\N{GREEK CAPITAL LETTER EPSILON}
\N{GREEK CAPITAL LETTER ETA}
\N{GREEK CAPITAL LETTER IOTA}
\N{GREEK CAPITAL LETTER OMICRON}
\N{GREEK CAPITAL LETTER UPSILON}
\N{GREEK CAPITAL LETTER OMEGA}
\N{GREEK CAPITAL LETTER IOTA WITH DIALYTIKA}-
\N{GREEK SMALL LETTER ALPHA}
\N{GREEK SMALL LETTER EPSILON}
\N{GREEK SMALL LETTER ETA}
\N{GREEK SMALL LETTER IOTA}
\N{GREEK SMALL LETTER OMICRON}
\N{GREEK SMALL LETTER UPSILON}
\N{GREEK SMALL LETTER OMEGA}-
\N{GREEK SMALL LETTER OMEGA WITH TONOS}
So I think what needs to be done is this: convert the string to Normalization Form D for canonical decomposition (which may introduce iotas because of the iota subscripts in Greek), get rid of the marks and diacritics, and then pattern-match whatever is left against only the 7 Greek vowels.
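To make the decomposition step concrete, here is a minimal sketch of what NFD does to two precomposed Greek vowels, and what is left once the marks are stripped. The helper names `code_points` and `strip_marks` are made up for this illustration; only `NFD` from Unicode::Normalize is a real library call.

```perl
use strict;
use warnings;
use charnames ":full";
use Unicode::Normalize qw(NFD);

# Hypothetical helper: render a string as a list of U+XXXX code points.
sub code_points {
    my ($s) = @_;
    return join " ", map { sprintf "U+%04X", ord } split //, $s;
}

# Hypothetical helper: decompose, then drop every combining mark.
sub strip_marks {
    my ($s) = @_;
    my $decomposed = NFD($s);
    $decomposed =~ s/\p{Mark}+//g;
    return $decomposed;
}

for my $char (
    "\N{GREEK SMALL LETTER ALPHA WITH TONOS}",
    "\N{GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI}",
) {
    printf "%s -> NFD: %s -> stripped: %s\n",
        code_points($char),
        code_points(NFD($char)),
        code_points(strip_marks($char));
}
```

Both inputs should end up as a bare U+03B1 (alpha), which is why the final pattern only needs to know about the seven plain vowels.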
To show you why you have to be more careful, here is an example of a phrase whose first word is all vowels, but which gives very different looking results when rendered in upper‐, lower‐, and titlecase, because the number of code points changes under full case mapping:
- Lowercase
-
- Titlecase
- Uppercase
-
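The same effect can be checked on a single character. This is only a sketch, not the phrase from the example above: it assumes a Perl whose `uc` and `ucfirst` apply the full Unicode case mappings, and `case_lengths` is a name invented here.

```perl
use strict;
use warnings;
use charnames ":full";

# Hypothetical helper: number of code points in the lower-, title-,
# and uppercase renderings of a string.
sub case_lengths {
    my ($s) = @_;
    return (length(lc $s), length(ucfirst lc $s), length(uc $s));
}

# Alpha with iota subscript: one code point in lower- and titlecase,
# but the uppercase mapping spells the iota out as a separate letter.
my $alpha_sub = "\N{GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI}";
my ($lower, $title, $upper) = case_lengths($alpha_sub);
printf "lower=%d title=%d upper=%d\n", $lower, $title, $upper;
# prints "lower=1 title=1 upper=2"
```

So a test that looks at one code point at a time can give a different answer depending on which case the text happens to arrive in.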
So here’s probably how I’d do it, since I prefer to be able to read the code:
use utf8;
use strict;
use warnings;
use Unicode::Normalize qw(NFD);
sub is_greek_vocalic($) {
    die "wrong args" unless @_ == 1;
    local $_ = NFD(lc(shift()));
    s/\p{Mark}+//g;       # combining marks from NFD form
    s/\p{Diacritic}+//g;  # eg, GREEK DASIA, which is \p{Sk}
    return scalar m{ ^ [αεηιουω] + \z }x;
}
But if you want to use named characters, it would look more like this:
use Unicode::Normalize qw(NFD);
sub is_greek_vocalic($) {
    use charnames "greek";
    die "wrong args" unless @_ == 1;
    local $_ = NFD(lc(shift()));
    s/\pM+//g;            # combining marks from NFD form
    s/\p{Diacritic}+//g;  # eg, GREEK DASIA, which is \p{Sk}
    return scalar m{
        ^
        [\N{alpha}\N{epsilon}\N{eta}\N{iota}\N{omicron}\N{upsilon}\N{omega}]+
        \z
    }x;
}
Doesn’t that look better now?