in reply to Regexp and transliteration between languages

Many thanks for all the answers. This helps a lot. I will try with these expressions.

The problem is getting more interesting as I have started to look deeper. Intuitively, I have a feeling that we can make better use of vowel patterns, but I don't know how. It is guaranteed that all the vowels in the language will be made from combinations of the English vowels aeiou. For example:

mukharjee -> m (non-vowel) + u (vowel) + kh (non-vowel) + a (vowel) + rj (non-vowel) + ee (vowel)

I now feel that if I can use this vowel/non-vowel pattern, I won't need to list every combination of characters; they will take care of themselves. For example, "k" will always be followed by a vowel (a combination of aeiou), and so will "kh", so splitting on vowels will take care of whether it is "k", "kh" or, say, "khx".

So what may be needed is: take all characters until one matches [aeiou], then take all characters until a non-aeiou character is found (effectively capturing the vowel), and so on. Any suggestions?

RE: Re: Regexp and transliteration between languages
by chromatic (Archbishop) on Jun 16, 2000 at 04:44 UTC
    This is an ugly, half-baked idea, but you might do something like this:
    while ($word) {
        # split returns (text before the first vowel, the captured vowel, the rest)
        ($consonants, $vowel, $word) = split(/([aeiou])/, $word, 2);
        # do something here
    }
    I really like the parenthesis collection feature in split.
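
As a variation on the same idea, here is a minimal sketch that grabs whole runs of vowels and non-vowels in one pass with a global match instead of a loop. The helper name vowel_chunks is made up for illustration, and it assumes lowercase input in which only a, e, i, o, u count as vowels:

```perl
use strict;
use warnings;

# Hypothetical helper: break a word into alternating runs of
# non-vowels and vowels (assumes lowercase, vowels are aeiou only).
sub vowel_chunks {
    my ($word) = @_;
    # [aeiou]+ takes a whole vowel run ("ee"); [^aeiou]+ takes
    # everything up to the next vowel ("kh", "rj").
    my @chunks = $word =~ /([aeiou]+|[^aeiou]+)/g;
    return @chunks;
}

print join(' + ', vowel_chunks('mukharjee')), "\n";
# prints: m + u + kh + a + rj + ee
```

Because each alternative matches a whole run, "ee" comes back as one piece rather than two separate "e" matches. Adding the /i modifier would be one way to handle mixed-case input.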