in reply to Re^5: Regex to remove generic accounts
in thread Regex to remove generic accounts

Yes and no. \d matches digits from other languages, but not those of every language. I think (though I haven't studied the Unicode property database in detail) that if a language uses a strict base-10 positional system, its digits are matched by \d, whereas the existence of a "tens" or "hundreds" symbol excludes all of its digits from being matched by \d. It may very well be that the database isn't consistent in this respect. I don't know what system Japanese uses, but AFAIK, Kanji digits aren't matched by \d.
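A minimal sketch to check this claim on your own perl. The sample characters and the helper name matches_d are chosen here for illustration and are not from the thread. As far as I know, \d follows the Unicode "decimal digit" (Nd) category for strings under Unicode semantics, so the result may differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the single character $ch is matched by \d.
sub matches_d {
    my ($ch) = @_;
    return $ch =~ /\A\d\z/ ? 1 : 0;
}

# Illustrative samples (an assumption of this sketch, not data from the thread).
my @samples = (
    [ 0x0037, 'ASCII digit seven' ],
    [ 0x0663, 'Arabic-Indic digit three' ],
    [ 0x096B, 'Devanagari digit five' ],
    [ 0x4E09, 'Kanji numeral three' ],
    [ 0x03B2, 'Greek small letter beta' ],
);

for my $sample (@samples) {
    my ($cp, $label) = @$sample;
    printf "U+%04X  %-26s %s\n",
        $cp, $label, matches_d(chr $cp) ? 'matched' : 'not matched';
}
```

If the goal is to match only the ASCII digits, for example in a regex that filters account names, writing [0-9] instead of \d avoids the question entirely.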

Re^7: Regex to remove generic accounts
by rovf (Priest) on Oct 28, 2008 at 13:32 UTC

    Hmmm.... The Kanji for 1-9 (they use "our" 0 for denoting zero) can be used in two ways: one mimics exactly our positional base-10 system, the other does not (it is easy to see from the way the number is written which of the two usages is being employed). So, if Kanji don't count for \d, can you give me other examples besides 0-9 which are considered digits? Maybe the Greek ordinal symbols? They are at least used in "base 10" fashion.

    -- 
    Ronald Fischer <ynnor@mm.st>
      perl -MConfig -aF';' -nE 'BEGIN {@ARGV = "$Config{privlib}/unicore/UnicodeData.txt"} say $F[1] if $F[2] eq "Nd"'
      This gives me 290 matches in 5.10.
        and 270 in 5.8
        []s, HTH, Massa (κς,πμ,πλ)
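The one-liner above counts Nd entries in UnicodeData.txt, which assumes that file ships with the installed perl. As a rough cross-check that does not depend on it, the sketch below asks the regex engine directly by testing every code point against \d. The function name digit_code_points is hypothetical, and the count it prints depends on the Unicode version bundled with your perl, so no expected number is given.

```perl
use strict;
use warnings;
use charnames ();    # provides charnames::viacode

# Hypothetical helper: returns every code point whose character matches \d.
sub digit_code_points {
    no warnings 'utf8';    # stay quiet about non-characters on older perls
    my @found;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        push @found, $cp if chr($cp) =~ /\A\d\z/;
    }
    return @found;
}

my @digits = digit_code_points();
printf "%d code points match \\d in perl %vd\n", scalar @digits, $^V;

# Print only the zero of each digit set, to keep the listing short.
for my $cp (@digits) {
    my $name = charnames::viacode($cp);
    next unless defined $name && $name =~ /DIGIT ZERO\z/;
    printf "  U+%04X %s\n", $cp, $name;
}
```

Each line of the listing names one digit set that \d accepts, which is one way to answer the question about examples other than 0-9.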