in reply to Re^3: Regex to remove generic accounts
in thread Regex to remove generic accounts

They are certainly not equivalent. There are 10 characters that match /[0-9]/. The number of characters that match /\d/ varies from Perl version to Perl version. There are more than 100 characters that match /\d/ in 5.10, and that's only a proper subset of what is being matched in blead.
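
A rough way to check this on your own perl is to walk every code point and count the matches. This is only a sketch: the helper name count_matches is made up for illustration, and the total for /\d/ depends on the Unicode tables your perl ships with, so no figure is quoted for it here.

```perl
use strict;
use warnings;
no warnings 'utf8';    # older perls complain about non-characters such as U+FFFF

# Hypothetical helper: count the code points whose single-character
# string matches the given pattern.
sub count_matches {
    my ($re) = @_;
    my $count = 0;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        $count++ if chr($cp) =~ $re;
    }
    return $count;
}

printf "[0-9] matches %d characters\n", count_matches(qr/[0-9]/);
printf "\\d    matches %d characters\n", count_matches(qr/\d/);
```

The first line always reports 10; the second is the number that moves between Perl versions.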

Re^5: Regex to remove generic accounts
by rovf (Priest) on Oct 28, 2008 at 09:36 UTC
    There are more than 100 characters that match /\d/ in 5.10

    Does this mean that digits from other languages are also considered digits by \d? For example, if I have a string consisting of Japanese Kanji, would \d match the Kanji digits too?

    -- 
    Ronald Fischer <ynnor@mm.st>
      Yes and no. Digits from other languages are matched by \d, but not from every language. I think, but I haven't studied the Unicode property database in detail, that if the language uses a strict base-10 system, its digits are matched by \d. But the existence of a "tens" or "hundreds" symbol excludes all of its digits from being matched by \d. And it may very well be that the database isn't consistent in this respect. I don't know what system Japanese uses, but AFAIK, Kanji digits aren't matched by \d.
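
      A small sketch to test individual characters rather than take my word for it. The helper is_decimal_digit and the choice of samples are mine, picked only for illustration. With Unicode-aware matching, \d corresponds to the Unicode "decimal digit" category (Nd), so the Arabic-Indic and Devanagari threes should print "yes", while the CJK ideograph for three (a letter, as far as Unicode is concerned) and the Roman numeral three (a "letter number") should print "no".

      ```perl
      use strict;
      use warnings;
      use charnames ':full';    # needed for \N{...} on older perls

      # Hypothetical helper: true if the string is exactly one character
      # and that character matches \d.
      sub is_decimal_digit {
          my ($ch) = @_;
          return $ch =~ /\A\d\z/ ? 1 : 0;
      }

      # Label/character pairs; only the ASCII labels are printed, so no
      # output encoding layer is required.
      my @samples = (
          [ 'ASCII 3'             => "3" ],
          [ 'Arabic-Indic 3'      => "\N{ARABIC-INDIC DIGIT THREE}" ],
          [ 'Devanagari 3'        => "\N{DEVANAGARI DIGIT THREE}" ],
          [ 'CJK ideograph three' => "\x{4E09}" ],
          [ 'Roman numeral three' => "\N{ROMAN NUMERAL THREE}" ],
      );

      for my $sample (@samples) {
          my ($label, $ch) = @$sample;
          printf "%-20s %s\n", $label, is_decimal_digit($ch) ? "yes" : "no";
      }
      ```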

        Hmmm.... The Kanji for 1-9 (they use "our" 0 for denoting zero) can be used in two ways, one mimics exactly our positional base-10 system, the other one does not (it is easy to see from the way the number is written which of the two usages is being employed). So, if Kanji don't count for \d, can you give me other examples besides 0-9 which are considered digits? Maybe the Greek ordinal symbols? They are at least used in "base 10" fashion.

        -- 
        Ronald Fischer <ynnor@mm.st>