
Of course it's different.
[\d\s]   # matches a single char: a digit or whitespace
\d\s     # matches two chars: a digit, then whitespace
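A quick way to see the difference is to capture what each pattern consumes. This is a minimal sketch; the sample strings and variable names (`$str`, `$one`, `$two`, `$no_digit`) are made up for illustration:

```perl
use strict;
use warnings;

my $str = "7 apples";    # hypothetical sample input

# [\d\s] is a character class: it consumes exactly ONE character,
# which may be either a digit or a whitespace character.
my ($one) = $str =~ /([\d\s])/;

# \d\s is a sequence: a digit followed by a whitespace character,
# so a successful match always consumes TWO characters.
my ($two) = $str =~ /(\d\s)/;

printf "class:    '%s' (length %d)\n", $one, length $one;
printf "sequence: '%s' (length %d)\n", $two, length $two;

# With whitespace but no digit, only the class can match.
my $no_digit = "a b";
print "class on '$no_digit':    ", ($no_digit =~ /[\d\s]/ ? "match" : "no match"), "\n";
print "sequence on '$no_digit': ", ($no_digit =~ /\d\s/   ? "match" : "no match"), "\n";
```

The class captures just `'7'`, while the sequence captures `'7 '`; on `"a b"` the class still finds the space, but the sequence fails because no digit is present.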

Re^4: regexp question
by ramprasad27 (Sexton) on Oct 28, 2011 at 09:51 UTC

      Those are the same.

      The whole point of something like \d is to be a convenient abbreviation. It would be irritating (and error-prone) if you had to spell it out in full every time you used it inside [].

      You're missing the whole point here. Square brackets define a character class. If I match on [a-z0-9], I'm specifying one character that falls in the class of characters from a-z and 0-9; that is, I'm matching exactly one character, which could be any member of the class.

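      But if I match on [.], I'm specifying one character that falls in a class containing only the period. In other words, [a-z] could match 'a', or 'b', or 'c', etc., but [.] can only ever match '.'. Inside a character class the period loses its "any character" meaning, so [.] is exactly equivalent to \. (an escaped, literal period), not to an unescaped . (which matches any character except a newline). Thus it's a useless use of a character class.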
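To check that [.] behaves like \. rather than like a bare dot, the sketch below runs all three patterns against a letter and a period. The pattern labels and the `%pat`/`%result` names are illustrative only:

```perl
use strict;
use warnings;

# Three patterns to compare, anchored so each must match the whole string.
my %pat = (
    'bare .' => qr/^.$/,     # unescaped dot: any character except newline
    '[.]'    => qr/^[.]$/,   # single-character class: a literal period
    '\.'     => qr/^\.$/,    # escaped dot: also a literal period
);

my %result;
for my $s ('a', '.') {
    for my $name (sort keys %pat) {
        $result{$s}{$name} = $s =~ $pat{$name} ? 1 : 0;
        printf "%-4s =~ %-6s : %s\n", "'$s'", $name,
            $result{$s}{$name} ? 'match' : 'no match';
    }
}
```

Only the bare dot accepts `'a'`; [.] and \. give identical answers on both inputs, which is why the single-character class adds nothing over the escape.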

      To use another example of yours, [\d\s] will match one character that is either a digit or a whitespace character. It could match 9, or 8, or ' ' (or a tab or newline, since \s covers all whitespace). \d and \s retain their "magic" even in a character class. [\d] = \d = [0-9]
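Here is a small sketch of \d and \s keeping their meaning inside brackets; the sample characters and the `%label`/`%matches` names are arbitrary choices for illustration:

```perl
use strict;
use warnings;

# \d and \s keep their special meaning inside brackets, so [\d\s] is the
# union of "digit" and "whitespace", and it consumes exactly one character.
my %label = ('9' => "'9'", '8' => "'8'", ' ' => "' '", "\t" => '"\t"', 'x' => "'x'");
my %matches;

for my $c ('9', '8', ' ', "\t", 'x') {
    # Anchored, so a match means the whole one-character string is in the class.
    $matches{$c} = $c =~ /^[\d\s]$/ ? 1 : 0;
    printf "%-5s %s\n", $label{$c},
        $matches{$c} ? 'matches [\d\s]' : 'does not match [\d\s]';
}
```

The digits, the space, and the tab all match; the letter does not.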

      The lesson here is don't use single-character classes.

      --marmot
        [\d] = \d = [0-9]
        Not exactly true. See perlrecharclass:
        "\d" matches a single character that is considered to be a digit. What is considered a digit depends on the internal encoding of the source string and the locale that is in effect. If the source string is in UTF-8 format, "\d" not only matches the digits '0' - '9', but also Arabic, Devanagari and digits from other languages. Otherwise, if there is a locale in effect, it will match whatever characters the locale considers digits. Without a locale, "\d" matches the digits '0' to '9'. See "Locale, EBCDIC, Unicode and UTF-8".
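The sketch below shows the difference with one non-ASCII digit, U+0663 (ARABIC-INDIC DIGIT THREE). It assumes Perl 5.14 or later, where the /a modifier restricts \d to [0-9]; the variable and label names are only for illustration:

```perl
use v5.14;      # needed for the /a modifier; also enables strict
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit outside [0-9].
my $arabic_three = "\x{0663}";

my @checks = (
    [ '\d',         qr/^\d$/    ],  # Unicode-aware: any decimal digit
    [ '[0-9]',      qr/^[0-9]$/ ],  # explicit range: only ASCII 0-9
    [ '\d with /a', qr/^\d$/a   ],  # /a modifier: \d means [0-9]
);

my %matched;
for my $check (@checks) {
    my ($name, $re) = @$check;
    $matched{$name} = $arabic_three =~ $re ? 1 : 0;
    printf "%-11s %s\n", $name, $matched{$name} ? 'match' : 'no match';
}
```

Plain \d accepts the Arabic-Indic three, while [0-9] and \d under /a reject it. So if the goal is to validate ASCII digits (for example, before numeric conversion), [0-9] or /a is the safer choice.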