Character Class Abbreviations

Character class abbreviations let you match any one of a set of characters without too much hassle. One way to match a set is to list the characters you want inside []. For instance, [0123456789] matches any single digit. This can be kind of cumbersome. You can also negate a character class by placing a caret at the front of it. For instance, [^0123456789] matches any character that is not a digit. You shouldn't be surprised that Perl makes your life much easier by defining some character class abbreviations. These are alphanumeric characters preceded by a backslash. For example, a \d in your regular expression matches any digit.
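As a minimal sketch of the three forms described above (the spelled-out class, its negation, and the \d abbreviation), the following compares them on a few strings. The helper names and sample strings are made up for illustration, and the comparison assumes plain ASCII text; on Unicode strings \d can match more than [0123456789].

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub has_digit_explicit { return $_[0] =~ /[0123456789]/  ? 1 : 0 }  # spelled-out class
sub has_digit_abbrev   { return $_[0] =~ /\d/            ? 1 : 0 }  # abbreviation
sub has_nondigit       { return $_[0] =~ /[^0123456789]/ ? 1 : 0 }  # negated class

# Made-up sample strings.
for my $s ('room 101', 'no digits here', '42') {
    printf "%-18s explicit=%d abbrev=%d nondigit=%d\n",
        "'$s'", has_digit_explicit($s), has_digit_abbrev($s), has_nondigit($s);
}
```

Note that a negated class still has to match one character: '42' contains no nondigit, so [^0123456789] fails on it.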

Now for a quick word about metacharacters. Metacharacters are characters that have special meaning within regular expressions, so if you put them into a regular expression they won't match literally unless you precede the metacharacter with a \. The metacharacters are \|()$^.?* Here is a quick word about each of them before we return to character class abbreviations.

Metacharacter(s)	Meaning
.	Matches any character besides newline
()	Used for grouping characters
[]	Used for defining character classes
|	Used for alternation (or) in a regular expression
\	Introduces a character class abbreviation, or makes the following metacharacter match literally
*	Quantifier: matches 0 or more of the previous character or group of characters
?	Quantifier: matches 0 or 1 of the previous character or group; placed after another quantifier, makes it nongreedy
^	Matches the beginning of a string (or line if /m is used)
$	Matches the end of a string (or line if /m is used)
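The table above can be exercised with a short script. This is only a sketch: the sample strings and variable names are invented for illustration, and each comment states what the pattern is meant to show.

```perl
use strict;
use warnings;

# Escaping: an unescaped . matches any character, \. only a literal period.
my $dot_any     = '3x14' =~ /3.14/  ? 1 : 0;   # 1
my $dot_literal = '3x14' =~ /3\.14/ ? 1 : 0;   # 0

# . does not match a newline (without the /s modifier).
my $dot_newline = "a\nb" =~ /a.b/ ? 1 : 0;     # 0

# | with () for grouping: match "cat" or "dog" and capture which one.
my $text = "dog <b>one</b> <b>two</b>";
my ($animal) = $text =~ /(cat|dog)/;

# * is greedy by default; a following ? makes it nongreedy.
my ($greedy) = $text =~ /<b>(.*)<\/b>/;    # runs to the last </b>
my ($lazy)   = $text =~ /<b>(.*?)<\/b>/;   # stops at the first </b>

# ^ and $ anchor to the whole string unless /m is used.
my $lines      = "first\nsecond\nthird";
my $no_m       = $lines =~ /^second$/  ? 1 : 0;   # 0
my $with_m     = $lines =~ /^second$/m ? 1 : 0;   # 1
my @whole_rows = $lines =~ /^(\w+)$/mg;           # one capture per line

print "animal=$animal greedy='$greedy' lazy='$lazy'\n";
print "no_m=$no_m with_m=$with_m rows=@whole_rows\n";
```

The greedy and nongreedy captures differ only by the trailing ?, which is usually the first thing to check when a pattern grabs more text than expected.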

Now let's define some character classes:

Character Class	Meaning
\d	digit or [0123456789]
\D	nondigit or [^0123456789]
\w	word (alphanumeric) or [a-zA-Z_0-9]
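\W	nonword or [^a-zA-Z_0-9]
\b	word boundary (a zero-width position between a \w and a \W character, not a character itself)
\s	whitespace character or [ \t\r\n\f]
\S	non-whitespace character or [^ \t\r\n\f]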
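To see the abbreviations from the table in action, here is a small sketch. The sample strings and variable names are invented for illustration, and the results assume plain ASCII input; on Unicode strings \d, \w and \s match more than the bracketed equivalents shown above.

```perl
use strict;
use warnings;

# Made-up sample line, for illustration only.
my $line = "Order 66:  shipped\ton 2002-06-17";

my @numbers = $line =~ /(\d+)/g;      # runs of digits
my @words   = $line =~ /(\w+)/g;      # runs of word characters
my @fields  = split /\s+/, $line;     # split on runs of whitespace

# \D+ matches runs of nondigits; replace each run with a single space.
(my $digits_only = $line) =~ s/\D+/ /g;

# \b matches between a word and a nonword character, so \bcat\b
# finds "cat" only as a whole word.
my $sentence   = "cat concatenate cat";
my $any_cat    = () = $sentence =~ /cat/g;       # every occurrence
my $whole_cat  = () = $sentence =~ /\bcat\b/g;   # whole words only

print "numbers: @numbers\n";
print "words:   @words\n";
print "fields:  ", join('|', @fields), "\n";
print "digits:  '$digits_only'\n";
print "cat anywhere: $any_cat, cat as a word: $whole_cat\n";
```

The `() =` assignment forces list context so that the match count, rather than a true/false value, ends up in the scalar.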

That's a lot of information to get a handle on, so let's check out some pattern-matching examples.


Replies are listed 'Best First'.
Isn't '+' a metacharacter too? by Anonymous Monk on Jun 17, 2002 at 17:40 UTC
Similar to '*', only it matches 1 or more of the previous character.
Re: Character Class Abbreviations by Terminal (Initiate) on Dec 23, 2005 at 21:29 UTC
Perhaps you could show some examples of these? I'm a bit confused... I tried `if ($_ =~ [[en]]) { print "yes\n";} else { print "no\n";}` Didn't work :( Always printed no, even when I had "en" in the document :( Code tags added by Arunbear
Re^2: Character Class Abbreviations by planetscape (Chancellor) on Dec 24, 2005 at 00:19 UTC
In replying to a 6-yr old node, your question is in danger of flying beneath everyone's radar. Better to post under Seekers of Perl Wisdom. To get an idea of how this site works, I recommend looking at the "Welcome to the Monastery" section of the Tutorials page. You might wish to check out some of the references listed here: Re: regexp: extracting info

In your example, it sounds to me more like you are trying to match the literal string "en". But let's assume for a moment you really want a character class...

One of the tools I didn't mention in the writeup above is the simple "pattern test" program from Learning Perl, 3rd Ed. I often use this when first constructing a regex because it's simple, easy to edit, and I get immediate feedback. So let's start with that program, modified slightly to match the character class `[en]`.

    # From: Schwartz & Phoenix: Learning Perl, 3rd Ed (The Llama), pp. 103
    use strict;

    while (<>) {
        chomp;
        if (/[en]/) {
            # $` is the text before the match, $& the match itself, $' the text after it
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }

Let's assume that this is our "test" file:

    English
    French
    Spanish
    German
    Aramaic
    Arabic

Where and how the character class matches may surprise you:

    Matched: |E<n>glish|
    Matched: |Fr<e>nch|
    Matched: |Spa<n>ish|
    Matched: |G<e>rman|
    No match.
    No match.
    No match.

If you were expecting output more like the following, matching the string "en":

    No match.
    Matched: |Fr<en>ch|
    No match.
    No match.
    No match.
    No match.
    No match.

You will need to change the program as follows:

    # From: Schwartz & Phoenix: Learning Perl, 3rd Ed (The Llama), pp. 103
    use strict;

    while (<>) {
        chomp;
        if (/en/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }

HTH, planetscape
Re: Character Class Abbreviations by theantler (Sexton) on Mar 18, 2010 at 09:17 UTC
I think this tutorial/intro of yours is really good and helpful. You say: The metacharacters are \|()$^.?* But [] are also metachars, since they don't match literally but have special meaning within the regex, so shouldn't they be in that list too? It would be nice if you listed all the metachars. Thanks - ta


	PerlMonks