in reply to regex matching specific strings.
I've tried $var =~ /abc|def|ghi/ but a string such as abdefhi is a false positive. I've also tried /(abc|def|ghi)/ and /(abc)|(def)|(ghi)/ but the aforementioned abdefhi matches all of those.
/abc|def|ghi/ and /(abc)|(def)|(ghi)/ match "abdefhi" because they aren't anchored, so Perl is free to find "def" anywhere in the string, including in the middle of it. To force the whole string to be exactly "abc", "def", or "ghi", you must anchor the regular expression with "^" at the start and "$" (or "\z") at the end. Without the /m modifier, "^" matches only at the very beginning of the string, just before the first character. "\z" matches only at the very end of the string. "$" matches at the end of the string or just before a newline that is the last character of the string, so an expression ending in "$" also accepts "def\n". Use "\z" if that trailing newline should make the match fail.
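Here is a small script that shows the difference. The sub names contains_any, is_exactly and is_exactly_dollar are made up for illustration; each one just wraps one of the patterns above and returns 1 for a match, 0 otherwise:

```perl
use strict;
use warnings;

# Unanchored: true if "abc", "def" or "ghi" occurs anywhere in the string.
sub contains_any { my ($s) = @_; return $s =~ /abc|def|ghi/ ? 1 : 0 }

# Anchored with \z: the whole string must be exactly one of the alternatives.
sub is_exactly { my ($s) = @_; return $s =~ /^(?:abc|def|ghi)\z/ ? 1 : 0 }

# Anchored with $: same, but one trailing newline is also accepted.
sub is_exactly_dollar { my ($s) = @_; return $s =~ /^(?:abc|def|ghi)$/ ? 1 : 0 }

for my $s ("abdefhi", "def", "def\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible when printing
    print join(" ", $shown, contains_any($s), is_exactly($s), is_exactly_dollar($s)), "\n";
}
# abdefhi 1 0 0
# def 1 1 1
# def\n 1 0 1
```

The last line is the "$" versus "\z" difference: "def\n" passes the "$" version but not the "\z" version.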
To add "^" and "$", you must surround "abc|def|ghi" with parentheses, either capturing (...) or non-capturing (?:...). Otherwise Perl will think that "^" belongs only to the first alternative and the end anchor only to the last. For example, in $var =~ /^abc|def|ghi\z/, Perl looks for one of three alternatives: "abc" at the beginning of the string, "def" anywhere in the string, or "ghi" at the end of the string. By contrast, /^(abc|def|ghi)\z/ and /^(?:abc|def|ghi)\z/ (see post by ikegami) match only when the entire string is exactly "abc", "def", or "ghi".
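A quick way to see the precedence problem is to run both forms against a few strings. Again, ungrouped and grouped are just illustrative names for subs wrapping the two patterns:

```perl
use strict;
use warnings;

# No grouping: ^ binds only to "abc" and \z only to "ghi", i.e.
#   (abc at start) OR (def anywhere) OR (ghi at end)
sub ungrouped { my ($s) = @_; return $s =~ /^abc|def|ghi\z/ ? 1 : 0 }

# Non-capturing group: both anchors apply to every alternative.
sub grouped { my ($s) = @_; return $s =~ /^(?:abc|def|ghi)\z/ ? 1 : 0 }

for my $s ("abcxyz", "xxdefxx", "xyzghi", "ghi") {
    print join(" ", $s, ungrouped($s), grouped($s)), "\n";
}
# abcxyz 1 0
# xxdefxx 1 0
# xyzghi 1 0
# ghi 1 1
```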
In this case non-capturing parentheses are the better choice. Capturing parentheses save whatever they match in a numbered variable ($1, $2, ...). But here, if the regex matches at all, it matches the whole string, so you already have the matched text in $var.
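To make that concrete, here is a sketch with two made-up subs, capture_exact and match_exact. Because of the anchors, whatever lands in $1 is always identical to the input, which is why the capture buys you nothing here:

```perl
use strict;
use warnings;

# Capturing: on success $1 holds whichever alternative matched, which
# (because of the anchors) is always the entire string.
sub capture_exact {
    my ($s) = @_;
    return $s =~ /^(abc|def|ghi)\z/ ? $1 : undef;
}

# Non-capturing: just a yes/no answer; the matched text is $s itself.
sub match_exact {
    my ($s) = @_;
    return $s =~ /^(?:abc|def|ghi)\z/ ? 1 : 0;
}

my $var = "def";
print "captured: ", capture_exact($var), "\n";   # captured: def
print "matched: $var\n" if match_exact($var);    # matched: def
```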
Hope this explains why the regexes given by kennethk and ikegami do work.
Best, beth
Update - 2009-07-27 - the portion below is retracted as incorrect or no longer applicable. /abc|def|ghi/ matches "abc" or "def" or "ghi" anywhere in the string.
"|" only defines alternatives between adjacent regex components, so /abc|def|ghi/ and (abc|def|ghi) both mean match "ab" followed by either c or d, followed by "e", followed by either f or g, followed by "hi". To get "|" to treat "abc", "def", and "ghi" as alternative whole strings, you must surround each of "abc", "def", and "ghi" with non-capturing parentheses. Non-capturing parentheses are spelled (?:regex). They tell Perl to treat this sequence of letters as a single regular expression.
You can also surround "abc", "def", "ghi" with plain parentheses. Plain parentheses also group sequences of letters into a single regular expression, but they also "capture" the match and store it in a variable.
This is wasteful unless you need to store the match in a variable. Even if you do, it probably won't do what you expect. Perl gives each group its own variable: it populates $1 with "abc" if $var contains "abc" and leaves it undef if it doesn't. To put whichever of the three happens to match into $1, one needs to surround the whole set of alternatives with capturing parentheses, like this: ((?:abc)|(?:def)|(?:ghi)).
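The capture numbering in that last paragraph is easy to check. In this sketch separate_groups and outer_group are invented names; outer_group uses (abc|def|ghi), which behaves the same as ((?:abc)|(?:def)|(?:ghi)) because "|" already separates whole sequences:

```perl
use strict;
use warnings;

# Three separate capturing groups: each alternative has its own number,
# so a match on "def" sets $2 while $1 and $3 stay undefined.
sub separate_groups {
    my ($s) = @_;
    return $s =~ /(abc)|(def)|(ghi)/ ? ($1, $2, $3) : ();
}

# One capturing group around all the alternatives: whichever one
# matches ends up in $1.
sub outer_group {
    my ($s) = @_;
    return $s =~ /(abc|def|ghi)/ ? $1 : undef;
}

my @groups = separate_groups("xdefx");
print join(" ", map { defined $_ ? $_ : "undef" } @groups), "\n";  # undef def undef
print outer_group("xdefx"), "\n";                                  # def
```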