You should really pick up a few O'Reilly Perl books.
From Programming Perl, by O'Reilly & Associates, the online edition:
2.4.1.3 The fine print
As mentioned above, \1, \2, \3, and so on, are equivalent to whatever the corresponding set of parentheses matched, counting opening parentheses from left to
right. (If the particular pair of parentheses had a quantifier such as * after it, such that it matched a series of substrings, then only the last match counts as the
backreference.) Note that such a backreference matches whatever actually matched for the subpattern in the string being examined; it's not just a shorthand for the
rules of that subpattern. Therefore, (0|0x)\d*\s\1\d* will match "0x1234 0x4321", but not "0x1234 01234", since subpattern 1 actually matched "0x", even
though the rule 0|0x could potentially match the leading 0 in the second number.
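To see that \1 repeats the captured text rather than the rule, here is a minimal sketch that runs the book's own pattern against both strings (the variable names are just for illustration):

```perl
use strict;
use warnings;

# \1 must repeat the exact text group 1 captured ("0x"),
# not re-run the alternation 0|0x on the second number.
my $re = qr/(0|0x)\d*\s\1\d*/;

my $same_prefix  = "0x1234 0x4321" =~ $re ? 1 : 0;
my $mixed_prefix = "0x1234 01234"  =~ $re ? 1 : 0;

print "0x1234 0x4321: ", ($same_prefix  ? "match" : "no match"), "\n";
print "0x1234 01234:  ", ($mixed_prefix ? "match" : "no match"), "\n";
```

The first string matches; the second does not, because once group 1 has captured "0x", \1 insists on "0x" again.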
Outside of the pattern (in particular, in the replacement of a substitution operator) you can continue to refer to backreferences by using $ instead of \ in front of the
number. The variables $1, $2, $3 ... are automatically localized, and their scope (and that of $`, $&, and $' below) extends to the end of the enclosing block or
eval string, or to the next successful pattern match, whichever comes first. (The \1 notation sometimes works outside the current pattern, but should not be relied
upon.) $+ returns whatever the last bracket match matched. $& returns the entire matched string. $` returns everything before the matched string.[24] $' returns
everything after the matched string. For more explanation of these magical variables (and for a way to write them in English), see the section "Special Variables" at the
end of this chapter.
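A quick sketch of what each of these variables holds after one successful match. The sample string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $text = "name=perl version=5";
my ($before, $matched, $after, $first, $second, $last_group);

# Leftmost match with digits after "=" is "version=5".
if ($text =~ /(\w+)=(\d+)/) {
    $before     = $`;   # everything before the match: "name=perl "
    $matched    = $&;   # the entire match: "version=5"
    $after      = $';   # everything after the match: ""
    $first      = $1;   # first group: "version"
    $second     = $2;   # second group: "5"
    $last_group = $+;   # last bracket matched: "5"
}

print "[$before] [$matched] [$after] $first $second $last_group\n";
```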
[24] In the case of something like s/pattern/length($`)/eg, which does multiple replacements if the pattern occurs multiple times, the value of $`
does not include any modifications done by previous replacement iterations. To get the other effect, say:
1 while s/pattern/length($`)/e;
For example, to change all tabs to the corresponding number of spaces, you could say:
1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
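A small sketch contrasting the two behaviours the footnote describes, plus the tab expander run on a sample line. The replacement '[' . length($`) . ']' is an illustrative choice; it contains no "x", so the 1 while loop is guaranteed to stop:

```perl
use strict;
use warnings;

# With /g, $` is measured against the ORIGINAL string on every iteration.
(my $with_g = "xx") =~ s/x/'[' . length($`) . ']'/eg;

# Repeating a single substitution re-scans the already-modified string,
# so the second $` includes the text inserted by the first replacement.
my $with_loop = "xx";
1 while $with_loop =~ s/x/'[' . length($`) . ']'/e;

# The footnote's tab expander: pad each run of tabs to the next 8-column stop.
my $line = "a\tb\tc";
1 while $line =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

print "$with_g\n$with_loop\n$line\n";
```

Here $with_g ends up as "[0][1]" while $with_loop ends up as "[0][3]", and "b" and "c" land on columns 8 and 16.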
You may have as many parentheses as you wish. If you have more than nine pairs, the variables $10, $11, ... refer to the corresponding substring. Within the pattern,
\10, \11, and so on, refer back to substrings if there have been at least that many left parentheses before the backreference. Otherwise (for backward compatibility)
\10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through \9 are always backreferences.)
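A sketch of both readings of \10. With ten groups already opened it is a backreference to group 10; with only one group it falls back to octal 010, i.e. chr(8). (Perl 5.10 and later also accept \g{10}, which is always a backreference.) The test strings are arbitrary:

```perl
use strict;
use warnings;

# Ten groups precede \10, so it is a backreference to group 10 ("j").
my $s = "abcdefghij-j";
my ($matched, $tenth) = (0, undef);
if ($s =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)-\10/) {
    $matched = 1;
    $tenth   = $10;   # outside the pattern, $10 holds group 10
}

# Only one group precedes \10, so it means octal 010: a backspace, chr(8).
my $octal = "a\x08" =~ /(a)\10/ ? 1 : 0;

print "matched=$matched tenth=$tenth octal=$octal\n";
```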
Examples:
s/^([^ ]+) +([^ ]+)/$2 $1/; # swap first two words
/(\w+)\s*=\s*\1/; # match "foo = foo"
/.{80,}/; # match line of at least 80 chars
/^(\d+\.?\d*|\.\d+)$/; # match valid number
if (/Time: (..):(..):(..)/) { # pull fields out of a line
$hours = $1;
$minutes = $2;
$seconds = $3;
}
Hint: to pull fixed-width fields out of a string, use the unpack function instead of writing patterns like /(...)(..)(.....)/. It's more efficient.
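A side-by-side sketch of the hint, assuming a hypothetical 10-character record split into fields of 3, 2 and 5 characters. The "a" template letters take a fixed count of characters; "A" would additionally strip trailing spaces:

```perl
use strict;
use warnings;

my $record = "abcdefghij";

# Regex version: three fixed-width fields of 3, 2 and 5 characters.
my @by_regex = $record =~ /^(...)(..)(.....)/;

# unpack version: the template reads the same three fields by position.
my @by_unpack = unpack "a3 a2 a5", $record;

print "@by_regex\n@by_unpack\n";
```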
A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the
imaginary characters off the beginning and end of the string as matching a \W. (Within character classes \b represents backspace rather than a word boundary.)
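A short sketch of \b on a few made-up words, and of [\b] meaning backspace inside a character class:

```perl
use strict;
use warnings;

# \bcat\b needs a \w/\W transition on both sides of "cat";
# string ends count as \W.
my @words = ("cat", "concatenate", "cat-like", "bobcat");
my @whole = grep { /\bcat\b/ } @words;   # "cat" and "cat-like"

# Inside a character class, \b is a literal backspace (chr 8).
my $has_backspace = "x\x08y" =~ /[\b]/ ? 1 : 0;

print "@whole\n$has_backspace\n";
```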
Normally, the ^ character is guaranteed to match only at the beginning of the string, the $ character only at the end (or before the newline at the end), and Perl does
certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by ^ or $. However, you may wish to treat a
string as a multi-line buffer, such that the ^ will also match after any newline within the string, and $ will also match before any newline. At the cost of a little more
overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice is now deprecated.) \A
and \Z are just like ^ and $ except that they won't match multiple times when the /m modifier is used, while ^ and $ will match at every internal line boundary. To
match the actual end of the string, not ignoring a trailing newline, you can use \Z(?!\n). That is an example of a negative lookahead assertion.
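A sketch of ^ with and without /m, of \A staying put under /m, and of \Z versus \Z(?!\n) on a string with a trailing newline. (Later Perls also provide \z for "true end of string".) The buffer contents are arbitrary:

```perl
use strict;
use warnings;

my $buf = "one\ntwo\n";

my @starts_default = $buf =~ /^(\w+)/g;    # ^ only at string start
my @starts_m       = $buf =~ /^(\w+)/mg;   # ^ also after each newline
my @starts_A       = $buf =~ /\A(\w+)/mg;  # \A stays at string start

# \Z may match just before a final newline; \Z(?!\n) refuses that spot.
my $z_before_nl = "two\n" =~ /two\Z/       ? 1 : 0;
my $strict_end  = "two\n" =~ /two\Z(?!\n)/ ? 1 : 0;

print "@starts_default | @starts_m | @starts_A | $z_before_nl $strict_end\n";
```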
To facilitate multi-line substitutions, the . character never matches a newline unless you use the /s modifier, which tells Perl to pretend the string is a single line - even
if it isn't. (The /s modifier also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another module.) In particular, the
following leaves a newline on the $_ string:
$_ = <STDIN>;
s/.*(some_string).*/$1/;
If the newline is unwanted, use any of these:
s/.*(some_string).*/$1/s;
s/.*(some_string).*\n/$1/;
s/.*(some_string)[^\0]*/$1/;
s/.*(some_string)(.|\n)*/$1/;
chop; s/.*(some_string).*/$1/;
/(some_string)/ && ($_ = $1);
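A sketch of the difference /s makes, using a hypothetical input line in place of <STDIN> so it runs without a terminal:

```perl
use strict;
use warnings;

my $input = "xx some_string yy\n";   # stands in for a line read from STDIN

# Without /s the trailing .* stops at the newline, which survives.
(my $plain = $input) =~ s/.*(some_string).*/$1/;

# With /s the trailing .* also swallows the newline.
(my $with_s = $input) =~ s/.*(some_string).*/$1/s;

print length($plain), " ", length($with_s), "\n";
```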
Note that all backslashed metacharacters in Perl are alphanumeric, such as \b, \w, and \n. Unlike some regular expression languages, there are no backslashed
symbols that aren't alphanumeric. So anything that looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter. This
makes it simple to quote a string that you want to use for a pattern but that you are afraid might contain metacharacters. Just quote all the non-alphanumeric
characters:
$pattern =~ s/(\W)/\\$1/g;
You can also use the built-in quotemeta function to do this. An even easier way to quote metacharacters right in the match operator is to say:
/$unquoted\Q$quoted\E$unquoted/
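A sketch showing that the manual substitution and quotemeta agree on a plain ASCII string, and that \Q...\E is what makes the match literal. The sample strings are arbitrary:

```perl
use strict;
use warnings;

my $raw = "1+1=2 (really?)";

(my $manual = $raw) =~ s/(\W)/\\$1/g;   # backslash every non-word char
my $builtin = quotemeta $raw;           # built-in equivalent

my $target = "check: 1+1=2 (really?) ok";
my $found          = $target =~ /\Q$raw\E/ ? 1 : 0;   # literal match
my $unquoted_found = $target =~ /$raw/     ? 1 : 0;   # "+", "(", "?" act as metacharacters

print "$manual\n$found $unquoted_found\n";
```

Unquoted, "1+1" means "one or more 1s followed by 1", so the raw pattern fails against the very text it was copied from.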
Remember that the first and last alternatives (before the first | and after the last one) tend to gobble up the other elements of the regular expression on either side, out
to the ends of the expression, unless there are enclosing parentheses. A common mistake is to ask for:
/^fee|fie|foe$/
when you really mean:
/^(fee|fie|foe)$/
The first matches "fee" at the beginning of the string, or "fie" anywhere, or "foe" at the end of the string. The second matches any string consisting solely of "fee" or
"fie" or "foe".
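A sketch of the precedence trap, run over a few made-up inputs:

```perl
use strict;
use warnings;

my @inputs = ("fee", "fiesta", "tofoe", "fee fie");

# Anchors bind only to the first and last alternatives.
my @loose  = grep { /^fee|fie|foe$/ } @inputs;

# Parentheses make both anchors apply to every alternative.
my @strict = grep { /^(fee|fie|foe)$/ } @inputs;

print join("|", @loose), "\n", join("|", @strict), "\n";
```

Every input passes the unparenthesized pattern, while only "fee" passes the grouped one.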
You should be able to use this to find your solution.
NOTE: This was reproduced without permission, and if
someone doesn't like it (someone official, that is), I will
happily remove it.
J. J. Horner
Linux, Perl, Apache, Stronghold, Unix
jhorner@knoxlug.org http://www.knoxlug.org/