Please forgive the nit-picky nature of this reply, but your post raised a number of interesting points.
my $rxSpaces = qr{(?x) # Use regex extended syntax to allow comments
(?: # Open non-capturing group for alternation
(?<= \A ) \s+ # Spaces preceded by beginning of string
| # or
(?<= \s ) \s+ # Spaces preceded by a single space
| # or
\s+ (?= \z ) # Spaces followed by end of string
) # Close group
};
Many of the details of this regex no doubt have an expository purpose. However, more or less in descending order of importance:
- In the (?<= \A ) \s+ and \s+ (?= \z ) sub-patterns, the zero-width look-around assertions are overkill because \A and \z are already zero-width assertions, so the simpler \A \s+ and \s+ \z (respectively) are exactly equivalent and IMHO preferable;
- The (?: ... ) non-capturing group surrounding the alternation is redundant because the whole qr// is effectively wrapped in a non-capturing group;
- Lastly, the (?x) at the start of the regex is IMHO to be avoided in favor of a standard /xms tail for this (and every!) regex. (This is my personal regex best practice.)
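The first two points are easy to check for yourself. What follows is only a minimal sketch: the sample string and the variable names are made up for illustration and are not from the original post. It strips leading whitespace with both forms of the sub-pattern and then prints a qr// object to show the non-capturing wrapper Perl puts around it. The flag notation inside that wrapper differs between Perl versions, so the sketch makes no assumption about its exact text.

```perl
use strict;
use warnings;

# Hypothetical sample string, not taken from the original post.
my $sample = "  leading and trailing  ";

my $rx_lookbehind = qr{ (?<= \A ) \s+ }xms;   # look-behind wrapped around \A
my $rx_plain      = qr{ \A \s+ }xms;          # bare \A, already zero-width

# Apply each pattern to its own copy of the sample.
(my $via_lookbehind = $sample) =~ s{$rx_lookbehind}{}g;
(my $via_plain      = $sample) =~ s{$rx_plain}{}g;

print "same result: ", ($via_lookbehind eq $via_plain ? 'yes' : 'no'), "\n";

# Stringifying a qr// object shows the wrapper Perl adds around the pattern.
my $wrapped = "$rx_plain";
print "stringified: $wrapped\n";
```

If the two results ever differed for some input, that would be the interesting case to post.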
Then what you have is a regex like
qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms
which IMHO is very easy to understand.
The use of Perl's ordered regex alternation raises the question of the proper order of the sub-patterns. My experience has been that only testing can answer this question reliably:
c:\@Work\Perl\monks>perl -wMstrict -le
"use Test::More 'no_plan';
use Test::NoWarnings;
;;
note 'perl version: ', $];
;;
use constant S => ' Intel(R) Xeon(R) CPU X5660 2.80GHz ';
use constant T => 'Intel(R) Xeon(R) CPU X5660 2.80GHz';
;;
for my $rxSpaces (
qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms,
qr{ \A \s+ | (?<= \s) \s+ | \s+ \z }xms,
qr{ \A \s+ | \s+ \z | (?<= \s) \s+ }xms,
) {
(my $t = S) =~ s{$rxSpaces}{}g;
ok $t eq T, qq{$rxSpaces -> \n >$t<};
}
;;
note qq{still with spaces? >${ \S }<};
done_testing;
"
# perl version: 5.008009
ok 1 - (?msx-i: (?<= \s) \s+ | \A \s+ | \s+ \z ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
ok 2 - (?msx-i: \A \s+ | (?<= \s) \s+ | \s+ \z ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
ok 3 - (?msx-i: \A \s+ | \s+ \z | (?<= \s) \s+ ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
# still with spaces? > Intel(R) Xeon(R) CPU X5660 2.80GHz <
1..3
ok 4 - no warnings
1..4
Ok, no ordering dependency is seen.
Now you think, "Gee, with Perl 5.10 there's that neat \K variable-width look-behind emulation operator I can use to simplify the regex even more!" Unfortunately, after testing (and you always test this stuff, right?) you find a problem:
c:\@Work\Perl\monks>perl -wMstrict -le
"use Test::More 'no_plan';
use Test::NoWarnings;
;;
note 'perl version: ', $];
;;
use constant S => ' Intel(R) Xeon(R) CPU X5660 2.80GHz ';
use constant T => 'Intel(R) Xeon(R) CPU X5660 2.80GHz';
;;
for my $rxSpaces (
qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms,
qr{ \A \s+ | (?<= \s) \s+ | \s+ \z }xms,
qr{ \A \s+ | \s+ \z | (?<= \s) \s+ }xms,
qr{ \s \K \s+ | \A \s+ | \s+ \z }xms,
qr{ \A \s+ | \s \K \s+ | \s+ \z }xms,
qr{ \A \s+ | \s+ \z | \s \K \s+ }xms,
) {
(my $t = S) =~ s{$rxSpaces}{}g;
ok $t eq T, qq{$rxSpaces -> \n >$t<};
}
;;
note qq{still with spaces? >${ \S }<};
done_testing;
"
# perl version: 5.010001
ok 1 - (?msx-i: (?<= \s) \s+ | \A \s+ | \s+ \z ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
ok 2 - (?msx-i: \A \s+ | (?<= \s) \s+ | \s+ \z ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
ok 3 - (?msx-i: \A \s+ | \s+ \z | (?<= \s) \s+ ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
not ok 4 - (?msx-i: \s \K \s+ | \A \s+ | \s+ \z ) ->
# > Intel(R) Xeon(R) CPU X5660 2.80GHz <
# Failed test '(?msx-i: \s \K \s+ | \A \s+ | \s+ \z ) ->
# > Intel(R) Xeon(R) CPU X5660 2.80GHz <'
# at -e line 1.
not ok 5 - (?msx-i: \A \s+ | \s \K \s+ | \s+ \z ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz <
# Failed test '(?msx-i: \A \s+ | \s \K \s+ | \s+ \z ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz <'
# at -e line 1.
ok 6 - (?msx-i: \A \s+ | \s+ \z | \s \K \s+ ) ->
# >Intel(R) Xeon(R) CPU X5660 2.80GHz<
# still with spaces? > Intel(R) Xeon(R) CPU X5660 2.80GHz <
1..6
ok 7 - no warnings
1..7
# Looks like you failed 2 tests of 7.
Hmmm... The (?<= \s) \s+ sub-pattern continues to work just fine everywhere, but the seemingly equivalent \s \K \s+ sub-pattern only works in the last position in the ordered alternation. Why? (Food for thought, this.)
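One possible line of investigation is to look at where each sub-pattern's match attempt begins in a run of trailing whitespace. The sketch below is an illustration under stated assumptions, not a definitive diagnosis: the sample string is invented, and the capture group in the \K pattern is added only so that $-[1] records where the attempt started. It needs Perl 5.10 or later for \K.

```perl
use strict;
use warnings;

# Hypothetical sample: one word followed by three trailing spaces
# (offsets 4, 5 and 6).
my $sample = "word   ";

# Look-behind: the match cannot begin at the first trailing space,
# because that space is preceded by 'd', not by whitespace.
$sample =~ m{ (?<= \s) \s+ }xms or die "no look-behind match";
my $lookbehind_start = $-[0];

# \K: the leading \s is really consumed, so the attempt begins at the
# first trailing space. The capture group exists only to record that offset.
$sample =~ m{ (\s) \K \s+ }xms or die "no \\K match";
my $k_attempt_start = $-[1];

print "look-behind match begins at offset $lookbehind_start\n";
print "\\K attempt begins at offset $k_attempt_start\n";
```

If the \K attempt begins one character earlier than the look-behind match, then in an ordered alternation the \K alternative can succeed at the first space of a run before a later \s+ \z alternative is ever tried at that position, which would leave that first space behind.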
A lot of these points echo those made by Laurent_R here: regexes are really neat and I love them, but they're not always the ideal tool for the job.
Give a man a fish: <%-{-{-{-<