in reply to Re: Deleting intermediate whitespaces, but leaving one behind each word
in thread Deleting intermediate whitespaces, but leaving one behind each word

Please forgive the nit-picky nature of this reply, but your post raised a number of interesting points.

    my $rxSpaces = qr{(?x)      # Use regex extended syntax to allow comments
        (?:                     # Open non-capturing group for alternation
            (?<= \A ) \s+       # Spaces preceded by beginning of string
            |                   # or
            (?<= \s ) \s+       # Spaces preceded by a single space
            |                   # or
            \s+ (?= \z )        # Spaces followed by end of string
        )                       # Close group
    };

Many of the details of this regex no doubt have an expository purpose. However, more or less in descending order of importance:

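Drop the look-arounds wrapped around \A and \z (both are already zero-width assertions), drop the non-capturing group, and move the flags to trailing modifiers. Then what you have is a regex like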
    qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms
which IMHO is very easy to understand.
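
For anyone who wants to try it, here is a minimal sketch of that regex in use. The sub name squeeze_spaces and the sample string are my own inventions for illustration, not anything from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper name; the regex itself is the simplified one above.
sub squeeze_spaces {
    my ($s) = @_;
    my $rx = qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms;
    $s =~ s{$rx}{}g;    # delete every whitespace run the regex matches
    return $s;
}

# Made-up sample with leading, doubled inner, and trailing spaces.
print '>', squeeze_spaces("  Intel(R)   Xeon(R)  CPU  "), "<\n";
# prints: >Intel(R) Xeon(R) CPU<
```

Note that the first whitespace character of each inner run is the one that survives, so a tab followed by spaces would leave a tab behind.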

The use of Perl's ordered regex alternation raises the question of the proper order of the sub-patterns. My experience has been that only testing can answer this question reliably:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     note 'perl version: ', $];
     ;;
     use constant S => ' Intel(R) Xeon(R) CPU X5660 2.80GHz ';
     use constant T => 'Intel(R) Xeon(R) CPU X5660 2.80GHz';
     ;;
     for my $rxSpaces (
       qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms,
       qr{ \A \s+ | (?<= \s) \s+ | \s+ \z }xms,
       qr{ \A \s+ | \s+ \z | (?<= \s) \s+ }xms,
       ) {
       (my $t = S) =~ s{$rxSpaces}{}g;
       ok $t eq T, qq{$rxSpaces -> \n >$t<};
       }
     ;;
     note qq{still with spaces? >${ \S }<};
     done_testing;
     "
    # perl version: 5.008009
    ok 1 - (?msx-i: (?<= \s) \s+ | \A \s+ | \s+ \z ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    ok 2 - (?msx-i: \A \s+ | (?<= \s) \s+ | \s+ \z ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    ok 3 - (?msx-i: \A \s+ | \s+ \z | (?<= \s) \s+ ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    # still with spaces? > Intel(R) Xeon(R) CPU X5660 2.80GHz <
    1..3
    ok 4 - no warnings
    1..4
Ok, no ordering dependency is seen.

Now you think, "Gee, with Perl 5.10 there's that neat \K variable-width look-behind emulation operator I can use to simplify the regex even more!" Unfortunately, after testing (and you always test this stuff, right?) you find a problem:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     note 'perl version: ', $];
     ;;
     use constant S => ' Intel(R) Xeon(R) CPU X5660 2.80GHz ';
     use constant T => 'Intel(R) Xeon(R) CPU X5660 2.80GHz';
     ;;
     for my $rxSpaces (
       qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms,
       qr{ \A \s+ | (?<= \s) \s+ | \s+ \z }xms,
       qr{ \A \s+ | \s+ \z | (?<= \s) \s+ }xms,
       qr{ \s \K \s+ | \A \s+ | \s+ \z }xms,
       qr{ \A \s+ | \s \K \s+ | \s+ \z }xms,
       qr{ \A \s+ | \s+ \z | \s \K \s+ }xms,
       ) {
       (my $t = S) =~ s{$rxSpaces}{}g;
       ok $t eq T, qq{$rxSpaces -> \n >$t<};
       }
     ;;
     note qq{still with spaces? >${ \S }<};
     done_testing;
     "
    # perl version: 5.010001
    ok 1 - (?msx-i: (?<= \s) \s+ | \A \s+ | \s+ \z ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    ok 2 - (?msx-i: \A \s+ | (?<= \s) \s+ | \s+ \z ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    ok 3 - (?msx-i: \A \s+ | \s+ \z | (?<= \s) \s+ ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    not ok 4 - (?msx-i: \s \K \s+ | \A \s+ | \s+ \z ) ->
    # > Intel(R) Xeon(R) CPU X5660 2.80GHz <
    #   Failed test '(?msx-i: \s \K \s+ | \A \s+ | \s+ \z ) ->
    # > Intel(R) Xeon(R) CPU X5660 2.80GHz <'
    #   at -e line 1.
    not ok 5 - (?msx-i: \A \s+ | \s \K \s+ | \s+ \z ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz <
    #   Failed test '(?msx-i: \A \s+ | \s \K \s+ | \s+ \z ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz <'
    #   at -e line 1.
    ok 6 - (?msx-i: \A \s+ | \s+ \z | \s \K \s+ ) ->
    # >Intel(R) Xeon(R) CPU X5660 2.80GHz<
    # still with spaces? > Intel(R) Xeon(R) CPU X5660 2.80GHz <
    1..6
    ok 7 - no warnings
    1..7
    # Looks like you failed 2 tests of 7.
Hmmm... The (?<= \s) \s+ sub-pattern continues to work just fine everywhere, but the seemingly equivalent \s \K \s+ sub-pattern only works in the last position in the ordered alternation. Why? (Food for thought, this.)
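
If you want to poke at this yourself, here is a small sketch that isolates the effect. The helper strip_with and the short test string are hypothetical, chosen only to keep the output readable. My reading, offered as a hint rather than the last word: a look-behind only inspects the previous character, whereas the \s in front of \K actually consumes one, so that character is already spoken for by the time the \A or \z alternative gets a chance at it.

```perl
use strict;
use warnings;
use 5.010;    # \K (and say) need Perl 5.10 or later

# Hypothetical helper: apply a pattern globally and return the result.
sub strip_with {
    my ($rx, $s) = @_;
    $s =~ s{$rx}{}g;
    return $s;
}

my $s = "  a  b  ";    # two leading, two inner, two trailing spaces

# Look-behind inspects the previous character but consumes nothing.
my $look_behind = qr{ (?<= \s) \s+ | \A \s+ | \s+ \z }xms;

# \s \K consumes one whitespace character, then resets the match start.
# Tried first, it grabs the leading and trailing runs before \A or \z can.
my $keep_first = qr{ \s \K \s+ | \A \s+ | \s+ \z }xms;

# Tried last, it only sees the inner runs the other alternatives rejected.
my $keep_last = qr{ \A \s+ | \s+ \z | \s \K \s+ }xms;

say '>', strip_with($look_behind, $s), '<';    # >a b<
say '>', strip_with($keep_first,  $s), '<';    # > a b <
say '>', strip_with($keep_last,   $s), '<';    # >a b<
```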

A lot of these points echo those made by Laurent_R here: regexes are really neat and I love them, but they're not always the ideal tool for the job.
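
For the record, one regex-free way to get the same effect is the split/join idiom below. This is my own illustration (the sub name squeeze_by_split is made up), not necessarily what Laurent_R proposed, and it differs from the regex in that every whitespace run, tabs and newlines included, becomes a single space.

```perl
use strict;
use warnings;

# Hypothetical helper name. split with the special pattern ' ' skips
# leading whitespace and splits on runs of whitespace; trailing empty
# fields are dropped by default, so no separate trimming is needed.
sub squeeze_by_split {
    my ($s) = @_;
    return join ' ', split ' ', $s;
}

print '>', squeeze_by_split("  Intel(R)   Xeon(R)  CPU  "), "<\n";
# prints: >Intel(R) Xeon(R) CPU<
```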


Give a man a fish:  <%-{-{-{-<