
The split is not the same as the pattern. The pattern uses \s* as the delimiter

Well, actually, since the regex in the OP was:

m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/
I was going to assert that this would generally be equivalent to splitting on whitespace, with the obvious difference that, if the string began with whitespace, split would return a list that included an empty string as the first element -- the first element returned by the regex would be the second element returned by split.

But then I noticed another difference, which gave me pause, and I wondered if the OP had a clear grasp of the relevant detail -- that is, whether this regex is really doing what was intended. Consider the following:

$s1 = "ABC D E";
$s2 = " ABCD E ";    # (leading and trailing spaces)
print join( ":", split /\s+/, $s1 ), $/;
print join( ":", split /\s+/, $s2 ), $/;
print $/;
print join( ":", ($s1 =~ /\s*(\S+)\s*(\S+)\s*(\S+)\s*/) ), $/;
print join( ":", ($s2 =~ /\s*(\S+)\s*(\S+)\s*(\S+)\s*/) ), $/;
__END__
ABC:D:E
:ABCD:E

ABC:D:E
ABC:D:E
The first two lines of output show that split returns an empty string as the first list item when the string begins with a delimiter, whereas by default it discards trailing empty fields (a negative LIMIT argument to split keeps them).
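For the "you can control that" part: split accepts an optional third LIMIT argument, and a negative LIMIT keeps trailing empty fields. A minimal sketch, reusing the $s2 string from above:

```perl
use strict;
use warnings;

my $s2 = " ABCD E ";    # leading and trailing space

# Default LIMIT: trailing empty fields are stripped.
my @default = split /\s+/, $s2;

# Negative LIMIT: trailing empty fields are kept.
my @keep = split /\s+/, $s2, -1;

print join( ":", @default ), $/;    # prints ":ABCD:E"
print join( ":", @keep ), $/;       # prints ":ABCD:E:"
```

The leading empty field is present in both cases; only the trailing one depends on LIMIT.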

The last two lines demonstrate the tenacity of the regex engine: it backtracks through the possibilities until the whole pattern matches. Because \s* is allowed to match zero characters, it takes the liberty of breaking up the "ABCD" portion of $s2 into "ABC" and "D", so that every set of capturing parens gets a non-empty value. The behavior is very different from split, indeed!
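To see exactly where the engine carved up "ABCD", the special arrays @- and @+ hold the start and end offsets of each capture group after a successful match. A small sketch with the same three-capture pattern (the @groups array is just an illustrative name):

```perl
use strict;
use warnings;

my $s2 = " ABCD E ";    # leading and trailing space

my @groups;             # [ text, start, end ] for each capture group
if ( $s2 =~ /\s*(\S+)\s*(\S+)\s*(\S+)\s*/ ) {
    for my $i ( 1 .. 3 ) {
        # $-[$i] and $+[$i] are the start and end offsets of group $i
        push @groups, [ substr( $s2, $-[$i], $+[$i] - $-[$i] ), $-[$i], $+[$i] ];
    }
}

printf "group %d: '%s' at [%d, %d)\n", $_ + 1, @{ $groups[$_] } for 0 .. $#groups;
```

This reports ABC at [1, 4), D at [4, 5) and E at [6, 7): the first two captures sit back to back with nothing between them, which is only possible because \s* matched the empty string there.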

Personally, I wouldn't feel comfortable using that particular regex pattern -- split seems more suitable.
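Since the thread is about removing hard-coded $1..$4, here is one way the split approach might look. This is a sketch under assumptions: the sample line is made up, and it assumes exactly four fields are wanted.

```perl
use strict;
use warnings;

my $line = "  alpha beta  gamma delta ";    # hypothetical sample input

# split ' ' skips leading whitespace and splits on runs of whitespace,
# so every element of @fields is a non-empty word.
my @fields = split ' ', $line;

# Check the count explicitly, rather than letting the regex engine
# carve words apart until four captures are filled.
die "expected 4 fields, got " . scalar(@fields) . "\n" unless @fields == 4;

print "$_: $fields[$_]\n" for 0 .. $#fields;
```

Indexing into @fields replaces $1..$4, and a line with the wrong number of words is reported instead of silently split up.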

Re: Re: Re: Re: Re: How to remove the $1 hard coding
by tachyon (Chancellor) on Aug 23, 2003 at 10:28 UTC

    I was going to assert that this would generally be equivalent to splitting on whitespace, with the obvious difference that, if the string began with whitespace, split would return a list that included an empty string as the first element -- the first element returned by the regex would be the second element returned by split.

    You obviously didn't test it :-) split ' ' (as posted) is MAGICAL. It is not the same as split /\s*/.

    @v = split ' ', ' watch the magic ';
    print "$_: $v[$_]\n" for 0..$#v;
    print $/;
    @v = split /\s+/, ' lost the magic ';
    print "$_: $v[$_]\n" for 0..$#v;
    __DATA__
    0: watch
    1: the
    2: magic

    0:
    1: lost
    2: the
    3: magic

    As you can see, it is absolutely identical to the posted RE and drops leading whitespace.
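The "magic" is a documented special case of split: when the pattern is the single-space string ' ', split skips leading whitespace and then splits on runs of whitespace, as if the pattern were /\s+/. A small comparison sketch, including split /\s*/, which can match the empty string and therefore splits between individual characters:

```perl
use strict;
use warnings;

my $str = ' watch the magic ';

my @magic = split ' ', $str;           # special case: leading whitespace skipped

( my $trimmed = $str ) =~ s/^\s+//;    # explicit equivalent: strip, then split on \s+
my @explicit = split /\s+/, $trimmed;

my @chars = split /\s*/, 'ab c';       # \s* can match empty: splits into characters

print join( "|", @magic ), $/;         # watch|the|magic
print join( "|", @explicit ), $/;      # watch|the|magic
print join( "|", @chars ), $/;         # a|b|c
```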

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      The original RE is m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/. If this is equivalent to splitting on ' ', then I have some serious remedial RE work to do.

      The original RE will match 'ABCD' and split it into 4 characters. Your solution using split will not. I am not sure that the original poster is 100% certain about what the initial RE does, and your solution may match the intent, but as written they are not the same.
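A quick way to check the 'ABCD' case described above (a minimal sketch):

```perl
use strict;
use warnings;

my $str = 'ABCD';

# The original pattern: \s* may match nothing, so the engine backtracks
# until each (\S+) holds at least one character.
my @re = $str =~ m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/;

my @sp = split ' ', $str;    # no whitespace, so a single field

print scalar(@re), ": @re\n";    # 4: A B C D
print scalar(@sp), ": @sp\n";    # 1: ABCD
```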

      Now, since you are using the pattern \s+ as your split, your splits are nearly equivalent (leading and trailing whitespace aside). However, your split is not equivalent to the original RE.

      Nothing personal, just technically misleading. All of the info was correct (++), just not related to the OP (--).

        Sure -- you are of course correct. However, like you, I am almost positive the poster has no idea how that RE will behave, viz.:

        @res = 'IS~THISOK?' =~ m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/;
        print "@res\n";
        __DATA__
        IS~THIS O K ?
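If the intent really is "exactly four whitespace-separated fields" (an assumption about what the OP wanted, not something the OP stated), one option is to anchor the pattern and require \s+ between the captures, so that run-together input fails to match instead of being carved up. A sketch (the $four name and the second test string are illustrative):

```perl
use strict;
use warnings;

# Anchored, with \s+ (not \s*) between the captures.
my $four = qr/^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/;

my @bad  = 'IS~THISOK?' =~ $four;         # no separators: no match, empty list
my @good = ' IS ~ THIS OK? ' =~ $four;    # four separated fields

print scalar(@bad), "\n";    # 0
print "@good\n";             # IS ~ THIS OK?
```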

        cheers

        tachyon
