in reply to Re: Re: How to remove the $1 hard coding
in thread How to remove the $1 hard coding

The split is not the same as the pattern. The pattern uses \s* as the delimiter, which means that ABCD should match that pattern, as well as "A B C D".

I just tested this, and it fails:

push @sps, ($string =~ m/RE/)[$columnNumber];

Could some fine monk explain why? Wouldn't the RE evaluate in a list context, which would then have an element extracted from it?
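
A minimal sketch of the behaviour being asked about. It assumes RE is the four-group pattern from the original post, since the actual pattern tested is not shown. The sample string and the value of $columnNumber are made up for illustration. With capturing parentheses, a match in list context returns the captured substrings, so a list slice can pick one out. With no capturing parentheses, a successful match in list context returns just (1), so there is nothing useful at any index above 0.

```perl
use strict;
use warnings;

my $string       = 'A B C D';   # hypothetical sample input
my $columnNumber = 2;           # hypothetical column index (0-based)

# Four capturing groups: list context returns ('A', 'B', 'C', 'D'),
# and the list slice picks out one element.
my @sps;
push @sps, ($string =~ m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/)[$columnNumber];
print "with captures: $sps[0]\n";             # prints "with captures: C"

# No capturing groups: a successful match in list context returns (1).
my @no_captures = ($string =~ m/\S+/);
print "without captures: @no_captures\n";     # prints "without captures: 1"
```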

Re: Re: Re: Re: How to remove the $1 hard coding
by graff (Chancellor) on Aug 22, 2003 at 23:56 UTC
    The split is not the same as the pattern. The pattern uses \s* as the delimiter

    Well, actually, since the regex in the OP was:

    m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/
    I was going to assert that this would generally be equivalent to splitting on whitespace, with the obvious difference that, if the string began with whitespace, split would return a list that included an empty string as the first element -- the first element returned by the regex would be the second element returned by split.

    But then I noticed another difference, which gave me pause, and I wondered if the OP had a clear grasp of the relevant detail -- that is, whether this regex is really doing what was intended. Consider the following:

    $s1 = "ABC D E";
    $s2 = " ABCD E ";   # leading and trailing spaces
    print join( ":", split /\s+/, $s1 ), $/;
    print join( ":", split /\s+/, $s2 ), $/;
    print $/;
    print join( ":", ($s1 =~ /\s*(\S+)\s*(\S+)\s*(\S+)\s*/) ), $/;
    print join( ":", ($s2 =~ /\s*(\S+)\s*(\S+)\s*(\S+)\s*/) ), $/;
    __END__
    ABC:D:E
    :ABCD:E

    ABC:D:E
    ABC:D:E
    The first two lines of output show that split returns an empty string as the first list item if the string begins with a delimiter, whereas by default it drops trailing empty fields (a negative LIMIT argument changes that).
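
    A short sketch of that control, reusing $s2 from the example above: passing a negative LIMIT as split's third argument keeps the trailing empty field as well as the leading one.

```perl
use strict;
use warnings;

my $s2 = ' ABCD E ';    # leading and trailing spaces, as in the example

# Default LIMIT: leading empty field kept, trailing empty fields dropped.
my @default = split /\s+/, $s2;
print join(':', @default), "\n";     # prints ":ABCD:E"

# Negative LIMIT: trailing empty fields are preserved too.
my @keep_all = split /\s+/, $s2, -1;
print join(':', @keep_all), "\n";    # prints ":ABCD:E:"
```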

    The last two lines demonstrate the tenacity of the regex engine -- it does its best to match as much of the regex as possible. In this case, it takes the liberty of breaking up the "ABCD" portion of $s2, so that it can have non-empty values inside every set of capturing parens. The behavior is very different from split, indeed!

    Personally, I wouldn't feel comfortable using that particular regex pattern -- split seems more suitable.

      I was going to assert that this would generally be equivalent to splitting on whitespace, with the obvious difference that, if the string began with whitespace, split would return a list that included an empty string as the first element -- the first element returned by the regex would be the second element returned by split.

      You obviously didn't test it :-) split ' ' (as posted) is MAGICAL. It is not the same as split /\s+/.

      @v = split ' ', ' watch the magic ';
      print "$_: $v[$_]\n" for 0..$#v;
      print $/;
      @v = split /\s+/, ' lost the magic ';
      print "$_: $v[$_]\n" for 0..$#v;
      __DATA__
      0: watch
      1: the
      2: magic

      0:
      1: lost
      2: the
      3: magic

      As you can see, it is absolutely identical to the posted RE and drops leading whitespace.
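
      A small check of the "drops leading whitespace" point, using the string from the example above: split ' ' yields the same fields as first stripping leading whitespace and then splitting on /\s+/. This is an illustration of the observable behaviour, not a claim about how perl implements the special case internally.

```perl
use strict;
use warnings;

my $s = ' watch the magic ';

# The special ' ' pattern: leading whitespace is ignored.
my @magic = split ' ', $s;

# Manual equivalent: trim leading whitespace, then split on runs of whitespace.
(my $trimmed = $s) =~ s/^\s+//;
my @manual = split /\s+/, $trimmed;

print join('|', @magic), "\n";     # prints "watch|the|magic"
print join('|', @manual), "\n";    # prints "watch|the|magic"
```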

      cheers

      tachyon


        The original RE is m/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/. If this is equivalent to splitting on ' ', then I have some serious remedial RE work to do.

        The original RE will match 'ABCD' and split it into 4 characters. Your solution using split will not. I am not sure the original poster is 100% certain about what the initial RE does, and your solution may match the intent, but as written they are not the same.
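
        A sketch that runs the original four-group RE and split ' ' over a few sample strings (the inputs are chosen only for illustration) makes the difference concrete: on "ABCD" the regex backtracks into four one-character captures, while split ' ' returns a single field; on whitespace-separated input the two agree.

```perl
use strict;
use warnings;

# The four-group pattern from the original post.
my $re = qr/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*/;

my %results;
for my $s ('ABCD', 'A B C D', ' A B C D ') {
    my @by_re    = $s =~ $re;       # captures, in list context
    my @by_split = split ' ', $s;   # the "magic" whitespace split
    $results{$s} = [ join(',', @by_re), join(',', @by_split) ];
    printf "%-12s regex=[%s] split=[%s]\n", "'$s'", @{ $results{$s} };
}
```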

        Now, since you are using the pattern \s+ in your split, your two splits are nearly equivalent (leading and trailing space aside). However, your split is not equivalent to the original RE.

        Nothing personal, just technically misleading. All of the info was correct (++), just not related to the OP (--).

Re: Re: Re: Re: How to remove the $1 hard coding
by chunlou (Curate) on Aug 22, 2003 at 19:37 UTC
    Probably you meant this.
    push @a, ('a b c' =~ /(\w) /g)[1];
    print "@a";
    You need () to capture something into the list. You need the /g modifier if you want every match returned, not just the first one; without it, an index like [$i] only has the first match's captures to pick from.
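
    A minimal sketch contrasting the match with and without /g on the same sample string. Note that "c" is never captured: the pattern requires a space after the word character, and the string ends at "c".

```perl
use strict;
use warnings;

my $str = 'a b c';

# Without /g: only the first match's capture is returned.
my @first_only = ($str =~ /(\w) /);

# With /g: the captures from every match are returned, in order.
# 'c' is not followed by a space, so it is not matched.
my @every = ($str =~ /(\w) /g);

# Indexing into the /g list reaches the second match.
my $second = ($str =~ /(\w) /g)[1];

print "@first_only\n";   # prints "a"
print "@every\n";        # prints "a b"
print "$second\n";       # prints "b"
```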