AidanLee has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I was writing a little regex for a pattern I had on my hands and realized that I wasn't sure how it would wind up working. So I wrote myself a little test:

use strict;
use diagnostics;

while( <DATA> ) {
    my @list = ();
    if( @list = ( $_ =~ m|^rs://([-\w]+)(?:\.([-\w]+))+|) ) {
        print join( ' > ', @list ), "\n";
    }
}

__DATA__
rs://a
rs://b.c
rs://d.e.f
rs://g.h.i.j.k

So it's looking to match a pattern that is a word followed by one or more further words, with periods in between them. Here's the output I got back:

b > c
d > f
g > k

I was kind of hoping it would give me back the whole

b > c
d > e > f
g > h > i > j > k

which is of course what split() could do for me, but I didn't see that right away. But now I'm left wondering what happened to all of the other values "in the middle" that matched. The regex seems to have just overwritten them.

Replies are listed 'Best First'.
Re: Regex Matching Oddity
by japhy (Canon) on Jun 27, 2001 at 21:55 UTC
    The regex explainer would have told you something to help you:
    print "japhy" =~ /(.)+/; # prints 'y' ONLY
    Here's what explain has to say:
    ----------------------------------------------------------------------
      (                        group and capture to \1 (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        .                        any character except \n
    ----------------------------------------------------------------------
      )+                       end of \1 (NOTE: because you're using a
                               quantifier on this capture, only the LAST
                               repetition of the captured pattern will be
                               stored in \1)


    japhy -- Perl and Regex Hacker

      Thanks for the tip, japhy. I seem to remember hearing about the regex explainer sometime in the past. What/Where is it?

      Thinking my 100th post would be a lil' more meaningful than this. :)

Re: Regex Matching Oddity
by jwest (Friar) on Jun 27, 2001 at 21:38 UTC
    Every time the pattern for $2 matches (the second ([-\w]+) in this case), it replaces $2 with the new value. So, in effect, the earlier values were overwritten, just as you speculate.

    What you want, I'm not positive a regex can do. split definitely seems like your best option, as you said. Hope this helps!
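For anyone landing here later, a minimal sketch of the "check the shape, then split" route mentioned above. The helper name rs_tokens is made up for illustration, and the trailing $ anchor is an added assumption (the original pattern was not anchored at the end).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): validate the overall shape with
# a regex, then split the dotted part into its tokens.
sub rs_tokens {
    my ($uri) = @_;

    # Same shape as the original pattern, but with a single capture around
    # the whole dotted run. The trailing $ is an assumption added here.
    return unless $uri =~ m|^rs://([-\w]+(?:\.[-\w]+)+)$|;

    my $dotted = $1;                 # copy $1 before running another regex op
    return split /\./, $dotted;
}

for my $uri (qw( rs://a rs://b.c rs://d.e.f rs://g.h.i.j.k )) {
    my @list = rs_tokens($uri) or next;   # rs://a has no dot, so it is skipped
    print join( ' > ', @list ), "\n";
}
```

With the four test strings from the original post this prints the full lists (b > c, d > e > f, g > h > i > j > k) and prints nothing for rs://a.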

    --jwest

    -><- -><- -><- -><- -><-
    All things are Perfect
        To every last Flaw
        And bound in accord
             With Eris's Law
     - HBT; The Book of Advice, 1:7
    
Re: Regex Matching Oddity
by petral (Curate) on Jun 27, 2001 at 22:00 UTC
    $ perl -lwe'print join" > ",m#(?:^rs://|\.)([-\w]+)#g for qw(rs://b.c rs://d.e.f rs://g.h.i.j.k)'
    b > c
    d > e > f
    g > h > i > j > k
    $
    I kinda expected this would work:
    (m|^rs://([-\w]+)|g, m|\G\.([-\w]+)|g)
    update: even though this does:
    m|^rs://([-\w]+)|, m|\.([-\w]+)|g
    Why isn't the pos() set?
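On the pos() question: in list context, m//g keeps matching until it fails, and a failed match resets pos() unless /c is given. So by the time the \G pattern runs, pos() is undefined again and \G anchors at the start of the string. A minimal sketch of one way around that, running the first match in scalar context so pos() stays put. The helper name rs_tokens_g is made up for illustration; the two patterns are the ones from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper name; only the context of the first match changes.
sub rs_tokens_g {
    my ($uri) = @_;

    # Scalar-context //g: matches once and leaves pos($uri) just past the
    # first token instead of iterating to failure.
    return unless $uri =~ m|^rs://([-\w]+)|g;
    my @list = ($1);

    # List-context //g starts at pos($uri); \G makes every ".token" begin
    # exactly where the previous match ended.
    push @list, $uri =~ m|\G\.([-\w]+)|g;
    return @list;
}

print join( ' > ', rs_tokens_g($_) ), "\n"
    for qw( rs://b.c rs://d.e.f rs://g.h.i.j.k );
```

Unlike the one-liner with the alternation, the \G version only accepts tokens that follow each other without gaps, which is closer to what the original pattern describes.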

      p

      I approached this the way I did because it reminded me a bit of BNF:

      pattern: 'rs://' token [ '.' token ]+
      token: [-\w]+
Re: Regex Matching Oddity
by suaveant (Parson) on Jun 27, 2001 at 22:04 UTC
    The Anonymous Monk is right... only two sets of () means only $1 and $2 get set, with $2 being overwritten with the latest value on each repetition... I suppose you could do something like...
    push @list, defined $1 ? $1 : $2 while /(?:\Ars://([-\w]+)|\.([-\w]+))/g;
    untested, but I think it would work... but your best bet is probably to regexp out the address then split it, as you said...

                    - Ant

Re: Regex Matching Oddity
by Anonymous Monk on Jun 27, 2001 at 21:25 UTC
    You have 2 sets of () in your regex ((?:) doesn't count). So it must return a list with 2 elements. How do you think it should return the other words?
Re: Regex Matching Oddity
by AidanLee (Chaplain) on Jul 25, 2001 at 01:12 UTC

    I just revisited this today to use it for how it works instead of how I thought it would work.... I've got a pattern like the one I described, but I am only interested in the first bit and the last bit, so this works out perfectly :). The things we learn in the monastery.... I'm sure I would have forgotten all about this if I hadn't posted the original node and discussed it with my fellow monks. Thanks all :)
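A minimal sketch of that "first bit and last bit" use, with the pattern from the original node wrapped in a made-up helper (first_and_last is not from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($1, $2) on a match, an empty list otherwise.
sub first_and_last {
    my ($uri) = @_;

    # $2 sits inside a quantified group, so after the match it holds only
    # the final repetition, which is exactly the last token.
    return $uri =~ m|^rs://([-\w]+)(?:\.([-\w]+))+|;
}

my ( $first, $last ) = first_and_last('rs://g.h.i.j.k');
print "$first > $last\n";    # g > k
```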