regex for multiple capture within boundary

by xafwodahs (Scribe)
on Jul 14, 2006 at 19:36 UTC ( [id://561309]=perlquestion )

xafwodahs has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, Let's say I have
$x = "a1aa11 b2bb22bbb222 c3cc33";
and I want a single regex that grabs all the number characters between the spaces. In other words, I want an array like [2, 22, 222] from:
(@nums) = $x =~ /<something here>/<modifiers here>;
What would your wisdom suggest?

Replies are listed 'Best First'.
Re: regex for multiple capture within boundary
by ikegami (Patriarch) on Jul 14, 2006 at 19:57 UTC

    Update: Ah! Now I understand! How did I miss that?

    # @nums = ('2', '22', '222');
    my @nums = ($x =~ /\s(\S+)/)[0] =~ /(\d+)/g;

    or

    # @nums = ('2', '22', '222');
    my @nums = (split(' ', $x, 3))[1] =~ /(\d+)/g;

      I think I understand what is going on in the first of your updated solutions but if I change it to read

      my @nums = ($x =~ /\s(\S+)/)[1] =~ /(\d+)/g;

      expecting output of

      3 33

      it doesn't work unless I also make the first match global like this

      my @nums = ($x =~ /\s(\S+)/g)[1] =~ /(\d+)/g;

      I think this is because the round brackets around the match put the match into list context, and the [0] subscript grabs the first element of the resulting list. Since the match is non-global and has a single capture group, there will only ever be one element in that list, so asking for any later element will not work. If we want a second or subsequent element, we must make the match global so that it returns more than one element.

      Have I understood this correctly or am I completely missing the point?

      Cheers,

      JohnGG

        That's exactly it (although there could be 0 elements if the match fails).

        my @nums = ($x =~ /\s(\S+)/)[0] =~ /(\d+)/g;
        could also be written as
        my @nums = ($x =~ /\s(\S+)/ ? $1 : undef) =~ /(\d+)/g;

        If you're going to use /g, drop the \s:

        # @nums = ('3', '33');
        my $word = 2;
        my @nums = ($x =~ /(\S+)/g)[$word] =~ /(\d+)/g;

        or use split:

        # @nums = ('3', '33');
        my $word = 2;
        my @nums = (split(' ', $x))[$word] =~ /(\d+)/g;
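
The list-context point discussed above can be seen in isolation with a minimal sketch like the following. The names @first and @all are only illustrative and do not appear in the thread; the string is the OP's.

```perl
use strict;
use warnings;

my $x = "a1aa11 b2bb22bbb222 c3cc33";

# Non-global match in list context: returns the captures of the first
# match only, so with one capture group the list has one element.
my @first = $x =~ /\s(\S+)/;

# Global match in list context: returns the capture from every match.
my @all = $x =~ /\s(\S+)/g;

# Index 1 exists only in the global list, so the slice needs /g here.
my @nums = ($x =~ /\s(\S+)/g)[1] =~ /(\d+)/g;

print scalar(@first), " element without /g, ", scalar(@all), " with /g\n";
print "@nums\n";    # prints "3 33"
```

Because the pattern still starts with \s, the first word is never captured, which is why index 1 here is the third word. Dropping the \s, as suggested above, makes the index match the word position.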
Re: regex for multiple capture within boundary
by Ieronim (Friar) on Jul 14, 2006 at 20:05 UTC
    @nums = $x =~ m/(\d+)/g;
    UPD: I just did not understand the OP correctly :)
      This is pulling out all the digits. The OP is just asking for the "digits between the spaces".
      my $x = "a1aa11 b2bb22bbb222 c3cc33";
      my @nums = $x =~ /\d+/g;
      print join(" ", @nums) . "\n";

      Output:
      1 11 2 22 222 3 33

      ikegami's answer pulls out only the numbers between the spaces.
Re: regex for multiple capture within boundary
by jwkrahn (Abbot) on Jul 14, 2006 at 20:16 UTC
    The simple answer is:
    my @nums = $x =~ /\d+/g;
    Update: Okay, I reread the problem. :-)
    $ perl -le'$x = "a1aa11 b2bb22bbb222 c3cc33"; print for ( $x =~ /\s(\S+)\s/ )[ 0 ] =~ /\d+/g'
    2
    22
    222
Re: regex for multiple capture within boundary
by Sidhekin (Priest) on Jul 14, 2006 at 21:58 UTC
    I want a single regex that grabs all the number characters between the spaces

    Extreme regexing? I cannot resist such a question. This ought to do it:

    @nums = $x =~ /(?:^\S*\s+|\G)[^\s\d]*(\d+)/g;

    However, if it does not absolutely have to be a single regex, I suspect the maintainer will prefer one of ikegami's solutions. :-)
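
For anyone trying to read that pattern, here is the same regex spelled out with /x and comments. This is a reading aid written on the assumption that the pattern is applied to a string with pos() unset; the @edge case at the end is an extra illustration, not something raised in the thread.

```perl
use strict;
use warnings;

my $x = "a1aa11 b2bb22bbb222 c3cc33";

my @nums = $x =~ /
    (?: ^ \S* \s+     # first match: skip the first word and the whitespace after it
      | \G )          # later matches: resume exactly where the previous one ended
    [^\s\d]*          # skip letters inside the word, but never cross whitespace
    (\d+)             # capture one run of digits
/xg;

print "@nums\n";      # prints "2 22 222"

# With no whitespace at all, the first alternative cannot match and \G
# anchors at position 0, so digits from the only word are returned.
my @edge = "a1b22" =~ /(?:^\S*\s+|\G)[^\s\d]*(\d+)/g;
print "@edge\n";      # prints "1 22"
```

The matching stops after 222 because [^\s\d]* cannot step over the following space, and once the engine moves its start position forward neither ^ nor \G can match any more.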

    Update: I've been out-extremed! Not perhaps by the first of ikegami's latter solutions, which after all is straightforward, and probably as maintainable as mine, but certainly by the second. It took me three attempts just to read it: my eyes glazed over twice!

    ... what's with the local *nums; though? Yet another update: Ah. Considerate of you. :-)

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      I hope you're referring to one of these and not one of the following ;)

      local *nums;
      our @nums;
      $x =~ / \s (?: [^\s\d]* (\d+) (?{ push(@nums, $1) }) )* /x;

      or

      local *nums;
      our @nums;
      $x =~ / ^ (?> \S* \s ) \S*? (?<!\d) ( (?> \d+ ) ) (?{ push(@nums, $1) }) (?!) /x;

      ... what's with the local *nums; though?

      I used package variables because regexps capture. Putting the regexp in a sub would only work once if I had used lexical variables instead of package variables.

      our @nums; makes it so I can say @nums instead of @main::nums to refer to the package variable.

      local *nums works better than local @nums;. They both ensure that @main::nums has the same value when we're done as it did when we started. In other words, it makes sure we're not trampling over someone else's variables.
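
      A minimal sketch of the "not trampling" point, assuming the first (?{ }) solution is wrapped in a sub. The sub name second_word_digits and the placeholder value in @nums are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Pretend some other code already uses @main::nums.
our @nums = ("owned by other code");

sub second_word_digits {    # hypothetical helper name
    my ($str) = @_;

    # Localize the whole glob: @nums is fresh and empty for this call,
    # and the caller's value comes back when the sub returns.
    local *nums;

    $str =~ / \s (?: [^\s\d]* (\d+) (?{ push(@nums, $1) }) )* /x;

    my @found = @nums;      # copy the results before the local is undone
    return @found;
}

my @first  = second_word_digits("a1aa11 b2bb22bbb222 c3cc33");
my @second = second_word_digits("x9 y8yy77 z6");

print "@first\n";    # prints "2 22 222"
print "@second\n";   # prints "8 77"
print "@nums\n";     # prints "owned by other code"
```

      The sub can be called repeatedly because the code block refers to the package variable, which is looked up afresh on each call.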
