pike has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow monks, I have a regex with capturing brackets and I would like to find the position and the contents of the brackets for every match in a string. E.g., given the following regex and string:

my $regex = qr/(\d\d):(\d\d)(?::(\d\d))?/;
#             01234567890123456
my $string = "11:30 or 11:29:53";
I would like to get the result:

$res = { 0 => ['11', '30', undef], 9 => ['11', '29', '53'] }

If I evaluate the regex in scalar context, I can get the position of each match, like this:

$res->{pos ($string)} = [$1, $2, $3] while $string =~ /$regex/g;
But the problem is that I read the regex from a file, so I have no idea how many capturing brackets it contains (and I think it is rather hard to find that out, isn't it?). Therefore I don't know how many $n to put in the result array.

On the other hand, I can evaluate the regex in list context, as in:

@res = $string =~ /$regex/g;
but then I get neither the positions nor the information how often the string matched (I don't know how many capturing brackets the regex contains!). Is there any way to get the contents of the brackets in an array, but only for one match, and to iterate over all matches in a while loop, as m//g does in scalar context?

Thanks for any advice,

pike

Re: Multiple matches of a regex
by Fletch (Bishop) on Jan 10, 2003 at 15:21 UTC

    Look for @+ and @- in perldoc perlvar.
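For readers who have not met these arrays, here is a minimal sketch of what they hold, using the regex and string from the question. The `@spans` array is a name made up for this illustration only.

```perl
use strict;
use warnings;

my $string = "11:30 or 11:29:53";
my @spans;    # hypothetical helper: one [start, end] pair per match

while ($string =~ /(\d\d):(\d\d)(?::(\d\d))?/g) {
    # $-[0] is the offset where the whole match starts;
    # $+[0] is the offset just past its end (the value pos() reports).
    push @spans, [ $-[0], $+[0] ];
}

print "match at $_->[0]..$_->[1]\n" for @spans;
```

This should report the matches at 0..5 and 9..17. Note that pos($string) corresponds to $+[0], the end of the match, so $-[0] is the one to use as a start position.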

      Thanks for the tip, Fletch! (Can't ++ you today, ran out of votes... Wait for Monday please!) So that leaves me with:

      use strict;

      my $regex = qr/(\d\d):(\d\d)(?::(\d\d))?/;
      #             01234567890123456
      my $string = "11:30 or 11:29:53";
      my $res;
      while ($string =~ /$regex/g) {
          my $pos = $-[0];
          my ($ind, @match);
          for $ind (1..$#-) {
              push @match, substr ($string, $-[$ind], $+[$ind] - $-[$ind]);
          }
          $res->{$pos} = \@match;
      }
      use Data::Dumper;
      print Data::Dumper::Dumper $res;
      This prints:
      $VAR1 = {
                '0' => [
                         '11',
                         '30'
                       ],
                '9' => [
                         '11',
                         '29',
                         '53'
                       ]
              };
      as it should.

      What I don't like about this solution is that it involves a lot of substr calls to get strings that are stored in the $n variables anyway. Is there a better way to do this?

      Thanks,

      pike

        As you are using capturing brackets within your regex, your required substrings are already being placed into the variables $1 $2 $3, so there is no need to use substr to isolate the substrings a second time. The only tricky bit is that $3 may have no value, so I've used grep to exclude it if it has no value.

        #! perl -slw
        use strict;
        use Data::Dumper;

        my $regex = qr/(\d\d):(\d\d)(?::(\d\d))?/;
        #             01234567890123456
        my $string = "11:30 or 11:29:53";
        my $res;

        $res->{$-[0]} = [grep $_, $1, $2, $3] while $string =~ /$regex/g;

        print Dumper $res;

        __END__
        c:\test>225814
        $VAR1 = {
                  '0' => [
                           '11',
                           '30'
                         ],
                  '9' => [
                           '11',
                           '29',
                           '53'
                         ]
                };

        c:\test>

        Examine what is said, not who speaks.

        The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

        If you have anything against substr calls you can do this symbolically (though more people will probably object to this than to repeated substr calls).
        no strict 'refs';
        my @groups = map $$_, 1 .. $#+;
        If I may be picky: in your question you wanted the first match to return   ['11', '30', undef]   while in your reply you said   ['11', '30']   is what you want. As I pointed out in my other reply in this thread, you should consider the difference between $#+ and $#-. Also, unmatched groups would be '' with your code. Perhaps that's what you want, but it won't be fully analogous to using the $<DIGITS> variables. Done symbolically it will be analogous, since you actually use the $<DIGITS> variables.
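The difference between $#+ and $#- can be seen in a small sketch, assuming the regex from the question applied to two one-match strings. The `@counts` array is a hypothetical name used only here.

```perl
use strict;
use warnings;

my $regex = qr/(\d\d):(\d\d)(?::(\d\d))?/;
my @counts;    # hypothetical helper: one [$#+, $#-] pair per string

for my $string ("11:30", "11:29:53") {
    if ($string =~ $regex) {
        # $#+ is the number of groups in the pattern that just matched;
        # $#- is the index of the last group that actually took part.
        push @counts, [ $#+, $#- ];
    }
}

printf "groups in pattern: %d, last matched group: %d\n", @$_ for @counts;
```

For "11:30" the optional third group does not take part, so $#+ is 3 while $#- is 2; for "11:29:53" both are 3. Looping over 1 .. $#- therefore silently drops trailing unmatched groups, while 1 .. $#+ keeps a slot for each of them.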

        Hope I've helped,
        ihb
Re: Multiple matches of a regex
by ihb (Deacon) on Jan 10, 2003 at 16:26 UTC
    As Fletch pointed out, what you want to use is the @+ and @- variables. However, there are some gotchas I figure are worth mentioning. It's very important which variables you choose to work with if you want it done properly. A correct snippet is shown below:

    my @groups = map { defined $-[$_] ? substr($string, $-[$_], $+[$_] - $-[$_]) : undef } 1 .. $#+;

    What's important to note is that you must use @+ to get the number of submatches, and @- to check whether a group matched. (As it is now (on my ActivePerl 5.6.1) you can also use defined $+[$_] to check whether a group matched, but the documentation says nothing about it; only the definedness of $-[$_] is mentioned. So to be safe, one ought to work with @- for this.)
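Put together with the m//g loop from the question, the snippet above might be used as follows. This is only a sketch under the thread's assumptions (same regex and string); `%res` is a plain hash chosen for the illustration rather than the hash reference used earlier.

```perl
use strict;
use warnings;
use Data::Dumper;

my $regex  = qr/(\d\d):(\d\d)(?::(\d\d))?/;
my $string = "11:30 or 11:29:53";
my %res;    # start offset of each match => array ref of its groups

while ($string =~ /$regex/g) {
    # One slot per group in the pattern (1 .. $#+); groups that did not
    # take part in this match have an undefined $-[$_] and yield undef.
    $res{ $-[0] } = [
        map { defined $-[$_] ? substr($string, $-[$_], $+[$_] - $-[$_]) : undef }
            1 .. $#+
    ];
}

$Data::Dumper::Sortkeys = 1;    # stable key order for printing
print Dumper(\%res);
```

The entry for key 0 keeps three slots with the last one undef, which is the shape asked for in the original question, and no knowledge of the number of capturing brackets is needed in advance.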

    Hope I've helped,
    ihb

    Update:
    I just scanned through perlretut (a document I haven't read, since I figured I knew regexes well enough not to need a tutorial when it was written) and found that all the info you have received in this thread is concisely described in it under "Extracting matches".