ariels has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to get more information about my regexp matches. Specifically, I'd like to know the start and length (or equivalent information, of course) of the entire match, as well as any groups (/(...)/ constructs).

Emacs Lisp provides routines match-beginning, match-end and match-data to access this sort of information.

Currently I do something like this:

$string =~ /^(.*?)($regexp)/;
and then use length $1 and length $2 to give me the information I need. This is not a good way to do things; among other things, it shifts the numbers of any groupings in $regexp by 2 (this code is not responsible for $regexp).

And while I get information about the start and extent of the entire match, I don't get any information about the groups!

Re: Getting information about regexp matches
by converter (Priest) on Apr 23, 2001 at 12:21 UTC
    In Perl 5.6.0 and later you can use the arrays @- and @+ (@LAST_MATCH_START and @LAST_MATCH_END) to get the beginning and ending offsets of the entire match ($-[0] and $+[0]), and the beginning and ending offsets of each subpattern match. The beginning and ending offsets for $1, for example, will be found in $-[1] and $+[1].

    For more information, see perlvar.
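A minimal sketch of the @- and @+ approach described above. The sample string and pattern are made up for illustration; only @-, @+ and $#+ come from perlvar. It collects the start and length of the whole match and of each group, without wrapping the pattern in extra parentheses:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $string = "the quick brown fox";
my $regexp = qr/(qu\w+)\s+(br\w+)/;

my @spans;    # each entry: [ group number, start offset, length ]
if ($string =~ $regexp) {
    # Index 0 describes the whole match; 1 .. $#+ describe the groups.
    for my $i (0 .. $#+) {
        # A group that did not take part in the match has undef offsets.
        next unless defined $-[$i];
        push @spans, [ $i, $-[$i], $+[$i] - $-[$i] ];
    }
}

printf "group %d: start=%d length=%d\n", @$_ for @spans;
```

Because the offsets are read only inside the conditional, a failed match cannot leave stale values in @spans.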

Re: Getting information about regexp matches
by alfie (Pilgrim) on Apr 23, 2001 at 12:47 UTC
    I'm not quite sure I understand you correctly, but maybe you should take a look at perlvar, in particular $& for the match, $` for the prematch and $' for the postmatch. If you load the English module (use English;) you can even write $MATCH, $PREMATCH and $POSTMATCH.
    --
    use signature; signature(" So long\nAlfie");
      Don't use these when you don't have to. From man perlvar:

      The use of this variable anywhere in a program
      imposes a considerable performance penalty on all
      regular expression matches. See the BUGS manpage.

      snowcrash //////
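For completeness, a small sketch of how the prematch and match variables would yield the start and length of the whole match. The sample string and pattern are assumptions for illustration, and the performance caveat quoted from perlvar applies to any program that mentions these variables:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $string = "the quick brown fox";

my ($start, $length);
if ($string =~ /qu\w+/) {
    $start  = length $`;    # characters before the match
    $length = length $&;    # characters in the match itself
}

print "start=$start length=$length\n" if defined $start;
```

Note that this only describes the whole match; it says nothing about the positions of individual groups.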
Re: Getting information about regexp matches
by arturo (Vicar) on Apr 23, 2001 at 17:43 UTC

    Just to note: since $1 and its ilk will keep their values if the regex fails to match, make sure you wrap this all in a conditional. I'm not sure what's so bad about using length here. The grouping number shift is going to be a problem anyway: if your code's not setting $regexp, how will you know whether it has groupings at all (and if so, how many)? Of course, if you want to nab those matches, you can always use a nice listy assignment like:

    if (my ($foo, $bar, @reg_groups) = $string =~ /^(.*?)($regexp)/) {
        # $foo = $1, $bar = $2, @reg_groups = all the matching groups from $regexp
        my $totlength = length $foo.$bar;
        my $foolength = length $foo;
        # whatever happens next
    }

    HTH

Re: Getting information about regexp matches
by lemming (Priest) on Apr 23, 2001 at 11:59 UTC

    I'm not sure about this, since I'm a little unclear on how $regexp was generated. However, you may want to look into the function index.

    Update: Hmm. index looks to be blank.

    index STR, SUBSTR, OFFSET
    This function searches for one string within another. It returns the position of the first occurrence of SUBSTR in STR. The OFFSET, if specified, says how many characters from the start to skip before beginning to look.
    It returns -1 if SUBSTR is not found. Positions are counted from 0, and there's also rindex.

    Most of that quoted from the Camel
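A short sketch of index and rindex as described above, using a made-up sample string. It only locates fixed substrings, not regular expressions:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $string = "the quick brown fox";

my $first   = index  $string, "o";              # first "o", counting from 0
my $second  = index  $string, "o", $first + 1;  # resume the search after it
my $last    = rindex $string, "o";              # search from the right
my $missing = index  $string, "zebra";          # -1 when not found

print "first=$first second=$second last=$last missing=$missing\n";
```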

      Thanks, I already know about index.

      Unfortunately it is no good. It gives me the start position of a substring of my string (since I give the string, it doesn't have to give me its length). Which doesn't help for a regular expression.