in reply to split and capture some of the separators

I have almost always found that a "capture split" is better replaced by a proper m//g instead. Have you considered that?

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.


Re^2: split and capture some of the separators
by shemp (Deacon) on Oct 07, 2004 at 21:58 UTC
    I did think about that, but I can't quite see how to do it in this case. Part of the problem for me is that the separators are well-defined, but what is between them could be anything (except a separator).
      You can use the following "continue to split" mechanism in a loop:
      (my($token, $sep), $string) = split /PATTERN/, $string, 2;
      This loads the text before the match into $token, the matched string (the separator) into $sep, and the rest of the string after the match back into $string, shortening it so it is ready for the next iteration. It works provided you have exactly one pair of capturing parens in the pattern.

      It's almost identical in effect to using the special variables $`, $&, $' after a normal match with the same pattern, but without the negative impact those variables have on the global speed of regexes.
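The "continue to split" line above could be wrapped in a loop roughly like this. It is only a sketch: the input string and the separator set `[,;]` are made up for illustration, and the loop simply stops once nothing is left to split.

```perl
use strict;
use warnings;

# Hypothetical input and separator set, chosen only for illustration.
my $string = 'alpha,beta;gamma';
my @pairs;

while (defined $string and length $string) {
    # Exactly one pair of capturing parens plus a limit of 2:
    # split returns (text before, separator, rest of the string).
    (my ($token, $sep), $string) = split /([,;])/, $string, 2;

    # When no separator is left, $sep and $string both end up undef.
    push @pairs, [ $token, $sep ];
}

for my $pair (@pairs) {
    my ($token, $sep) = @$pair;
    print "token: $token, separator: ", (defined $sep ? $sep : '(none)'), "\n";
}
```

Each entry of @pairs holds a token and the separator that followed it; the final token has no separator, so its second slot is undef.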

      If the pattern could have more than one pair of capturing parens, you can do:

      my($token, @sep) = split /PATTERN/, $string, 2; $string = pop @sep;
      leaving all the captured separators in @sep.
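As a sketch of the multi-capture variant, here is the same loop with a made-up pattern that has two capturing groups (a punctuation mark, then any whitespace after it). The input and the pattern are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input: a separator is ',' or ';' followed by optional spaces,
# captured as two separate groups.
my $string = 'one, two;three';
my (@tokens, @separators);

while (defined $string and length $string) {
    # With a limit of 2, split returns (text before, capture 1, capture 2, rest).
    my ($token, @sep) = split /([,;])(\s*)/, $string, 2;

    # The last element is the unsplit remainder; pop yields undef when
    # the pattern did not match, which ends the loop.
    $string = pop @sep;

    push @tokens, $token;
    push @separators, [@sep];    # all captured separator parts for this token
}

print join('|', @tokens), "\n";
```

After the loop, @tokens holds the three words and each entry of @separators holds the captured pieces of the separator that followed the corresponding token (an empty list for the last one).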
      "Part of the problem for me is that the separators are well-defined, but what is between them could be anything (except a separator)."

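      This looks (to me) like that sentence written in Perl:
      @list = /([SEPARATORS]+)([^SEPARATORS]*)/g;
      The quantifiers go inside the parens so that each group captures the whole run rather than just its last character. You could include \s in the second character class if you wanted to ignore whitespace.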
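For comparison, here is one way the m//g approach might look when written out in full. It is a sketch under the assumption that the separators are the single characters `,` and `;` (both made up here). Instead of capture groups it uses an alternation, so every piece of the string comes back in order.

```perl
use strict;
use warnings;

# Hypothetical input; separators are assumed to be the single characters ',' and ';'.
my $string = 'alpha,beta;;gamma';

# In list context, m//g returns every match. Each match is either one
# separator or a maximal run of non-separator characters.
my @parts = $string =~ /[,;]|[^,;]+/g;

# Sort the pieces into tokens and separators.
my @tokens     = grep { !/\A[,;]\z/ } @parts;
my @separators = grep {  /\A[,;]\z/ } @parts;

print join(' ', map { "[$_]" } @parts), "\n";
```

Because the separators are matched one at a time, adjacent separators are kept as separate entries rather than being merged or dropped.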

      Of course, I've been wrong in the past :)