in reply to Multiple Capture Groups in RegEx

First, .* is almost always wrong. It is greedy, and it matches anything, including nothing.
Second, just do it: get the second group, then break it up. You will have to deal with the individual elements anyway. Processing everything with a single regex can be treated as a sport, but the result is often error-prone, unreadable, unmaintainable, and slower.
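
A minimal sketch of that two-step approach, in case it helps. The input line and the variable names ($line, $first, $list, $rest) are made up for illustration, since the data from the original question is not shown here, and the pattern is only one plausible way to isolate the semicolon-delimited run:

```perl
use strict;
use warnings;

# Hypothetical input line; the real data from the question is not shown here.
my $line = 'alpha,red; green; blue; beta,gamma,7';

# Step 1: capture the whole semicolon-delimited run as a single group.
my ($first, $list, $rest) = $line =~ /^([^,]*),((?:\w+;\s*)+)(.*)$/
    or die "line did not match\n";

# Step 2: break that one capture up with a much simpler pattern.
# split drops the trailing empty field left by the final "; ".
my @items = split /;\s*/, $list;

print "$_\n" for @items;
```

Each step stays short enough to read at a glance, and the split keeps working however many semicolon-delimited items turn up.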

Re^2: Multiple Capture Groups in RegEx
by ketema (Scribe) on Apr 24, 2009 at 00:24 UTC
    That makes sense. I can easily split on the 2nd capture and then process it; I guess I was just locked into thinking I could get them all back separately from one regex. As for the .*, it works. I could use .+, which works too.
      As for the .*, it works. I could use .+, which works too.

      See perlre and look for greediness too, e.g. m/.+?/
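
      To make the greediness point concrete, here is a small sketch on a made-up string ($text is hypothetical, not the data from the question). It compares a greedy quantifier, its non-greedy form, and a negated character class:

```perl
use strict;
use warnings;

my $text = 'a,b,c,9';   # hypothetical sample string

my ($greedy)  = $text =~ /^(.+),/;      # greedy: backs off only as far as the last comma
my ($lazy)    = $text =~ /^(.+?),/;     # non-greedy: stops at the first comma
my ($negated) = $text =~ /^([^,]+),/;   # negated class: can never cross a comma

print "greedy:  $greedy\n";    # greedy:  a,b,c
print "lazy:    $lazy\n";      # lazy:    a
print "negated: $negated\n";   # negated: a
```

      The non-greedy and negated-class versions agree here, but only the negated class guarantees that a comma never ends up inside the capture if the rest of the pattern forces backtracking.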

      It works ... until you have four comma-delimited fields instead of three before the final digit field. .* is best used when you care about the beginning and the end of a string but not the stuff in between. A better way is to use ([^,]*,) for each comma-delimited field you care about, plus a regex that carefully matches the pattern of the multiple semicolon-delimited fields, such as ((?:\w*;\s+)*):

      ^([^,]*,)((?:\w*;\s+)*)([^,]*,)([^,]*,)([^,]*,).*,(\d)$

      Note the use of .* to skip past any extra fields before the final digit field. Using [^,]* instead of .* means you only grab up to (and not including) the next comma and no more.
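
      A quick way to try the pattern above is sketched below. The two sample lines and their field names are invented for illustration, so treat the output as a property of this made-up data rather than of the original input. One thing the sketch shows: as written, the trailing .*, needs at least one more comma between the fifth capture and the final digit, so a line with no extra field does not match.

```perl
use strict;
use warnings;

# The pattern from the post, unchanged.
my $re = qr/^([^,]*,)((?:\w*;\s+)*)([^,]*,)([^,]*,)([^,]*,).*,(\d)$/;

# Hypothetical lines; the field names are made up for illustration.
my $with_extra    = 'id1,red; green; name,city,zip,extra1,extra2,5';
my $without_extra = 'id1,red; green; name,city,zip,5';

# In list context a successful match returns the six captures.
my @fields = $with_extra =~ $re;
print "[$_]\n" for @fields;

# No comma is left for the trailing ".*," to consume, so this line fails.
print "no match without an extra field\n" unless $without_extra =~ $re;
```

      For the first line, the captures keep their trailing commas ('id1,', 'name,', and so on), the second capture holds the whole semicolon-delimited run, and the last capture is the final digit.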

      Best, beth