in reply to Capturing everything after an optional character in a regex?

You can't really do what you're asking in the way you're trying to do it.

Is it the digits you're trying to capture? Or do the digits at least guarantee the start of what you want to capture? You can't reliably start a capture immediately after a character that may not be there: if it's absent, the capture simply takes everything instead. Because 'X' is optional, it's an unreliable anchor.

Try something like this, if digits mark the start:

m/(\d\S*)$/

I used \S* instead of \S+, so that if the string contains 'abc1' the RE will still capture the '1'. Also, you said "everything after...", so I anchored the match all the way to the end of the string with the $ metachar.
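Here is a minimal sketch of how that pattern behaves. The helper name tail_from_digit and the sample strings are made up for illustration and are not from the original question:

```perl
use strict;
use warnings;

# Return the run of non-space characters that starts with a digit and
# reaches the end of the string, or undef when nothing matches.
# (Hypothetical helper name, used only for this illustration.)
sub tail_from_digit {
    my ($str) = @_;
    return $str =~ m/(\d\S*)$/ ? $1 : undef;
}

# Hypothetical sample strings.
for my $str ('abcX123', 'abc123', 'abc1') {
    my $tail = tail_from_digit($str);
    printf "%s => %s\n", $str, defined $tail ? $tail : '(no match)';
}
# abcX123 => 123
# abc123 => 123
# abc1 => 1
```

The 'abc1' case is the one that \S+ would miss, since nothing follows the digit.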


Dave

Re: Re: Capturing everything after an optional character in a regex?
by Anonymous Monk on Dec 04, 2003 at 06:53 UTC
    Can you explain why my original regex doesn't work? Maybe I don't understand the finer points of greediness. I would think that the regex would try to match the X first and once successful, would try to match the \S+ and succeed at that.
      Can you explain why my original regex doesn't work? Maybe I don't understand the finer points of greediness. I would think that the regex would try to match the X first and once successful, would try to match the \S+ and succeed at that.

      The pattern you gave is /X?(\S+)/. That says: match zero-or-one 'X' character, followed by one-or-more non-space characters, which are captured. Now with the string "abcX123", the regex engine starts at the beginning of the string and asks itself, "Can I match zero-or-one 'X' characters here?" The answer is, "Yes, I can successfully match zero 'X' characters right here," which it does. It then goes ahead and tries to match one-or-more non-space characters, which also succeeds, so the capture is the whole string. Does that help you get the idea?
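The walkthrough above can be checked with a short script. This is only a sketch: match_info is a hypothetical helper name, and 'X123' is an extra sample string added for contrast.

```perl
use strict;
use warnings;

# Report the offset where /X?(\S+)/ starts matching and what it captures.
# (Hypothetical helper name, used only for this illustration.)
sub match_info {
    my ($str) = @_;
    return unless $str =~ /X?(\S+)/;
    # $-[0] holds the offset where the whole match began.
    my ($start, $captured) = ($-[0], $1);
    return ($start, $captured);
}

for my $str ('abcX123', 'X123') {
    my ($start, $captured) = match_info($str);
    printf "%s: start=%d captured=%s\n", $str, $start, $captured;
}
# abcX123: start=0 captured=abcX123
# X123: start=0 captured=123
```

In both cases the match begins at offset 0. The X is only consumed by X? when it happens to sit at the very position where the engine starts.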

      Your original regex was m/X?(\S+)/

      The problem is that the + quantifier is greedier than ?, and will thus try to match as many characters as possible. Since the X is optional, due to the ? quantifier, X? is yielding to the \S+ portion of your pattern, so that \S+ matches everything even if there is an X that could have matched X?.

      You may be able to get around that problem simply by specifying non-greedy matching for the \S+ portion of the regex. In fact, that might be a better solution than the others I've suggested later in this thread. However, I tend to prefer spelling things out clearly over simply making something non-greedy and hoping for the best. My later suggestions force \S+ to give up something, whereas specifying non-greediness just weights the tug-of-war.

      Nevertheless, specifying non-greed might just be the simplest approach to your problem, so here it is (untested):

      m/X?(\S+?)$/

      Updated: As another Anonymous Monk pointed out, forcing non-greed in the \S+ portion of the regex doesn't help, and thus the answers I've posted lower in this thread are preferable to the one I've struck out in this node (m/X?(\S+?)$/). Or use Roger's answer, which allows either case to be captured by the same set of parens, negating the need to count capturing parens. Anon is right, though: X? being optional makes \S+ (and \S+?) rob the X from X?.
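To see why non-greed doesn't help, here is a small comparison sketch. capture_with is a hypothetical helper, and the unanchored non-greedy variant is added only for contrast; it was not proposed in the thread.

```perl
use strict;
use warnings;

# Apply a compiled pattern and return its first capture, or undef.
# (Hypothetical helper name, used only for this illustration.)
sub capture_with {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

my $sample = 'abcX123';
my @patterns = (
    [ 'greedy',                 qr/X?(\S+)/   ],
    [ 'non-greedy, anchored',   qr/X?(\S+?)$/ ],
    [ 'non-greedy, unanchored', qr/X?(\S+?)/  ],
);

for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    printf "%-24s captured=%s\n", $label, capture_with($re, $sample);
}
# greedy                   captured=abcX123
# non-greedy, anchored     captured=abcX123
# non-greedy, unanchored   captured=a
```

Non-greed only changes where the capture ends. It still begins at offset 0, because X? has already matched zero characters there.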


      Dave

        The greediness of \S+ has nothing to do with the observed behavior, and making it non-greedy doesn't help the situation. The "problem" is strictly the optional nature of the X?.

        This is slightly OT, but I have to ask... why does greediness get the blame for so much? I am not an expert in RE engines, but I am pretty sure that "leftmostness" trumps greediness nearly every time. Correct? That is, the "leftmost" match always wins over the "best" or "biggest" match.

        \S+'s greediness doesn't really figure into this problem in the least, as far as I can tell. Greediness is right-acting, not omni-directional.
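A quick way to see leftmost beating longest is a sketch like the following. The sample string and the helper name first_run_of_x are invented for illustration:

```perl
use strict;
use warnings;

# Return the offset and text of the run of X's that /(X+)/ matches.
# (Hypothetical helper name, used only for this illustration.)
sub first_run_of_x {
    my ($str) = @_;
    return unless $str =~ /(X+)/;
    my ($start, $run) = ($-[0], $1);
    return ($start, $run);
}

# The string holds a short run of X at offset 1 and a longer one at offset 3.
my ($start, $run) = first_run_of_x('aXbXXXc');
printf "start=%d run=%s\n", $start, $run;
# start=1 run=X
```

The greedy X+ settles for the single X because that match starts further left. Greediness only decides how far a quantifier extends to the right once a starting position has been found.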