Re: Capturing everything after an optional character in a regex?

You can't really do what you're asking in the way you're trying to do it.

Is it the digits you're trying to capture? Or do the digits at least guarantee the start of what you want to capture? You can't reliably start a capture immediately after a character that may not be there: when it's absent, you simply capture everything instead. Because 'X' is optional, it's an unreliable anchor.

Try something like this, if digits mark the start:

m/(\d\S*)$/

I used \S* instead of \S+, so that if the string contains 'abc1' the RE will still capture the '1'. Also, you said "everything after...", so I anchored the match all the way to the end of the string with the $ metachar.
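
As a quick check of the pattern above, here is a minimal sketch. The helper name `tail_from_digit` and every sample string other than 'abc1' are assumptions made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the run that starts at a digit and
# extends to the end of the string, or undef if there is none.
sub tail_from_digit {
    my ($str) = @_;
    return $str =~ m/(\d\S*)$/ ? $1 : undef;
}

for my $str ('abcX123', 'abc123', 'abc1', 'abc') {
    my $tail = tail_from_digit($str);
    printf "%-8s => %s\n", $str, defined $tail ? "'$tail'" : 'no match';
}
```

The capture is the same whether or not the 'X' is present, because the digit, not the 'X', anchors the start of the match.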

Dave


Re: Re: Capturing everything after an optional character in a regex? by Anonymous Monk on Dec 04, 2003 at 06:53 UTC
Can you explain why my original regex doesn't work? Maybe I don't understand the finer points of greediness. I would think that the regex would try to match the X first and once successful, would try to match the \S+ and succeed at that.
Re: Re: Re: Capturing everything after an optional character in a regex? by Anonymous Monk on Dec 04, 2003 at 07:06 UTC
The pattern you gave is `/X?(\S+)/`. That says: match zero or one 'X' character, followed by one or more non-space characters, which are captured. Now, with the string "abcX123", the regex engine starts at the beginning of the string and asks itself, "Can I match zero or one 'X' characters here?" The answer is "Yes, I can successfully match zero 'X' characters right here," which it does. It then goes ahead and tries to match one or more non-space characters, which also succeeds. Does that help you get the idea?
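
The walk-through above can be checked with a short sketch. The helper name `show_match` and the second sample string 'X123' are assumptions added for illustration. `$-[0]` is Perl's built-in offset of the start of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: report where the match starts and what the
# parentheses capture. Returns an empty list if nothing matches.
sub show_match {
    my ($str) = @_;
    return unless $str =~ /X?(\S+)/;
    return ($-[0], $1);   # $-[0] is the offset where the whole match began
}

for my $str ('abcX123', 'X123') {
    my ($start, $captured) = show_match($str);
    print "$str: match starts at offset $start, captured '$captured'\n";
}
```

For "abcX123" the match begins at offset 0 with zero 'X' characters, so the capture is the entire string. Only when the 'X' sits at the very position where matching starts, as in "X123", does `X?` consume it.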
Re: Re: Re: Capturing everything after an optional character in a regex? by davido (Cardinal) on Dec 04, 2003 at 07:02 UTC
Your original regex was `m/X?(\S+)/`. The problem is that the + quantifier is greedier than ?, and will thus try to match as many characters as possible. Since the X is optional, due to the ? quantifier, X? yields to the \S+ portion of your pattern, so that \S+ matches everything even if there is an X that could have matched X?.
You may be able to get around that problem simply by specifying non-greedy matching for the \S+ portion of the regex. In fact, that might be a better solution than the others I've suggested later in this thread. However, I tend to prefer spelling things out clearly over simply making something non-greedy and hoping for the best. My later suggestions force \S+ to give up something, whereas specifying non-greediness just weights the tug-of-war.
~~Nevertheless, specifying non-greed might just be the simplest approach to your problem, so here it is (untested):~~ ~~`m/X?(\S+?)$/`~~
Update: As another Anonymous Monk pointed out, forcing non-greed in the \S+ portion of the regex doesn't help, and thus the answers I've posted lower in this thread are preferable to the one I've struck out in this node. So is Roger's answer, which allows either case to be captured by the same set of parens, removing the need to count capturing parens. Anon is right, though: X? being optional lets \S+ (and \S+?) rob the X from X?.
Dave
Re: Re: Re: Re: Capturing everything after an optional character in a regex? by Anonymous Monk on Dec 04, 2003 at 07:32 UTC
The greediness of `\S+` has nothing to do with the observed behavior, and making it non-greedy doesn't help the situation. The "problem" is strictly the optional nature of the `X?`.
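
A small sketch makes the point concrete. The sample string "abcX123" is the one used elsewhere in this thread, while the variable names and the unanchored lazy variant are assumptions added for comparison.

```perl
use strict;
use warnings;

my $str = 'abcX123';

# Anchored to the end of the string, greedy and non-greedy captures agree:
# both matches start at offset 0, where X? matches zero characters.
my ($greedy) = $str =~ /X?(\S+)$/;
my ($lazy)   = $str =~ /X?(\S+?)$/;

# Without the $ anchor, non-greediness only shortens the right-hand end.
my ($lazy_unanchored) = $str =~ /X?(\S+?)/;

print "greedy:          '$greedy'\n";
print "lazy:            '$lazy'\n";
print "lazy, unanchored: '$lazy_unanchored'\n";
```

In all three cases the capture begins with the 'a' at offset 0. Changing the greediness of `\S+` moves where the capture ends, never where it starts.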
blame it on greedy, ignore remaining Dwarves by Anonymous Monk on Dec 05, 2003 at 21:16 UTC
This is slightly OT, but I have to ask: why does greediness get the blame for so much? I am not an expert in RE engines, but I am pretty sure that "leftmostness" trumps greediness nearly every time. Correct? That is, the "leftmost" match always succeeds before the "best" or "biggest" match. The greediness of `\S+` doesn't figure into this problem in the least, as far as I can tell. Greediness is right-acting, not omni-directional.
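
To illustrate the leftmost-first behavior described above, here is a minimal sketch. Both sample strings are made up for illustration and do not come from the thread. `$-[0]` and `$+[0]` are Perl's built-in start and end offsets of the last successful match.

```perl
use strict;
use warnings;

# A shorter match that starts earlier beats a longer one that starts later.
my ($word) = 'ab abcdef' =~ /(\w+)/;

# \d* is greedy, yet it settles for an empty match at offset 0
# instead of moving right to reach the digits.
my $str = 'xyz123';
$str =~ /\d*/ or die "unexpected: \\d* can always match the empty string";
my ($start, $length) = ($-[0], $+[0] - $-[0]);

print "first word: '$word'\n";
print "\\d* matched at offset $start with length $length\n";
```

The engine commits to the earliest starting position at which the whole pattern can succeed, and greediness only decides how far to the right the match extends from there.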