Can you explain why my original regex doesn't work? Maybe I don't understand the finer points of greediness. I would think that the regex would try to match the X first and once successful, would try to match the \S+ and succeed at that.
Can you explain why my original regex doesn't work? Maybe I don't understand the finer points of greediness. I would
think that the regex would try to match the X first and once successful, would try to match the \S+ and succeed at that.
The pattern you gave is /X?(\S+)/. That says: match zero-or-one
'X' character, followed by (and capture) one-or-more non-space characters.
Now with the string "abcX123", the regex starts at the beginning of the string
and asks itself, "Can I match zero-or-one 'X' characters here?" The
answer is "Yes, I can successfully match zero 'X' characters right here",
which it does. It then goes ahead and tries to match one-or-more non-space
characters (which it also does). Does that help you get the idea?
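To make that walk-through concrete, here is a small sketch (the variable names and the second test string "X123" are my own additions for illustration, not from the original post). It records what (\S+) captures and the offset where the overall match began, which Perl exposes as $-[0]:

```perl
use strict;
use warnings;

my (%capture, %start);
for my $str ("abcX123", "X123") {
    if ($str =~ /X?(\S+)/) {
        $capture{$str} = $1;      # text grabbed by (\S+)
        $start{$str}   = $-[0];   # offset where the whole match began
        print "'$str': match starts at offset $start{$str}, ",
              "captured '$capture{$str}'\n";
    }
}
```

With "abcX123" the match starts at offset 0, where X? settles for zero characters, so the capture is the whole string. With "X123" the X happens to sit at the starting position, so X? consumes it and only "123" is captured.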
Your original regex was m/X?(\S+)/
The problem is that the X is optional. Because of the ? quantifier, X? can match zero characters at the very start of the string, and the greedy \S+ then tries to match as many characters as possible. In effect, X? yields to the \S+ portion of your pattern, so \S+ matches everything, even if there is an X later on that you wanted X? to match.
You may be able to get around that problem simply by specifying non-greedy matching for the \S+ portion of the regex. In fact, that might be a better solution than the others I've suggested later in this thread. However, I tend to prefer spelling things out more clearly over simply making something non-greedy and hoping for the best. My later suggestions force \S+ to give up something, whereas specifying non-greediness just weights the tug-of-war.
Nevertheless, specifying non-greed might just be the simplest approach to your problem, so here it is (untested):
m/X?(\S+?)$/
Updated: As another Anonymous Monk pointed out, forcing non-greed in the \S+ portion of the regex doesn't help, so the answers I've posted lower in this thread are preferable to the one I've struck out in this node. So is Roger's answer, which allows either case to be captured by the same set of parens, removing the need to count capturing parens. Anon is right, though: because X? is optional, \S+ (and \S+?) robs the X from X?.
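A quick way to see why non-greed makes no difference here is to run both variants against the string from earlier in the thread. This is only a sketch, and the variable names are mine:

```perl
use strict;
use warnings;

my $str = "abcX123";

# Both patterns are anchored at the end with $, as in the struck-out suggestion.
my ($greedy)     = $str =~ /X?(\S+)$/;
my ($non_greedy) = $str =~ /X?(\S+?)$/;

print "greedy: '$greedy', non-greedy: '$non_greedy'\n";
```

In both cases the match begins at the start of the string with X? matching nothing. The non-greedy \S+? starts with one character but has to keep extending until $ can match, so it ends up capturing the same "abcX123" as the greedy version.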