Re: Re: Capturing everything after an optional character in a regex?
by Roger (Parson) on Dec 04, 2003 at 07:20 UTC
|
OK, here's my attempt, now that Anonymous Monk's intention is clear.
$string =~ m/(?:(?=.*?X)X|(?!.*?X))(\S+)/;
And here's a little test -
$string1="abcX123";
$string2="abc123";
$string1 =~ m/(?:(?=.*?X)X|(?!.*?X))(\S+)/;
print "$1\n";
$string2 =~ m/(?:(?=.*?X)X|(?!.*?X))(\S+)/;
print "$1\n";
The output is as expected, with both results landing in $1:
123
abc123
And the tricky bit in the above regex is the (?:(?=.*?X)X|(?!.*?X)) part, which defines an optional anchor point.
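For anyone who finds that anchor hard to read, here is the same pattern spread out with the /x modifier. This is only a sketch: the helper name capture_after_optional_x is made up for illustration, and the test strings are the two from the test above.

```perl
use strict;
use warnings;

# Same regex as above, laid out with /x so each piece can carry a comment.
my $optional_anchor = qr/
    (?:
        (?= .*? X ) X   # an X lies at or after this point: the match must begin on an X
      |
        (?! .*? X )     # no X from here on: begin right here, consuming nothing
    )
    (\S+)               # capture the run of non-whitespace that follows
/x;

# Hypothetical helper: returns the capture, or undef if nothing matched.
sub capture_after_optional_x {
    my ($string) = @_;
    return $string =~ $optional_anchor ? $1 : undef;
}

print capture_after_optional_x($_), "\n" for 'abcX123', 'abc123';
```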
Update: I hit my head on the wall a couple of times, literally, after I saw sauoq's much cleverer solution below. I was so locked into the idea of an optional anchor point that I failed to notice the vital clue: "capture till the end" defines a fixed anchor point to look back from, instead of a floating anchor point that looks forward. Although my solution worked, it was far more complicated than necessary.
An important lesson I have learnt today: when a problem seems rather complicated, take a step back and look for other clues. The alternative solution is probably staring me right in the face!
| [reply] [d/l] [select] |
Re: Re: Capturing everything after an optional character in a regex?
by davido (Cardinal) on Dec 04, 2003 at 06:40 UTC
|
This tests ok for my contrived test string:
m/X(\S+)|((?<!X)\S+)/
Or here's a way without negative lookbehind:
m/X(\S+)|([^X]\S*)/
They're not functionally identical, but should both accomplish what you've described.
Of course there is the issue now of counting capturing parens.
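To make the counting issue concrete, here is a small sketch using the second regex above. The helper name capture_either is hypothetical; it simply returns whichever group was filled.

```perl
use strict;
use warnings;

# Hypothetical helper: the alternation fills $1 on the left branch
# and $2 on the right branch, so check which one is defined.
sub capture_either {
    my ($string) = @_;
    return unless $string =~ m/X(\S+)|([^X]\S*)/;
    return defined $1 ? $1 : $2;
}

for my $s ('X123', 'abc123') {
    print "$s => ", capture_either($s), "\n";
}
```

One thing to watch: the leftmost match wins, so for a string with the X in the middle, such as 'abcX123', the right-hand branch matches from the first character and the capture still contains the X.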
| [reply] [d/l] [select] |
|
|
And because it's part of a larger regex I'll have to put parens around the alternation...
| [reply] |
|
|
| [reply] [d/l] [select] |
|
|
You had me for a second. I even went back and checked perlre to see how on earth the /x modifier would eliminate the fact that my RE will load $1 under one alternation option, and $2 under the other alternation option. Of course there's nothing there other than what I expected to find: that the /x modifier allows for whitespace and comments within regexes.
Are you suggesting that with /x it is somehow easier to determine which set of capturing parens loaded up the $1 and $2 special variables? Regular expression readability notwithstanding, I'm missing the point I guess.
The easiest solution I see to the counting problem is just to wrap the entire expression in an outer set of parens, so that $1 captures whichever side of the alternation matched.
But the point is moot I guess, since sauoq's answer seems to have solved the OP's problem in a more graceful way anyway.
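For comparison, here is a sketch of what ends up in $1 with plain outer parens versus the branch-reset group (?|...). Branch reset needs Perl 5.10 or later, so it is an assumption about the reader's Perl rather than something available to everyone. Both helper names are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

# Plain outer parens: $1 is the whole alternation, including any leading X.
sub capture_outer_parens {
    my ($string) = @_;
    return $string =~ m/(X(\S+)|([^X]\S*))/ ? $1 : undef;
}

# Branch reset: each alternative numbers its group as $1.
sub capture_branch_reset {
    my ($string) = @_;
    return $string =~ m/(?|X(\S+)|([^X]\S*))/ ? $1 : undef;
}

for my $s ('X123', 'abc123') {
    printf "%-7s outer=%s reset=%s\n",
        $s, capture_outer_parens($s), capture_branch_reset($s);
}
```

With the outer parens, 'X123' yields 'X123' in $1 (the X is part of the wrapped match), while the branch-reset version yields '123'. For 'abc123' both give 'abc123'.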
| [reply] |
Re: Re: Capturing everything after an optional character in a regex?
by sauoq (Abbot) on Dec 04, 2003 at 09:27 UTC
|
It seems further clarification would be helpful. What do you want to do in the case that there is more than one 'X'? (Or is that case not in your requirements?) If such a case won't exist, or if you want to get everything after the last 'X', I stand by my original suggestion. Use /([^X]*)$/.
If you can have more than one 'X' and you want everything after the first 'X', then something like /^(?:.*?X)?(.*)$/ should do the trick.
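A quick way to see the difference between the two is to run them side by side. This is just a sketch; the helper names after_last_x and after_first_x and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Everything after the last X (or the whole string if there is no X).
sub after_last_x {
    my ($string) = @_;
    return $string =~ /([^X]*)$/ ? $1 : undef;
}

# Everything after the first X (or the whole string if there is no X).
sub after_first_x {
    my ($string) = @_;
    return $string =~ /^(?:.*?X)?(.*)$/ ? $1 : undef;
}

for my $s ('abc123', 'abcX123', 'aXbXc') {
    printf "%-8s last=%-7s first=%s\n",
        $s, after_last_x($s), after_first_x($s);
}
```

The two agree whenever there is at most one 'X'. They differ only for a string like 'aXbXc', where the first gives 'c' and the second gives 'bXc'.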
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] [select] |
|
|
There should be only one X.
Also, I tried to modify your suggestion to /([^X\s]*)/ since, as in my original code, I only want to grab non-whitespace. But then if the string is 'abcX12 3' I only match abc, when I want to match the '12'. Also, as I said before this is part of a larger regex so I can't use begin/end of line characters.
| [reply] [d/l] [select] |
|
|
How about /.*?X(\S*)|([^X\s]*)/ ?
| [reply] [d/l] |
|
|
since, as in my original code, I only want to grab non-whitespace
Your original question was, "How can I capture everything after an optional character?" Your code didn't work. Were we supposed to look at your broken code and intuit which parts of it should be considered a specification of your requirements?
Also, as I said before this is part of a larger regex so I can't use begin/end of line characters.
You did not say that. You said, "This is a smaller part of a larger regex and I'm looking for a regex solution." There isn't a word there about not being able to match the end of the string. Furthermore, in your "clarification", you went on to say, "If X is not there I want the whole string." Were we supposed to interpret that to mean "the whole string up until some other portion matched by another part of a larger expression?"
There are many here that would be happy to help you, but you need to effectively communicate what you want help with. Don't take shortcuts by giving us a minimal example if a solution to it isn't really what you need. Don't assume we will understand your problem; make it perfectly clear. Be careful to say what you mean. Remember, you are intimately familiar with your problem and we aren't. We can't read your mind.
Maybe you've already gotten an answer you can use. Maybe you've gotten an answer that you think is right but which may fail in some cases you haven't considered. Maybe you haven't gotten an answer at all. In any case, my suggestion to you is to repost your question as another SoPW node, but this time be complete in your request. Tell us the whole problem. The very fact that you thought we could easily give you part of a larger regular expression without knowing how it would fit into that larger expression reveals that you probably have a fundamental misunderstanding of how regular expressions usually work. Reasonably complex regexen generally don't just fit together like building blocks. Changing one small part may cause drastically different behavior.
-sauoq
"My two cents aren't worth a dime.";
| [reply] |
Re: Re: Capturing everything after an optional character in a regex?
by bart (Canon) on Dec 04, 2003 at 23:05 UTC
|
If X is there I want everything after the X. If X is not there I want the whole string.
Try this on a copy of the string:
s/^.*?X//;
This removes everything up to and including the first X, if one exists. If there is no X, the string is left unchanged.
Drop the question mark if you want to locate the last "X".
And if the string can contain newlines, add the /s modifier, which changes the matching behaviour of /./ so that it can match a newline as well.
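Here is a minimal sketch of the above, with the copy made inside a helper so the original string is untouched. The helper names are hypothetical, and /s is included on the assumption that the input may contain newlines.

```perl
use strict;
use warnings;

# Remove everything up to and including the first X; work on a copy.
sub strip_through_first_x {
    my ($string) = @_;
    (my $copy = $string) =~ s/^.*?X//s;
    return $copy;
}

# Greedy variant (no question mark): removes through the last X instead.
sub strip_through_last_x {
    my ($string) = @_;
    (my $copy = $string) =~ s/^.*X//s;
    return $copy;
}

for my $s ('abc123', 'abcX123', 'aXbXc') {
    printf "%-8s first=%-7s last=%s\n",
        $s, strip_through_first_x($s), strip_through_last_x($s);
}
```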