Regex: not the trailing whitespace

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I'm recognizing a string that has some stuff, and the last thing of interest in the string should include everything to the end but not any trailing whitespace.

Consider as a concrete example the ability to match list items in Wiki-style markup. Here is what I ended up using:

 my ($prefix, $content) = $line =~ /^\s*([*#]+:?)\s*(.*\S)?(?:(?<=\S)\s*)?$/;

It's especially complicated by the fact that the last thing could be missing completely. The interesting part here is the two uses of \S. The first makes sure the thing ends with a non-space. The second is a look-back assertion so the final trailing whitespace won't just be eaten by the main item (which can certainly include internal spaces). I considered doing something with a "cut" first, but didn't work it out.
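
To make the behaviour concrete, here is a minimal sketch that runs the pattern above against a few sample lines. The helper name `parse_item` and the sample strings are made up for illustration; only the regex comes from the post.

```perl
use strict;
use warnings;

# The pattern from the post, joined back onto one line.
my $item_re = qr/^\s*([*#]+:?)\s*(.*\S)?(?:(?<=\S)\s*)?$/;

# parse_item is a hypothetical helper name, not from the original post.
sub parse_item {
    my ($line) = @_;
    return $line =~ $item_re;    # list context: ($prefix, $content), or () on no match
}

# Made-up sample lines: content with internal spaces, trailing tab/space,
# a bare prefix, and a prefix followed only by spaces.
for my $line ("** second level   ", "#: continued\t ", "*", "  #   ") {
    my ($prefix, $content) = parse_item($line);
    printf "prefix=[%s] content=[%s]\n", $prefix, $content // '(none)';
}
```

In the last two samples the optional content group does not take part in the match, so `$content` comes back undefined rather than as an empty string.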

Is there some better (or clearer?) trick to doing this? Perhaps different ideas using newer regex features?

Re: Regex: not the trailing whitespace by ikegami (Patriarch) on Apr 23, 2011 at 05:36 UTC
Just delete the trailing whitespace first.
`while (<>) { s/\s+\z//; ... }`
"The second is a look-back assertion so the final trailing whitespace won't just be eaten by the main item"
Huh? `/[*#]+:?/` cannot eat the trailing whitespace. So all you need is
`/ ^ \s* ([*#]+:?) \s* (.*\S)? \s* $ /sx`
(Is some backtracking protection needed?)
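
A rough sketch of the strip-first idea might look like the following. The helper name `parse_stripped` and the simplified `(.*)\z` capture are assumptions made for illustration; the reply elides the loop body with `...`, so this is only one way it could be filled in.

```perl
use strict;
use warnings;

# parse_stripped is a hypothetical helper name.
sub parse_stripped {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # drop trailing whitespace, including the newline
    # Nothing trailing is left, so the content capture can run to the end.
    # (Assumed simplification, not taken from the reply.)
    return $line =~ /^\s*([*#]+:?)\s*(.*)\z/s;
}

my ($prefix, $content) = parse_stripped("## nested item \t\n");
print "prefix=[$prefix] content=[$content]\n";
```

One difference from the single-regex version: a bare prefix yields an empty string for the content here, not an undefined value.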
Re^2: Regex: not the trailing whitespace by John M. Dlugosz (Monsignor) on Apr 23, 2011 at 06:48 UTC
"Huh? `/[*#]+:?/` cannot eat the trailing whitespace."
No, I mean the `(.*)` would. The leading string of bullets is the prefix, and the content of the line after that is the "main item".
Re^3: Regex: not the trailing whitespace by ikegami (Patriarch) on Apr 23, 2011 at 06:54 UTC
"No, I mean the `(.*)` would."
There is no `/(.*)/` in your code. If you mean the `/(.*\S)/`, it can't eat the trailing whitespace either.
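
One way to check that claim is to compare the captures with and without the look-back group. This is only a sketch: the variable names, the helper `same_captures`, and the sample lines are invented for the comparison.

```perl
use strict;
use warnings;

my $with_lookbehind = qr/^\s*([*#]+:?)\s*(.*\S)?(?:(?<=\S)\s*)?$/;
my $plain_trailing  = qr/^\s*([*#]+:?)\s*(.*\S)?\s*$/;

# same_captures is a hypothetical helper: true when both patterns
# return the same list of captures for the given line.
sub same_captures {
    my ($line) = @_;
    my @a = $line =~ $with_lookbehind;
    my @b = $line =~ $plain_trailing;
    # Render undef captures as a marker so the lists compare as strings.
    my $render = sub { join '|', map { defined $_ ? "<$_>" : 'undef' } @_ };
    return $render->(@a) eq $render->(@b);
}

for my $line ("* item  ", "*", "#:  a  b \t", "   ", "** x") {
    printf "%-14s %s\n", "[$line]", same_captures($line) ? 'same' : 'DIFFERENT';
}
```

Because `(.*\S)` must end on a non-space character, whatever whitespace follows is left for the final `\s*` either way.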
Re^4: Regex: not the trailing whitespace by John M. Dlugosz (Monsignor) on Apr 23, 2011 at 07:56 UTC
Re^5: Regex: not the trailing whitespace by ikegami (Patriarch) on Apr 23, 2011 at 08:01 UTC
Some notes below your chosen depth have not been shown here
Re: Regex: not the trailing whitespace by wind (Priest) on Apr 23, 2011 at 06:40 UTC
Just rely on greedy and non-greedy matching to eat up all the spaces where appropriate. You don't need any special new features, or even any additional boundary conditions:
`my ($prefix, $content) = $line =~ / ^ \s* ([*#]+:?) \s* (.*?) \s* $ /x;`
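
As a quick illustration of the non-greedy approach, the sketch below wraps that pattern in a hypothetical helper, `parse_lazy`, and tries it on a few invented lines.

```perl
use strict;
use warnings;

# Non-greedy content capture. The final \s* is greedy and anchored by $,
# so (.*?) is extended only as far as the last non-space character.
my $lazy_re = qr/ ^ \s* ([*#]+:?) \s* (.*?) \s* $ /x;

# parse_lazy is a hypothetical helper name.
sub parse_lazy {
    my ($line) = @_;
    return $line =~ $lazy_re;
}

# Made-up sample lines.
for my $line ("* one two   ", "#:", "**\ttabbed\t") {
    my ($prefix, $content) = parse_lazy($line);
    print "prefix=[$prefix] content=[$content]\n";
}
```

With this form a missing item gives an empty string for `$content` instead of an undefined value, since the capture group always takes part in the match.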
Re^2: Regex: not the trailing whitespace by John M. Dlugosz (Monsignor) on Apr 23, 2011 at 06:51 UTC
I see. I don't remember exactly what I was thinking, but I suppose using a \S helps it optimize. OTOH, it might make it harder for the engine to understand. I should try some benchmarks with and without.
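
A benchmark along those lines could be sketched with the core Benchmark module. The input lines and the labels `with_S` and `lazy` are made up here, and no timings are claimed; results will depend on the Perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %pattern = (
    with_S => qr/^\s*([*#]+:?)\s*(.*\S)?\s*$/,
    lazy   => qr/^\s*([*#]+:?)\s*(.*?)\s*$/,
);

# Made-up input: a short item and a long one with heavy trailing whitespace.
my @lines = ("* short item  ", "## " . ("word " x 50) . (" " x 200));

# Wrap each pattern in a sub that parses every sample line once.
my %candidate = map {
    my $re = $pattern{$_};
    ($_ => sub { for my $l (@lines) { my @caps = $l =~ $re } });
} keys %pattern;

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidate);
```

It would be worth adding lines that fail to match as well, since backtracking cost tends to show up most on failures.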