Over the years of reading PerlMonks postings I have seen that nearly every day someone posts a regex question, and now it's finally my turn :-).
The source of my trouble with this could just as easily be heatstroke or childhood pesticide exposure, as any inherent obscurity to the problem, but, whatever the cause is, I cannot at the present moment figure this out without asking for some assistance.
The problem is this (in verbal description -- I've seen so many badly-asked regex questions, I hope I do better!): a string of arbitrary length, comprising multiple sentences with (possible) line breaks (\n), (possibly) contains some rudimentary mark-up in the form often used for various sorts of emphasis in e-mail and USENET postings:
That *doggone foolish Mabel* has toasted the _bread too long_ again.
The content between the * or _ characters is multiple words, and I need to tokenize the span of text inside so that I can make *each* *word* surrounded by the appropriate character:
That *doggone* *foolish* *Mabel* has toasted the _bread_ _too_ _long_ again.
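The transformation described above can be sketched roughly as follows. This is only a minimal illustration, not the regex from this post: the helper name `emphasize_each_word` is hypothetical, and the lookarounds are an assumed substitute for `\b` so that both marker characters are treated alike.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): re-wrap every word
# inside a *...* or _..._ span with that span's own marker character.
sub emphasize_each_word {
    my ($text) = @_;
    $text =~ s{
        (?<![\w*])        # opening marker must not follow a word char or a star
        ([*_])            # capture 1: the marker character
        (\S(?:.*?\S)?)    # capture 2: the span, beginning and ending on non-space
        \1                # the same marker closes the span
        (?![\w*])         # closing marker must not run into a word char or a star
    }{
        my ($mark, $span) = ($1, $2);
        # split ' ' breaks on any whitespace, including line breaks
        join ' ', map { $mark . $_ . $mark } split ' ', $span;
    }gsex;
    return $text;
}

print emphasize_each_word(
    "That *doggone foolish Mabel* has toasted the _bread too long_ again.\n"
);
# prints: That *doggone* *foolish* *Mabel* has toasted the _bread_ _too_ _long_ again.
```

Note that in this sketch a line break inside a marked span comes back as a single space, because the words are rejoined with `' '`; whether that is acceptable depends on the text being processed.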
Now to the Mystery part: the regex I have come up with only matches when the "markup" character used is "_" (underscore, which I'll note is not a Perl-type regex metacharacter, but instead a simple word character matched by <SAMP>\w</SAMP>), not when it is "*"! WHY? This one-liner illustrates the problem and contains my regex:
perl -e '$gh = join qq[],(<STDIN>); if ($gh =~ m@(\b(\*|_)\S+\b)(.+?)(\b\S+\2\b)@s) {print join q[ ],$1,$3,$4,q[ ];}'
Happy _puppy life good_ yeah.
(there will be breaks in the command line above that must be removed for testing as a "one-liner", obv.)
The output I get is this:
_puppy life good_
But if I use "*" instead, I get no output.
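A reduced test may help narrow this down. The sketch below is not the one-liner above: it keeps only the leading `\b` and the marker alternation, and the two sub names are hypothetical. As far as perlre describes it, `\b` matches only between a `\w` character and a `\W` character, so the position between a space and "*" would not qualify, while the position between a space and "_" would.

```perl
use strict;
use warnings;

# Hypothetical reduced tests, not the original one-liner.
# With the leading \b in front of the marker:
sub leading_anchor_matches {
    my ($input) = @_;
    return $input =~ /\b(\*|_)\S+/ ? 1 : 0;
}

# The same pattern with the \b removed:
sub bare_marker_matches {
    my ($input) = @_;
    return $input =~ /(\*|_)\S+/ ? 1 : 0;
}

for my $input ("Happy _puppy life good_ yeah.", "Happy *puppy life good* yeah.") {
    printf "%-32s with \\b: %d  without \\b: %d\n",
        $input, leading_anchor_matches($input), bare_marker_matches($input);
}
```

If the "*" input matches only in the version without `\b`, that points at the word-boundary anchors rather than at the alternation or the backreference.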
What is going on here? (I am testing in <CITE>bash</CITE> on Cygwin, the UNI* emulation environment for Win32).
Thanks. Soren