Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can someone give me a regex that finds text between a pair of marker characters in a string and gives me what's between them? E.g. the string holds "Hi there *mom* and *dad*". The regex would search for the pattern *...stuff...* and return what it finds inside (in this example, mom and dad would be captured and automatically bolded). The purpose is to give people more control over their text in my Perl program: I want them to be able to italicize and bold text without using the real HTML codes.

I know this is a mess to read. Basically I want it to see if * something * exists and if it does, convert everything between a pair of characters to whatever I want (make it bold, italics, make it into a link, etc).

Thanks!

Re: Another regex needed
by Zaxo (Archbishop) on Sep 20, 2003 at 19:56 UTC

    The regex looks awful with asterisks in it, so we compile it and only have to look at it once.

    use CGI qw/:standard/;
    my $stars = qr/\*([^*]*)\*/s;
    local $_ = 'Hi there *mom* and *dad*';
    s/$stars/i($1)/eg;
    print;

    Modify the substitute string to taste.
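
For the "convert to whatever I want" part of the question, one possible approach is a small lookup table from marker character to HTML tag, so a single substitution handles several markers. This is only a sketch under assumptions: the `%tag_for` table, the choice of `_` for italics, and the `escape_html` and `markup_to_html` helpers are hypothetical names made up for illustration, not anything from the replies in this thread.

```perl
use strict;
use warnings;

# Hypothetical marker-to-tag table; the markers and tags are illustrative choices.
my %tag_for = (
    '*' => 'b',    # *text* becomes bold
    '_' => 'i',    # _text_ becomes italic
);

# Escape the characters that are special in HTML before adding our own tags.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

sub markup_to_html {
    my ($text) = @_;
    $text = escape_html($text);
    # ([*_]) captures the opening marker; \1 requires the same marker to close.
    # [^*_]+ keeps the captured text from running across another marker.
    $text =~ s{([*_])([^*_]+)\1}{ my $t = $tag_for{$1}; "<$t>$2</$t>" }ge;
    return $text;
}

print markup_to_html('Hi there *mom* and _dad_'), "\n";
# prints: Hi there <b>mom</b> and <i>dad</i>
```

A marker with no partner is left alone, because the pattern only matches a complete pair. One limitation of this sketch: an underscore pair inside an identifier such as a_b_c would also be treated as italics, so real input may need stricter rules about what can surround a marker.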

    Update: Thanks to bart and other CBers for the spot, had a s/// flag hanging off the regex instead. Repaired.

    After Compline,
    Zaxo

Re: Another regex needed
by chromatic (Archbishop) on Sep 20, 2003 at 20:06 UTC

    Sounds a bit like Text::WikiFormat. If there's something you want that it doesn't do, let me know.

      Now that you mention it, Text::WikiFormat doesn't support Twiki style hyperlinks. Twiki has an extended link format [[http://www.twiki.org][Twiki]] as well as forced links such as [[Home Page]].

      Maybe there are more differences, but those are the ones that caught my eye immediately. It might be nice if Text::WikiFormat supported all of Twiki's format.

      This is by no means a show stopper for me, just something I happened to notice (I'm a fan of Twiki).

      Just my 2 cents, -gjb-

Re: Another regex needed
by jeffa (Bishop) on Sep 21, 2003 at 15:34 UTC
    Have you looked at HTML::FromText yet? It doesn't handle italics, but it might be all that you need:

    use HTML::FromText;
    print text2html(
        'Hi there *mom* and _dad_',
        bold      => 1,
        underline => 1,
    );

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Another regex needed
by sulfericacid (Deacon) on Sep 20, 2003 at 19:48 UTC
    I am learning regexes myself, so you should use my code as an idea rather than as something that's going to work :)

    I would write something like  m/[*(.*)*]+ $1/g; I *think* that is saying: match *, any character any number of times, then *, and store that into $1. I threw the brackets in there because my original code was *(.*)* + $1/g; which doesn't look like the correct syntax.

    Anyway, use this as an idea because I'm sure it doesn't work. Hope this helps!

    "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

    sulfericacid

      Actually, what your regex does is match either an asterisk, an open parenthesis, a dot, or a close parenthesis (which is to say: * ) ( .), one or more times, followed by a space and then whatever happens to be in $1 at the time. This is because the square brackets create a character class out of their contents. Please read perlre for more on character classes. Your original code was much closer to your intent, except that you would have needed to escape the asterisks in order to match literal asterisks in the text, as asterisks are metacharacters. However, even if you escaped them it still wouldn't work perfectly, because .* is greedy and would match all the way to the last asterisk in the string; please see death to dot star! for more.
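
To make the dot-star problem concrete, here is a minimal sketch that runs three patterns over the sample string from the original question: the escaped version of the original attempt, the negated character class used in Zaxo's reply, and a non-greedy quantifier. The variable names are just illustrative.

```perl
use strict;
use warnings;

my $text = 'Hi there *mom* and *dad*';

# Greedy: .* runs to the last asterisk, so one match swallows both words.
my @greedy = $text =~ /\*(.*)\*/g;

# Negated character class: the capture cannot cross an asterisk.
my @negated = $text =~ /\*([^*]*)\*/g;

# Non-greedy quantifier: the capture stops at the first closing asterisk.
my @lazy = $text =~ /\*(.*?)\*/g;

print "greedy:  [", join('] [', @greedy),  "]\n";   # [mom* and *dad]
print "negated: [", join('] [', @negated), "]\n";   # [mom] [dad]
print "lazy:    [", join('] [', @lazy),    "]\n";   # [mom] [dad]
```

On this input the negated class and the non-greedy form give the same captures. The negated class states the intent more directly (anything except an asterisk), which is one reason it is often preferred.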