I run a chatterbox on the allpoetry.com website. We support lightweight formatting, so *bold* is rendered as bold, and so on. We also have smilies, which are inserted first (as img tags). The slashes turn out to be the biggest problem, since they often appear inside the image tags.

So given this text:  go/to <img src='http://allpoetry.com:8080/images/smile/happy.gif'> /thx/ <img src='http://allpoetry.com:8080/images/smile/happy.gif'> silly *to*

It should return: go<em>to <img src='http://allpoetry.com:8080/images/smile/happy.gif'> </em>thx/ <img src='http://allpoetry.com:8080/images/smile/happy.gif'> silly <b>to</b>

It's rather important NOT to do anything to the last '/'. The problem is that the regex matches a '/' from inside the img tag instead.

As an additional complication, I'd like it to handle nested formatting as well (*/test/* should become <b><em>test</em></b>).

I've been working on this for a *long* time, through many different iterations, but none of them come close. I'd like to avoid parsing the text out into html tags and non-tags, but if anyone can suggest an efficient way to do that I'd love to hear it.
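For reference, the tags/non-tags split mentioned above could be sketched roughly like this. It is only a minimal sketch: split_tags is a hypothetical helper name, and it assumes that a tag never contains a literal '>' inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into alternating non-tag and tag pieces.
sub split_tags {
    my ($text) = @_;
    # The capturing group makes split return the tags themselves as well as
    # the text between them; grep drops any empty leading piece.
    return grep { length } split /(<[^>]*>)/, $text;
}

my @pieces = split_tags("go/to <img src='x.gif'> /thx/");
print "[$_]\n" for @pieces;
```

Note that formatting each non-tag piece on its own would not be enough for the example above, because the /.../ span there starts before the img tag and ends after it.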

Here is what I have:

sub reg_fix {
    my $c = shift;
    $c =~ s#(<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)\2\s?#
        if ($1) {$1}
        elsif ($2 eq '/') {'<em>' . reg_fix($3) . '</em> '}
        elsif ($2 eq '*') {'<b>' . reg_fix($3) . '</b> '}
        elsif ($2 eq '_') {'<u>' . reg_fix($3) . '</u> '}
    #gemio;
    return $c;
}

$c = reg_fix($c);

This now fails because of the second part, ((?:<.*?>|[^\2])+?)\2\s?: if it DOESN'T find \2, the regex backtracks and dips inside the <.*?> tag, which is what I want to avoid. I thought of using (\2|$) at the end, to check whether we had reached the end of the string, but then it won't finish parsing the string for other matches.

$c =~ s#((<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)(\2|$)\s?)#

sub reg_fix {
    my $c = shift;
    print "Running check on '$c'\n";
    $c =~ s#((<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)(\2|$)\s?)#
        if (!$5) {$1}
        elsif ($2) {$1}
        elsif ($3 eq '/') {'<em>' . reg_fix($4) . '</em> '}
        elsif ($3 eq '*') {'<b>' . reg_fix($4) . '</b> '}
        elsif ($3 eq '_') {'<u>' . reg_fix($4) . '</u> '}
    #gemio;
    return $c;
}
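One detail worth noting about both attempts: inside a character class, \2 is not a backreference (it is read as an octal escape for character 2), so [^\2] matches nearly any character, including '<' and the marker itself. A possible direction, offered as a sketch rather than a definitive fix, is to let the body consume only whole tags or single characters that are neither the marker (checked with a negative lookahead) nor a '<'. That way backtracking can never land inside a tag. The helper wrap_body and the %element map are hypothetical names; the sketch also drops the '=' marker, since the code above never says what it should turn into, and it does not add the trailing space after the closing tag.

```perl
use strict;
use warnings;

# Marker-to-element map (hypothetical name). '=' is omitted on purpose.
my %element = ('/' => 'em', '*' => 'b', '_' => 'u');

# Wrap an already-matched body, formatting nested markers first.
# The arguments are copied on entry, before any further matching happens.
sub wrap_body {
    my ($marker, $body) = @_;
    my $name = $element{$marker};
    return "<$name>" . reg_fix($body) . "</$name>";
}

sub reg_fix {
    my ($c) = @_;
    $c =~ s{
        ( < [^>]* > )                        # 1: a whole tag, copied through as-is
      |
        ( [/*_] )                            # 2: opening marker
        ( (?: < [^>]* > | (?!\2) [^<] )+ )   # 3: whole tags, or characters that are not the marker or a '<'
        \2                                   # the same marker closes the span
    }{
        defined($1) ? $1 : wrap_body($2, $3)
    }gex;
    return $c;
}

my $img = "<img src='http://allpoetry.com:8080/images/smile/happy.gif'>";
print reg_fix("go/to $img /thx/ $img silly *to*"), "\n";
print reg_fix("*/test/*"), "\n";
```

Because a lone marker with no partner simply fails to match at that position, the trailing '/' after thx is left alone and the scan carries on to the later *to*.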

Any ideas, great and powerful perl monks? Thanks :)


In reply to match *bold* formatting, but avoid html by Anonymous Monk
