Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
So given this text: go/to <img src='http://allpoetry.com:8080/images/smile/happy.gif'> /thx/ <img src='http://allpoetry.com:8080/images/smile/happy.gif'> silly *to*
Should return: go<em>to <img src='http://allpoetry.com:8080/images/smile/happy.gif'> </em>thx/ <img src='http://allpoetry.com:8080/images/smile/happy.gif'> silly <b>to</b>
It's rather key NOT to do anything to the last '/'. The problem is that the regex matches a '/' from inside the img tag instead.
As an additional complication, I'd like it to handle nested formatting as well (*/test/* should come out as <b><em>test</em></b>).
I've been working on this for a *long* time, through many different iterations, but none of them comes close. I'd like to avoid parsing the text into HTML tags and non-tags, but if anyone can suggest an efficient way to do that, I'd love to hear it.
Here is what I have:
    sub reg_fix {
        my $c = shift;
        $c =~ s#(<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)\2\s?#
            if ($1) {$1}
            elsif ($2 eq '/') {'<em>' . reg_fix($3) . '</em> '}
            elsif ($2 eq '*') {'<b>' . reg_fix($3) . '</b> '}
            elsif ($2 eq '_') {'<u>' . reg_fix($3) . '</u> '}
        #gemio;
        return $c;
    }
    $c = reg_fix($c);
This now fails because of the second part, ((?:<.*?>|[^\2])+?)\2\s?: if the regex DOESN'T find the closing \2, it backtracks and dips inside the <.*?> tag, which is what I want to avoid. I thought of using (\2|$) at the end to detect that we had reached the end of the string, but then it won't go on to parse the rest of the string for other matches.
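The backtracking described above can be reproduced in isolation. Part of the cause is that inside a bracketed character class \2 is not a backreference but an octal escape (the character chr(2)), so [^\2] matches almost any character, including '<' and '/'. A minimal sketch, assuming a shortened, hypothetical input (the example.com URL and the variable names are made up for illustration) with one opening '/' and no closing '/' outside the tag:

```perl
use strict;
use warnings;

# Hypothetical input: an opening '/', then a tag, and no closing '/' in the text.
my $text = "/ <img src='http://example.com/a.gif'> no closer";

# Same match pattern as in reg_fix above (without the trailing \s?).
my $body;
if ($text =~ m#(<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)\2#) {
    $body = $3;    # what the regex decided the "emphasised" text is
}

# The engine first takes the tag as a unit, fails to find a closer,
# then retries with [^\2] eating '<' and stops at the '/' in "http://".
print defined $body ? "body: [$body]\n" : "no match\n";
# prints: body: [ <img src='http:]
```

The captured body ends in the middle of the src attribute, which is exactly the "dip inside the tag" behaviour.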
    # changed pattern: $c =~ s#((<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)(\2|$)\s?)#
    sub reg_fix {
        my $c = shift;
        print "Running check on '$c'\n";
        $c =~ s#((<.*?>)|(/|\*|_|=)((?:<.*?>|[^\2])+?)(\2|$)\s?)#
            if (!$5) {$1}
            elsif ($2) {$1}
            elsif ($3 eq '/') {'<em>' . reg_fix($4) . '</em> '}
            elsif ($3 eq '*') {'<b>' . reg_fix($4) . '</b> '}
            elsif ($3 eq '_') {'<u>' . reg_fix($4) . '</u> '}
        #gemio;
        return $c;
    }
Any ideas, great and powerful perl monks? Thanks :)
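One possible way to keep the match from ever dipping into a tag is to let the body consume tags only as whole units, and to exclude '<' from the single-character alternative so there is no second way to read a tag. What follows is a minimal sketch under stated assumptions, not a tested drop-in: it handles only the three delimiters the code above has branches for (/, *, _), the %tag_for hash and the <[^>]*> tag pattern are illustrative choices that are not in the original post, and unlike the original it neither swallows nor adds a space after the closing delimiter.

```perl
use strict;
use warnings;

# Assumed mapping from delimiter to HTML element (illustrative, not from the post).
my %tag_for = ('/' => 'em', '*' => 'b', '_' => 'u');

sub reg_fix {
    my ($text) = @_;
    $text =~ s{
        ( < [^>]* > )            # group 1: a whole HTML tag, passed through untouched
      |
        ( [/*_] )                # group 2: opening delimiter
        (                        # group 3: the text to be wrapped
          (?: < [^>]* >          #   either a complete tag, taken as one unit
            | (?!\2) [^<]        #   or one character that is neither the delimiter nor a tag start
          )+
        )
        \2                       # the same delimiter closes the span
    }{
        # Copy the captures first: the recursive call runs its own matches.
        my ($tag, $delim, $body) = ($1, $2, $3);
        defined $tag
            ? $tag
            : "<$tag_for{$delim}>" . reg_fix($body) . "</$tag_for{$delim}>";
    }gex;
    return $text;
}

print reg_fix("*/test/*"), "\n";    # nested case from the question
```

Because a '<' can only be consumed by the complete-tag alternative, giving characters back during backtracking never leaves the engine positioned inside a tag, so an unmatched trailing '/' is simply left alone. The sketch shares one limitation with the original: a '/' or '_' in ordinary prose (a path, a snake_case word) still counts as a delimiter if a second one follows.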
Re: match *bold* formatting, but avoid html (tokenize)
by tye (Sage) on Sep 26, 2003 at 05:08 UTC
Re: match *bold* formatting, but avoid html
by Fletch (Bishop) on Sep 26, 2003 at 01:28 UTC
Re: match *bold* formatting, but avoid html
by delirium (Chaplain) on Sep 27, 2003 at 21:41 UTC