automagic-HTML regex

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I was thinking about ways to do HTML shortcuts for my blog, and I came up with this idea -- any block of text enclosed by [ and ] on a line by themselves automagically becomes an ordered list.

So I came up with this:

$string = '
blah blah blah before-list text
[
line one line 1 line one line 1 
line 2 line two line 2 line two line 
line number 3 line three line number 3 line three 
line 4
]
blah blah later after-list text
';


$string =~ s!
             ^\[$ # open-bracket as entire line  
             (.*) # all including linebreaks - /s modifier
             ^\]$ # close-bracket as entire line  
             !$x=$1;
              $x =~ s/^/<li>/g; # LIs at the start of each line
              "<ol>$x</ol>";    # return whole thing
             !smegx;            # smeg -- how cool is that?

print $string;

which I was quite proud of -- but instead of what I thought I'd get:

blah blah blah before-list text
<ol>
<li>line one line 1 line one line 1 
<li>line 2 line two line 2 line two line 
<li>line number 3 line three line number 3 line three 
<li>line 4
</ol>
blah blah later after-list text

I got this:

blah blah blah before-list text
<ol><li>
<li>line one line 1 line one line 1 
<li>line 2 line two line 2 line two line 
<li>line number 3 line three line number 3 line three 
<li>line 4
</ol>
blah blah later after-list text

and I can't quite figure out where that extra <li> has come from. The next '^' isn't right after the last '$'?

I'm sure my fellow monks can tell me where I'm going wrong.

“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
M-J D

Replies are listed 'Best First'.
Re: automagic-HTML regex by sauoq (Abbot) on Jun 16, 2003 at 22:15 UTC
Your problem is in the very first line of that regex. The dollar in `^\[$` does not match a newline; it matches right before the newline. So that newline goes into your `$1`, captured by `(.*)` with the /s modifier. In your second substitution (the one in the eval'd part of the first one), the caret (`^`) matches at the beginning of the first line in `$1`, which, in your case, contains only a newline. You can fix it by telling your regex to match an opening bracket and then as much whitespace as possible. Try changing `^\[$` to `^\[\s*`.

-sauoq
"My two cents aren't worth a dime.";
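A minimal sketch of the behaviour described above, using a shortened stand-in for the original string. It assumes the inner substitution carries the /m flag (the posted code omits it, and without it `^` would only match at the very start of the captured text); the variable names are illustrative only.

```perl
use strict;
use warnings;

# Shortened stand-in for the string in the original post.
my $string = "before\n[\none\ntwo\n]\nafter\n";

# Original pattern: the dollar sign stops just before the newline,
# so that newline becomes the first character of the capture.
my ($captured) = $string =~ /
    ^\[$
    (.*)
    ^\]$
/msx;

# With the m flag (an assumption here), the caret also matches at
# position 0, in front of the leading newline, giving a stray <li>.
(my $items = $captured) =~ s/^/<li>/mg;

# Suggested fix: let \s* swallow the newline after the bracket.
my ($fixed) = $string =~ /
    ^\[\s*
    (.*)
    ^\]$
/msx;
(my $fixed_items = $fixed) =~ s/^/<li>/mg;

print "<ol>$items</ol>\n";          # shows the extra <li>
print "<ol>\n$fixed_items</ol>\n";  # one <li> per line of text
```

The capture in the first match is "\none\ntwo\n", which is why the first `<li>` lands on an otherwise empty line.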
Re: Re: automagic-HTML regex by Enlil (Parson) on Jun 16, 2003 at 22:54 UTC
Try changing `^\[$` to `^\[\s*`.

I don't know if the following is valid:

$string = '
[foobar]
[
this is a list of things that are in a list
]
blah blah later after-list text
';

but my guess is that if it is valid, then I doubt the change sauoq suggested will return the wanted results. What you probably want is to change `^\[$` to `^\[\s*?\n` (so that the [ can only be followed by optional whitespace and then a newline).

-enlil
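A small sketch of the difference between the two patterns, assuming a string in which a line begins with `[foobar]` ahead of the real list. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# A bracketed word at the start of a line, followed by a real list block.
my $string = "[foobar]\n[\none\ntwo\n]\nafter\n";

# Looser pattern: \s* may match nothing, so the opening bracket of
# the first line is accepted as the start of the list.
my ($loose) = $string =~ /^\[\s*(.*)^\]$/ms;

# Stricter pattern: the bracket must be followed by optional
# whitespace and then a newline, so the first line is skipped.
my ($strict) = $string =~ /^\[\s*?\n(.*)^\]$/ms;

print "loose:  $loose";
print "strict: $strict";
```

With the looser pattern the capture begins with `foobar]`, so the whole first line would end up inside the list; the stricter one captures only the lines between the two lone brackets.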
(jeffa) Re: automagic-HTML regex by jeffa (Bishop) on Jun 17, 2003 at 02:09 UTC
Couldn't resist a little TIMTOWTDI now that the problem has been solved. ;)

use CGI::Pretty qw(ol li);
my $string = "yadda\n[yadda\nyadda]yadda\n";
$string =~ s/\[\s*([^\]]+)\]/ol li[split "\n+",$1]/eg;

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: automagic-HTML regex by diotalevi (Canon) on Jun 16, 2003 at 22:39 UTC
s{(?mx:           # Explicitly set /mx for this scope. This puts the modifiers right
                  # up where they are being used instead of at the end of the regex.
    ^\[$
    ((?s:.*?))    # Non-greedy match of multiple lines
    ^\]$
  )}{
    my $tmp = $1;
    # Added /m and fixed closing bracket. Preferred [^\n] to . for explicitness.
    $tmp =~ s{^([^\n]+)$}{<li>$1</li>}mg;
    "<ul>$tmp</ul>"
}eg;
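The code above also switches `.*` to the non-greedy `.*?`. A rough sketch of why that matters once a post contains more than one bracketed block; the two-block sample string is invented for illustration, and the patterns are simplified to keep the comparison short.

```perl
use strict;
use warnings;

# Two separate bracketed blocks with ordinary text between them.
my $string = "[\na\n]\nmiddle\n[\nb\n]\n";

# Greedy: .* runs on to the last closing bracket in the string,
# so both blocks and the text between them become one list.
(my $greedy = $string) =~ s{^\[\n(.*)^\]$}{<ol>\n$1</ol>}ms;

# Non-greedy with the g flag: each block stops at its own closing
# bracket and is converted separately.
(my $lazy = $string) =~ s{^\[\n(.*?)^\]$}{<ol>\n$1</ol>}msg;

print "greedy:\n$greedy\n";
print "lazy:\n$lazy\n";
```

In the greedy result the word "middle" and the stray brackets sit inside a single `<ol>`, while the non-greedy version leaves "middle" between two separate lists.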
Re: Re: automagic-HTML regex by Cody Pendant (Prior) on Jun 16, 2003 at 23:16 UTC
Thanks, all of you. I think sauoq gets top points for explaining where my thinking had gone wrong, but all of the solutions seemed fine.

In case you're not familiar with this kind of magic-first-char syntax, by the way, it's used in the online message board WebCrossing. It's known as "quick-edit" and allows you to write `this is i italic text` or `this is b bold text` or `this is > indented text` in the textarea box, which saves time and is user-friendly for non-HTML people. I got so used to it I couldn't live without it and sometimes find myself doing it in other applications!

I'm trying to "embrace and extend" the syntax for my own purposes. Theirs doesn't use multi-line syntax at all.
Re: automagic-HTML regex by artist (Parson) on Jun 16, 2003 at 22:10 UTC
`$x =~ s/^(?=.)/<li>/g; # LIs at the start of each line`

Seems to do the trick.

artist
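A short sketch of what the lookahead changes, again assuming the substitution runs with the /m flag so that `^` matches at every line start. The sample text stands in for the captured `$1`, which begins with a newline.

```perl
use strict;
use warnings;

# Stand-in for the captured text, which begins with a newline.
my $x = "\none\ntwo\n";

# Plain caret: also matches in front of the leading newline.
(my $plain = $x) =~ s/^/<li>/mg;

# Caret plus lookahead: the dot does not match a newline (no s flag),
# so empty lines are skipped and no stray <li> appears.
(my $guarded = $x) =~ s/^(?=.)/<li>/mg;

print "plain:\n$plain";
print "guarded:\n$guarded";
```

The leading newline is still present in the guarded result, but it no longer carries an empty list item.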