Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I was thinking about ways to do HTML shortcuts for my blog, and I came up with this idea -- any block of text enclosed by [ and ] on a line by themselves automagically becomes an ordered list.

So I came up with this:

$string = '
blah blah blah before-list text
[
line one line 1 line one line 1
line 2 line two line 2 line two line
line number 3 line three line number 3 line three
line 4
]
blah blah later after-list text
';

$string =~ s!
    ^\[$    # open-bracket as entire line
    (.*)    # all including linebreaks - /s modifier
    ^\]$    # close-bracket as entire line
    !$x=$1;
    $x =~ s/^/<li>/g;    # LIs at the start of each line
    "<ol>$x</ol>";       # return whole thing
    !smegx;              # smeg -- how cool is that?

print $string;

which I was quite proud of -- but instead of what I thought I'd get:

blah blah blah before-list text
<ol>
<li>line one line 1 line one line 1
<li>line 2 line two line 2 line two line
<li>line number 3 line three line number 3 line three
<li>line 4
</ol>
blah blah later after-list text

I got this:

blah blah blah before-list text
<ol><li>
<li>line one line 1 line one line 1
<li>line 2 line two line 2 line two line
<li>line number 3 line three line number 3 line three
<li>line 4
</ol>
blah blah later after-list text

and I can't quite figure out where that extra <li> has come from. The next '^' isn't right after the last '$'?

I'm sure my fellow monks can tell me where I'm going wrong.



“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
M-J D

Replies are listed 'Best First'.
Re: automagic-HTML regex
by sauoq (Abbot) on Jun 16, 2003 at 22:15 UTC

    Your problem is in the very first line of that regex. The dollar in ^\[$ does not match a newline; it matches right before the newline. So that newline goes into your $1, captured by (.*) with the /s modifier. In your second substitution (the one in the eval'd part of the first one), the caret (^) will match the beginning of the first line in $1, which, in your case, is empty apart from that single newline.
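
    To see this in isolation, here is a minimal sketch. The sample text and the variable names ($text, $captured, $items) are made up for illustration, and the inner substitution uses /mg on the assumption that ^ is meant to match at every line start.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample input: a bracketed block on lines of its own.
    my $text = "before\n[\nline one\nline two\n]\nafter\n";

    # Same outer pattern as in the question. With /m, "$" matches just
    # before a newline and does not consume it.
    my ($captured) = $text =~ / ^\[$ (.*) ^\]$ /smx;

    # So the capture starts with the newline that followed "[".
    print $captured eq "\nline one\nline two\n"
        ? "capture starts with a newline\n"
        : "unexpected capture\n";

    # "^" matches at the start of that empty first line as well,
    # which is where the extra <li> comes from.
    (my $items = $captured) =~ s/^/<li>/mg;
    print "<ol>$items</ol>\n";
    ```

    The second print should show the stray <li> directly after <ol>, as in the question.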

    You can fix it by telling your regex to match an opening bracket and then match as much whitespace as possible. Try changing ^\[$ to ^\[\s*.
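
    A rough sketch of that change applied to a cut-down input. The sample text and the names $text, $html and $items are illustrative only; the inner substitution again assumes /mg, and a "\n" is added after <ol> by hand because \s* now swallows the one that followed the bracket.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample input.
    my $text = "before\n[\nline one\nline two\n]\nafter\n";

    (my $html = $text) =~ s{
        ^\[\s*    # open bracket, then swallow the newline after it
        (.*)      # the list body, newlines included because of /s
        ^\]$      # close bracket as an entire line
    }{
        my $items = $1;
        $items =~ s/^/<li>/mg;    # /m so that ^ matches at each line start
        "<ol>\n$items</ol>";
    }smex;

    print $html;
    ```

    Note that \s* also eats blank lines and leading indentation that come right after the bracket, which may or may not be what you want.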

    -sauoq
    "My two cents aren't worth a dime.";
    
      Try changing ^\[$ to ^\[\s*.

      I don't know if the following is valid:


      $string = '
      [foobar]
      [
      this is a list of things that are in a list
      ]
      blah blah later after-list text
      ';

      but my guess is that, if it is valid, the change sauoq suggested won't return the wanted results. What you probably want is to change ^\[$ to ^\[\s*?\n (so that the [ can only be followed by optional whitespace and then a newline).
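
      A small sketch comparing the two patterns on input of that shape. The sample text, the helper to_list and the names $loose and $tight are invented for this illustration.

      ```perl
      use strict;
      use warnings;

      # Hypothetical input: a bracketed word on the line before the list.
      my $text = "[foobar]\n[\nitem a\nitem b\n]\nafter\n";

      # Wrap a captured body in <ol>, prefixing every line with <li>.
      sub to_list {
          my ($body) = @_;
          $body =~ s/^/<li>/mg;
          return "<ol>\n$body</ol>";
      }

      # ^\[\s* is satisfied by the "[" of "[foobar]", so the list starts too early.
      (my $loose = $text) =~ s/^\[\s*(.*)^\]$/to_list($1)/sme;

      # ^\[\s*?\n insists that only whitespace follows the "[" on its line.
      (my $tight = $text) =~ s/^\[\s*?\n(.*)^\]$/to_list($1)/sme;

      print "loose:\n$loose\ntight:\n$tight";
      ```

      With the looser pattern "foobar]" and the lone "[" end up as list items; with the tighter one "[foobar]" is left alone.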

      -enlil

(jeffa) Re: automagic-HTML regex
by jeffa (Bishop) on Jun 17, 2003 at 02:09 UTC
    Couldn't resist a little TIMTOWTDI now that the problem has been solved. ;)

    use CGI::Pretty qw(ol li);
    my $string = "yadda\n[yadda\nyadda]yadda\n";
    $string =~ s/\[\s*([^\]]+)\]/ol li[split "\n+",$1]/eg;

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: automagic-HTML regex
by diotalevi (Canon) on Jun 16, 2003 at 22:39 UTC
    s{(?mx:    # Explicitly set /mx for this scope. This puts the modifiers
               # right up where they are being used instead of at the end
               # of the regex.
        ^\[$
        ((?s:.*?))    # Non-greedy match of multiple lines
        ^\]$
    )}{
        my $tmp = $1;
        # Added /m and fixed closing bracket.
        # Preferred [^\n] to . for explicitness.
        $tmp =~ s{^([^\n]+)$}{<li>$1</li>}mg;
        "<ul>$tmp</ul>"
    }eg;

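
    To illustrate why the non-greedy .*? matters once a post contains more than one bracketed block, here is a minimal sketch. The sample text and the names @greedy and @lazy are hypothetical, and the patterns use ^\[\n rather than ^\[$ purely to keep the captures free of a leading newline.

    ```perl
    use strict;
    use warnings;

    # Hypothetical input containing two bracketed blocks.
    my $text = "[\na\n]\nmiddle\n[\nb\n]\n";

    # Greedy: a single match running from the first "[" to the last "]".
    my @greedy = $text =~ /^\[\n(.*)^\]$/smg;

    # Non-greedy: one match per block.
    my @lazy = $text =~ /^\[\n(.*?)^\]$/smg;

    printf "greedy: %d match(es), lazy: %d match(es)\n",
        scalar @greedy, scalar @lazy;
    ```

    The greedy version would turn "middle" and the inner brackets into list items, while the non-greedy one keeps the two lists separate.
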
      Thanks all of you. I think sauoq gets top points for explaining where my thinking had gone wrong, but all of the solutions seemed fine.

      In case you're not familiar with this kind of magic-first-char syntax, by the way, it's used in the online message board WebCrossing.

      It's known as "quick-edit" and allows you to write

      this is i italic text
      or
      this is b bold text
      or
      this is > indented text
      in the textarea box, which saves time and is user-friendly for non-HTML people. I got so used to it I couldn't live without it and sometimes find myself doing it in other applications!

      I'm trying to "embrace and extend" the syntax for my own purposes. Theirs doesn't use multi-line syntax at all.



      “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
      M-J D
Re: automagic-HTML regex
by artist (Parson) on Jun 16, 2003 at 22:10 UTC
    $x =~ s/^(?=.)/<li>/g; # LIs at the start of each line

    Seems to do the trick
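
    A quick sketch of what the lookahead does. The sample value of $x is made up to mimic the capture from the question, and /m is added here as an assumption so that ^ matches at every line start.

    ```perl
    use strict;
    use warnings;

    # Shaped like the capture in the question: a leading newline, then the items.
    my $x = "\nline one\nline two\n";

    # ^(?=.) only matches at a line start that is followed by a character
    # other than newline (no /s here), so the empty first line gets no <li>.
    $x =~ s/^(?=.)/<li>/mg;

    print "<ol>$x</ol>\n";
    ```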

    artist