in reply to Re: Recursive Regex
in thread Parsing using Regex and Lookahead

Just so everyone knows, some changes needed to be made:

The updated routine:
sub formatCode{
    my $code = shift;
    my $prev_header = 0;                        # takes care of closing divs
    for (split /\n/, $code) {                   # No need to remove new lines
        if (/^\s*$/) { next }                   # Skip blank lines
        elsif ( /^(.*?)\[ ([^\]]*) \](.*)/x ) { # If line is a header...
            print "$1</div>" if $prev_header;   # close prev div, if there was one...
            print qq(<div class="$2">$3);       # print current div...
            $prev_header = 1;                   # and set current div flag.
        }
        else { print "$_<br />" }               # Simply print non-header lines
    }
    print "</div>" if $prev_header;             # You always have to close last div, so just do it here
}

Basically, $1 needed to be changed to $_. I also added captures for the text before and after the tag, in case there was inline text. Finally, a line break (<br />) is now printed in place of the newline character. I'm still curious whether $prev_header should be reset to 0 after the closing div has been printed. I guess it's not necessary.
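
On the reset question, here is a minimal sketch that may help check the behaviour. It assumes a hypothetical variant, format_lines, that returns the HTML instead of printing it, and that always keeps any text found ahead of a tag (an assumption, not what the routine above does). Because the flag is set to 1 again right after the previous div is closed, resetting it in between would have no visible effect.

```perl
use strict;
use warnings;

# Hypothetical variant of formatCode: builds and returns a string
# so the result can be inspected instead of printed.
sub format_lines {
    my ($code) = @_;
    my $out         = '';
    my $prev_header = 0;    # becomes 1 at the first header and stays 1

    for my $line (split /\n/, $code) {
        next if $line =~ /^\s*$/;    # skip blank lines
        if (my ($before, $class, $after) = $line =~ /^(.*?)\[([^\]]*)\](.*)/) {
            $out .= $before;                         # assumption: text ahead of the tag is always kept
            $out .= '</div>' if $prev_header;        # close the div opened by the previous header
            $out .= qq(<div class="$class">$after);  # open the new div
            $prev_header = 1;                        # a div is open from here on
        }
        else {
            $out .= "$line<br />";                   # non-header line
        }
    }
    $out .= '</div>' if $prev_header;    # the last opened div is still pending
    return $out;
}

print format_lines("[head]Title\nmore\n\n[body]Blah\n"), "\n";
# prints: <div class="head">Titlemore<br /></div><div class="body">Blah</div>
```

Input with no header at all never sets the flag, so no stray closing div is emitted.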


Re^3: Recursive Regex
by deMize (Monk) on Mar 12, 2009 at 14:16 UTC
    Update:
    Looking at it now, this is not going to work.
    If I wish to have all the input on one line, I'd still need the lookahead. For example, the above solution will not handle multiple inline statements such as: [head]Title Text[body]Blah Blah Blah

    Instead of the lookahead, I can think of two simple solutions using split: either I first loop through the string and place a newline character before each [\w*] pattern, or I split on the pattern itself.
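
    A minimal sketch of the first option, assuming a hypothetical helper named break_before_tags and the same [^\]]* tag pattern used in the routines here. It only prepares the string, so a line-based routine could then process it one tag per line.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: move every [tag] to the start of its own line.
    sub break_before_tags {
        my ($code) = @_;
        $code =~ s/(\[[^\]]*\])/\n$1/g;   # newline ahead of each bracketed tag
        $code =~ s/^\n//;                 # drop the newline added before a leading tag
        return $code;
    }

    print break_before_tags('[head]Title Text[body]Blah Blah Blah'), "\n";
    # prints two lines:
    # [head]Title Text
    # [body]Blah Blah Blah
    ```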

    So this is what I came up with:
    sub formatCode2{
        my $code = shift;
        my @arrCode = split (/\[([^\]]*)\]/, $code);
        my $size = @arrCode;

        # print whatever b4 1st delimiter
        if (@arrCode >= 1) {
            $_ = $arrCode[0];
            s/\n/<br \/>/ig;
            print qq(<div class="">$_</div>);
        }

        # print sections
        for (my $cnt = 1; $cnt < @arrCode; $cnt+=2){
            $_ = $arrCode[$cnt+1];
            s/\n/<br \/>/ig;
            print qq(<div class="$arrCode[$cnt]">$_</div>);
        }
    }
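
    To see what split hands back when the delimiter pattern contains a capture group, here is a small sketch. The helper name tag_pairs is hypothetical. It pairs each class name with its text and guards two edge cases: the empty leading field produced when the string starts with a tag, and a tag at the very end that has no text after it.

    ```perl
    use strict;
    use warnings;

    # With a capture group in the pattern, split returns the captured
    # delimiters in between the fields.
    my @parts = split /\[([^\]]*)\]/, '[head]Title Text[body]Blah Blah Blah';
    # @parts is ('', 'head', 'Title Text', 'body', 'Blah Blah Blah')

    # Hypothetical helper: return [class, text] pairs.
    sub tag_pairs {
        my ($code) = @_;
        my ($lead, @rest) = split /\[([^\]]*)\]/, $code;
        my @pairs;
        # keep text before the first tag only if there is any
        push @pairs, ['', $lead] if defined $lead && length $lead;
        while (@rest) {
            my $class = shift @rest;
            my $text  = @rest ? shift @rest : '';   # a trailing tag has no text
            push @pairs, [$class, $text];
        }
        return @pairs;
    }

    print join(' | ', map { "$_->[0]=$_->[1]" }
        tag_pairs('[head]Title Text[body]Blah Blah Blah')), "\n";
    # prints: head=Title Text | body=Blah Blah Blah
    ```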

      Maybe I'm missing something, but why do you want the input all on one line? It seems to me that you're creating the problem you're trying to solve when there wasn't a problem in the first place.

      If you put multiple inline statements on separate lines, you wouldn't have the problem. Since this already resembles a .INI format, why not just take it a little further:

      [section]
      head=Title Text
      Blah Blah Blah   <--body text

      Specific items (like head, div, section) are called out in some easily parseable fashion, and plain text defaults to body text. Running it all together as one line just makes a huge parsing problem. But in something like the above example, you don't need split() or lookaheads and the parsing is trivial.
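
      A rough sketch of what that parsing could look like, under some assumptions of mine: the function name parse_sections is made up, a key is a run of word characters followed by =, and text before the first [section] is ignored. It is only one way to read the layout above.

      ```perl
      use strict;
      use warnings;

      # Hypothetical parser: [name] opens a section, key=value sets a field,
      # and any other non-blank line is treated as body text.
      sub parse_sections {
          my ($text) = @_;
          my (@sections, $current);
          for my $line (split /\n/, $text) {
              next if $line =~ /^\s*$/;                      # skip blank lines
              if ($line =~ /^\s*\[([^\]]*)\]\s*$/) {         # section header
                  $current = { name => $1, fields => {}, body => [] };
                  push @sections, $current;
              }
              elsif (!$current) {
                  next;    # assumption: text before the first section is ignored
              }
              elsif ($line =~ /^\s*(\w+)\s*=\s*(.*)$/) {     # key=value item
                  $current->{fields}{$1} = $2;
              }
              else {
                  push @{ $current->{body} }, $line;         # plain body text
              }
          }
          return @sections;
      }

      my @s = parse_sections("[section]\nhead=Title Text\nBlah Blah Blah\n");
      print "$s[0]{name}: $s[0]{fields}{head} / @{ $s[0]{body} }\n";
      # prints: section: Title Text / Blah Blah Blah
      ```

      One caveat with this sketch: a body line that happens to look like word=something would be taken as a field, so a stricter list of known keys may be safer.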

      my 2 cents

      --marmot