Try this code:
my $code = <<EOT;
[head]
Head text...
[body]
Body text...
[something else]
more text..
EOT

# remove all newlines but the last (plus any whitespace around them)
$code =~ s/\s*\n\s*(?=\S)//g;

formatCode($code);

sub formatCode {
    my $str = shift;
    $str =~ s{ \[ ([^\]]+) \] ([^\]]*) (?=\[|\n) }
             {<div class="$1">$2</div>}igx;
    print $str;
}
I took your code, mixed in some of the suggestions of others, and made some changes I hope you find useful (since you asked). First, I changed the quoting to a "here doc", which is more like the file you'll probably be reading from in most situations.
Then I slightly changed the newline removal suggested below so that it uses a zero-width positive lookahead assertion (look up "?=" in perldoc perlre). My changes also accommodate blank lines in the source, which helps keep the input readable.
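To see what the lookahead buys you, here is a minimal sketch kept separate from the code above. The sample text and variable names are made up for illustration, and the pattern is a simplified one rather than the exact regex used earlier. `(?=\S)` demands a non-whitespace character after the newlines but does not consume it, so only the newlines are deleted, and the final newline (which has nothing after it) survives.

```perl
use strict;
use warnings;

# Hypothetical sample input with a blank line in the middle.
my $text = "[head]\nHead text...\n\n[body]\nBody text...\n";

# Delete each run of newlines that is followed by a non-whitespace
# character. The lookahead (?=\S) only checks for that character;
# it is not part of the match, so it stays in the string.
(my $joined = $text) =~ s/\n+(?=\S)//g;

print $joined;   # [head]Head text...[body]Body text... plus the final newline
```

Spaces inside "Head text..." are left alone because the pattern only ever matches newlines.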
Finally, in the regex, I used the /x modifier and curly-brace delimiters to aid readability, and used negated character classes, so you're now looking for "bracket, (not a closing bracket)+, bracket, (not a closing bracket)*, stop before an opening bracket or a newline".
Using "+" in the first capture and "*" in the second enforces an assumption (which you should think about and modify to suit): brackets ALWAYS have a div name in them, but there might be no text after them. If there must always be text after a div name, use "+" in both captures.
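A small sketch of the difference, assuming a made-up input in which the header [empty] has no text after it. The helper name to_divs and the idea of passing the text pattern in as a parameter are mine, added only so both variants can run side by side.

```perl
use strict;
use warnings;

# Hypothetical input: the [empty] header is followed directly by [body].
my $input = "[head]Head text...[empty][body]Body text...\n";

# Wrap every [name]text pair in a div. $text_re is the pattern used
# for the text that follows a header.
sub to_divs {
    my ($str, $text_re) = @_;
    $str =~ s{ \[ ([^\]]+) \] ($text_re) (?=\[|\n) }
             {<div class="$1">$2</div>}gx;
    return $str;
}

my $star = to_divs($input, qr/[^\]]*/);   # text after a header may be empty
my $plus = to_divs($input, qr/[^\]]+/);   # text after a header is required

print $star, $plus;
```

With "*", [empty] turns into an empty div. With "+", the regex cannot match at [empty], so that header is left in the output untouched while the other two are still converted.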
Since you are a professed newbie, I'll tell you that slurping everything into one string and running substitutions over it is a common technique when you're learning. Later, you'll find it more practical (especially with large files) to process the input as a stream, line by line, rather than manipulate a huge string. So, I'll leave you with this:
my $code = <<EOT;
[head]
Head text...
[body]
Body text...
[something else]
more text..
EOT

my $prev_header = 0;                      # takes care of closing divs
for (split /\n/, $code) {                 # No need to remove newlines
    if (/^\s*$/) { next }                 # Skip blank lines
    elsif ( /^\[ ([^\]]+) \]/x ) {        # If line is a header...
        print "</div>" if $prev_header;   # close prev div, if there was one...
        print qq(<div class="$1">);       # print current div...
        $prev_header = 1;                 # and set current div flag.
    }
    else { print "$_ " }                  # Simply print non-header lines
}
print "</div>";   # You always have to close last div, so just do it here
To read from a file:
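open my $FH, '<', 'whatever.txt' or die "Can't open whatever.txt: $!";
while (<$FH>) {    # reads one line at a time instead of slurping the file
    chomp;
    ...
}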
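Putting the two pieces together, here is one way the stream version might look as a complete script. This is only a sketch: the sub name divs_from_handle is made up, it builds a string instead of printing, it closes the last div only if one was opened, and the demo reads from an in-memory filehandle so it runs without a real file.

```perl
use strict;
use warnings;

# Convert [name] header lines read from a filehandle into divs and
# return the HTML as a string (hypothetical helper, not from the thread).
sub divs_from_handle {
    my ($fh) = @_;
    my ($html, $open_div) = ('', 0);
    while (my $line = <$fh>) {                 # one line at a time
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^\[ ([^\]]+) \]/x) {     # header line
            $html .= '</div>' if $open_div;    # close the previous div
            $html .= qq(<div class="$1">);
            $open_div = 1;
        }
        else {
            $html .= "$line ";                 # plain text line
        }
    }
    $html .= '</div>' if $open_div;            # close the last div, if any
    return $html;
}

# Demo with an in-memory filehandle. For a real file you would pass a
# handle opened with the three-argument form of open instead.
my $sample = "[head]\nHead text...\n\n[body]\nBody text...\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $html = divs_from_handle($fh);
close $fh;

print "$html\n";
```

Returning a string rather than printing makes the sub easier to test, at the cost of holding the output in memory; printing inside the loop, as in the code above, keeps memory use flat for very large files.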
In reply to Re: Recursive Regex
by furry_marmot
in thread Parsing using Regex and Lookahead
by deMize