in reply to Parsing using Regex and Lookahead
Try this code:
my $code = <<EOT;
[head]
Head text...
[body]
Body text...
[something else]
more text..
EOT

# Remove all newlines but the last. Only whitespace runs that contain a
# newline are removed, so spaces inside the text are left alone.
$code =~ s/\s*\n\s*(?=\S)//g;
formatCode($code);

sub formatCode {
    my $str = shift;
    $str =~ s{ \[ ([^\]]+) \] ([^\]]*) (?=\[|\n) }
             {<div class="$1">$2</div>}igx;
    print $str;
}
I took your code, mixed in some of the suggestions of others, and made some changes I hope you find useful (since you asked). First, I changed the quoting to a "here doc", which is more like the file you'll probably be reading from in most situations.
Then I slightly changed the newline removal suggested below so that it uses a zero-width positive lookahead assertion (look up "(?=" in perldoc perlre). The change also accommodates blank lines in the source, which will help with readability.
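Here is a minimal sketch of the lookahead idea on its own. The helper name join_lines is made up for illustration, and the pattern assumes you only want to drop whitespace runs that contain a newline:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): remove every run of
# whitespace that contains a newline, but only when a non-whitespace
# character follows it.
sub join_lines {
    my ($text) = @_;
    # (?=\S) looks at the next character without consuming it, so that
    # character stays in the string. Nothing follows the final newline,
    # so the lookahead fails there and the newline is kept.
    $text =~ s/\s*\n\s*(?=\S)//g;
    return $text;
}

print join_lines("[head]\n\nHead text...\n[body]\nBody text...\n");
# prints: [head]Head text...[body]Body text...
```

Because the lookahead is zero-width, the character that proves "more text follows" is never part of the match, so there is nothing to put back in the replacement.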
Finally, in the regex, I used the /x modifier and curly-brace delimiters to aid readability, and used a negated character class, so you're now looking for "bracket, (not a closing bracket)+, bracket, (not a closing bracket)*, stop before a bracket or a newline".
Using "+" in the first capture and "*" in the second capture enforces an assumption (that you should think about and modify to suit) that brackets ALWAYS have a div name in them, but there might be no text after them. If there must always be text after a div name, use the "+" in both cases.
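To see the difference between the two quantifiers, here is a small sketch. The helper to_divs and its second argument are invented for this illustration; the pattern itself is the one described above:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run the substitution
# with either '*' or '+' as the quantifier on the second capture.
sub to_divs {
    my ($str, $quant) = @_;
    my $body = $quant eq '+' ? qr/[^\]]+/ : qr/[^\]]*/;
    $str =~ s{ \[ ([^\]]+) \] ($body) (?=\[|\n) }
             {<div class="$1">$2</div>}gx;
    return $str;
}

my $input = "[empty][body]Body text\n";

print to_divs($input, '*');
# prints: <div class="empty"></div><div class="body">Body text</div>

print to_divs($input, '+');
# prints: [empty]<div class="body">Body text</div>
```

With "*", a header with no text after it still becomes an (empty) div. With "+", that header does not match at all and is left in the output untouched.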
Since you are a professed newbie, I'll tell you that working on the whole input as one string is a common technique when you're learning. Later, you'll find it more practical (especially with large files) to process a stream line by line, rather than manipulate a huge string. So, I'll leave you with this:
my $code = <<EOT;
[head]
Head text...
[body]
Body text...
[something else]
more text..
EOT

my $prev_header = 0;                      # takes care of closing divs
for (split /\n/, $code) {                 # No need to remove newlines
    if (/^\s*$/) { next }                 # Skip blank lines
    elsif ( /^\[ ([^\]]+) \]/x ) {        # If line is a header...
        print "</div>" if $prev_header;   # close prev div, if there was one...
        print qq(<div class="$1">);       # print current div...
        $prev_header = 1;                 # and set current div flag.
    }
    else { print "$_ " }                  # Simply print non-header lines
}
print "</div>";    # You always have to close last div, so just do it here
To read from a file:
open my $FH, '<', 'whatever.txt' or die "Can't open whatever.txt: $!";
while (<$FH>) {
    chomp;
    ...
}
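Putting the two pieces together, here is one way the line-by-line loop could be wrapped so it works on any filehandle. This is only a sketch: the sub name divs_from_handle is made up, and an in-memory filehandle stands in for a real file so the example runs as-is:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read lines from a
# filehandle and return the generated HTML as one string.
sub divs_from_handle {
    my ($fh) = @_;
    my $html = '';
    my $open = 0;                          # true once a div has been opened
    while (my $line = <$fh>) {             # one line at a time, no slurping
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^\[ ([^\]]+) \]/x) { # header line: start a new div
            $html .= '</div>' if $open;
            $html .= qq(<div class="$1">);
            $open = 1;
        }
        else {
            $html .= "$line ";             # ordinary text line
        }
    }
    $html .= '</div>' if $open;            # close the last div, if any
    return $html;
}

# Opening a reference to a scalar gives an in-memory filehandle.
my $sample = "[head]\nHead text...\n\n[body]\nBody text...\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
print divs_from_handle($fh), "\n";
close $fh;
# prints: <div class="head">Head text... </div><div class="body">Body text... </div>
```

For a real file, you would pass the handle from the open call above instead of the in-memory one. Using while rather than for means only one line is held in memory at a time.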