deMize,

Try this code:

my $code = <<EOT;
[head]
Head text...

[body]
Body text...

[something else]
more text..
EOT

$code =~ s/\n+(?=\S)//g; # remove all newlines but the last (spaces are kept)

formatCode($code);

sub formatCode {
    my $str = shift;

    $str =~ s{ \[ ([^\]]+) \] ([^\]]*) (?=\[|\n)}
             {<div class="$1">$2</div>}igx;

    print $str;
}

I took your code, mixed in some of the suggestions of others, and made some changes I hope you find useful (since you asked). First, I changed the quoting to a "here doc", which is more like the file you'll probably be reading from in most situations.

Then I slightly changed the newline-removal approach suggested below so that it uses a zero-width positive lookahead assertion (look up "(?=" in perldoc perlre). The changes also accommodate blank lines in the source, which helps readability.
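To see what the lookahead buys you, here is a minimal sketch comparing a pattern that consumes the following character with one that only peeks at it. The sample string and the variable names are made up for illustration, and the pattern is a newline-only variant of the one in the code above.

```perl
use strict;
use warnings;

# Hypothetical sample input; the blank line in the middle is intentional.
my $text = "[head]\nHead text...\n\n[body]\nBody text...\n";

# Consuming version: \S is part of the match, so that character is deleted too.
(my $consumed = $text) =~ s/\n+\S//g;

# Lookahead version: (?=\S) only checks that a non-space character follows.
(my $kept = $text) =~ s/\n+(?=\S)//g;

print "consumed: $consumed";   # [head]ead text...body]ody text...
print "kept:     $kept";       # [head]Head text...[body]Body text...
```

In both cases the final newline survives because nothing non-blank follows it, which is what "all newlines but last" relies on.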

Finally, in the regex, I used the /x modifier and curly-brace delimiters to aid readability, and used a negated character class, so you're now looking for "bracket, (not a closing bracket)+, bracket, (not a closing bracket)*, stop before a bracket or a newline".

Using "+" in the first capture and "*" in the second capture enforces an assumption (that you should think about and modify to suit): brackets ALWAYS have a div name in them, but there might be no text after them. If there must always be text after a div name, use "+" in both captures.
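If it helps to see the difference, this rough sketch runs both quantifiers over the same input. The [empty] section, the sections() helper, and the two qr// variables are all hypothetical names invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical input: the [empty] section has no text after its header.
my $input = "[head]Head text...[empty][body]Body text...\n";

# Same pattern as above, differing only in the quantifier on the second capture.
my $star = qr{ \[ ([^\]]+) \] ([^\]]*) (?=\[|\n) }x;
my $plus = qr{ \[ ([^\]]+) \] ([^\]]+) (?=\[|\n) }x;

# Returns one "name=text" string per section that the pattern matched.
sub sections {
    my ($str, $re) = @_;
    my @found;
    while ($str =~ /$re/g) {
        push @found, "$1=$2";
    }
    return @found;
}

print join(', ', sections($input, $star)), "\n";
# head=Head text..., empty=, body=Body text...
print join(', ', sections($input, $plus)), "\n";
# head=Head text..., body=Body text...
```

Note that with "+" the empty section is not reported as an error; it simply fails to match, so in a substitution it would be left in the string untouched.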

Since you are a professed newbie, I'll tell you that manipulating the whole input as one big string, as above, is a common technique when you're learning. Later, you'll find it more practical (especially with large files) to process a stream, rather than manipulate a huge string. So, I'll leave you with this:

my $code = <<EOT;
[head]
Head text...

[body]
Body text...

[something else]
more text..
EOT

my $prev_header = 0; # takes care of closing divs
for (split /\n/, $code) {               # No need to remove newlines
    if (/^\s*$/) { next }               # Skip blank lines

    elsif ( /^\[ ([^\]]+) \]/x ) {      # If line is a header...
        print "</div>" if $prev_header; #   close prev div, if there was one...
        print qq(<div class="$1">);     #   print current div...
        $prev_header = 1;               #   and set current div flag.
    }

    else { print "$_ " }                # Simply print non-header lines
}
print "</div>"; # You always have to close last div, so just do it here

To read from a file:

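open my $FH, '<', 'whatever.txt' or die "Cannot open whatever.txt: $!";
while (<$FH>) {    # reads one line at a time instead of loading the whole file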
    chomp;
    ...
}
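Putting the two pieces together, a sketch of the line-by-line version reading from a filehandle might look like the following. The sections_to_html name is hypothetical, and an in-memory filehandle stands in for a real file so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads [name] sections from the filehandle $in
# and returns the HTML as a string instead of printing it.
sub sections_to_html {
    my ($in) = @_;
    my $html = '';
    my $open = 0;                            # true once a div has been opened
    while (my $line = <$in>) {               # one line at a time
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^\[ ([^\]]+) \]/x) {   # header line
            $html .= '</div>' if $open;      # close the previous div, if any
            $html .= qq(<div class="$1">);
            $open = 1;
        }
        else {
            $html .= "$line ";               # ordinary text line
        }
    }
    $html .= '</div>' if $open;              # close the last div
    return $html;
}

# In-memory filehandle standing in for a real file (an assumption for the demo).
my $data = "[head]\nHead text...\n\n[body]\nBody text...\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print sections_to_html($fh), "\n";
close $fh;
```

Returning a string rather than printing is only a choice made here to keep the helper easy to check; any text that appears before the first header ends up outside a div, so adjust that if your files can start that way.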

In reply to Re: Recursive Regex by furry_marmot
in thread Parsing using Regex and Lookahead by deMize
