deMize,

Try this code:

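    my $code = <<EOT;
    [head]
    Head text...
    [body]
    Body text...
    [something else]
    more text..
    EOT

    # Remove every newline (and any blank lines or indentation after it) that is
    # followed by more text. The last newline has nothing after it, so it stays.
    # Spaces inside the text are left alone.
    $code =~ s/\n\s*(?=\S)//g;
    formatCode($code);

    sub formatCode {
        my $str = shift;
        # [name] followed by text, stopping just before the next "[" or a newline
        $str =~ s{ \[ ([^\]]+) \] ([^\]]*) (?=\[|\n) }
                 {<div class="$1">$2</div>}gx;
        print $str;
    }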

I took your code, mixed in some of the suggestions of others, and made some changes I hope you find useful (since you asked). First, I changed the quoting to a "here doc", which is more like the file you'll probably be reading from in most situations.

Then I slightly changed the newline removal suggested below, so that it uses a zero-width positive lookahead assertion (look up "(?=pattern)" in perldoc perlre). The change also accommodates blank lines in the source, which helps with readability.
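To see what the lookahead buys you, here is a minimal sketch. The sub name join_lines is made up for illustration, and the pattern is a variant that only touches whitespace following a newline, so spaces inside the text survive. Treat it as one way to do it, not the only one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): delete each newline, plus
# any blank lines or indentation after it, but only when more text follows.
sub join_lines {
    my ($text) = @_;
    # (?=\S) peeks at the next character without consuming it, so that
    # character stays in the string. The final newline has nothing after
    # it, the lookahead fails there, and the newline is kept.
    $text =~ s/\n\s*(?=\S)//g;
    return $text;
}

my $sample = "[head]\nHead text...\n\n[body]\nBody text...\n";
print join_lines($sample);    # one line, still ending in a newline
```

Because the lookahead is zero-width, you never have to put the "next character" back in the replacement.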

Finally, in the regex, I used the /x modifier and curly-brace delimiters to aid readability, and a negated character class, so you're now looking for "opening bracket, (not a closing bracket)+, closing bracket, (not a closing bracket)*, stop before an opening bracket or a newline".

Using "+" in the first capture and "*" in the second enforces an assumption (which you should think about and modify to suit) that brackets ALWAYS have a div name in them, but there might be no text after them. If there must always be text after a div name, use "+" in both captures.
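Here is a small sketch of the difference, assuming the newlines inside the string have already been removed. The sub names divs_star and divs_plus are invented for this example; the patterns are the one above with only the second quantifier changed.

```perl
use strict;
use warnings;

# Hypothetical helper: text after the [name] may be empty ("*").
sub divs_star {
    my ($s) = @_;
    $s =~ s{ \[ ([^\]]+) \] ([^\]]*) (?=\[|\n) }{<div class="$1">$2</div>}gx;
    return $s;
}

# Hypothetical helper: text after the [name] is required ("+").
sub divs_plus {
    my ($s) = @_;
    $s =~ s{ \[ ([^\]]+) \] ([^\]]+) (?=\[|\n) }{<div class="$1">$2</div>}gx;
    return $s;
}

my $input = "[empty][body]Body text\n";
print divs_star($input);    # [empty] becomes an empty div
print divs_plus($input);    # [empty] is left untouched in the output
```

With "+", a header that has no text after it simply fails to match and passes through unchanged, so decide which behavior you want before picking the quantifier.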

Since you are a professed newbie, I'll tell you that working on the whole string at once, as above, is a common technique when you're learning. Later, you'll find it more practical (especially with large files) to process a stream, rather than manipulate a huge string. So, I'll leave you with this:

    my $code = <<EOT;
    [head]
    Head text...
    [body]
    Body text...
    [something else]
    more text..
    EOT

    my $prev_header = 0;                      # takes care of closing divs
    for (split /\n/, $code) {                 # No need to remove newlines
        if (/^\s*$/) { next }                 # Skip blank lines
        elsif ( /^\[ ([^\]]+) \]/x ) {        # If line is a header...
            print "</div>" if $prev_header;   # close prev div, if there was one...
            print qq(<div class="$1">);       # print current div...
            $prev_header = 1;                 # and set current div flag.
        }
        else { print "$_ " }                  # Simply print non-header lines
    }
    print "</div>" if $prev_header;           # Close the last div, if one was opened

To read from a file:

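    open my $FH, '<', 'whatever.txt' or die "Cannot open whatever.txt: $!";
    while (<$FH>) {    # while reads one line at a time; for would read the whole file first
        chomp;
        ...
    }
    close $FH;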
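Putting the two pieces together, here is a rough sketch of the stream version wrapped in a sub that takes any filehandle. The name stream_to_divs and the choice to return a string instead of printing are my assumptions, and an in-memory filehandle stands in for a real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read section markup line by line from a filehandle
# and return the resulting HTML as a string.
sub stream_to_divs {
    my ($fh) = @_;
    my $html = '';
    my $open = 0;                          # true once a <div> has been opened
    while (my $line = <$fh>) {             # one line at a time
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^\[ ([^\]]+) \]/x) { # header line
            $html .= '</div>' if $open;    # close the previous section
            $html .= qq(<div class="$1">);
            $open = 1;
        }
        else {
            $html .= "$line ";             # ordinary text line
        }
    }
    $html .= '</div>' if $open;            # close the last section, if any
    return $html;
}

# An in-memory filehandle stands in for a real file here.
my $data = "[head]\nHead text...\n\n[body]\nBody text...\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
print stream_to_divs($in), "\n";
close $in;
```

For a real file, open it as shown above and pass that handle in; nothing else needs to change.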

In reply to Re: Recursive Regex by furry_marmot
in thread Parsing using Regex and Lookahead by deMize
