Here's a way to do it by subclassing HTML::Parser. Even though the subclassing interface is not well documented, I like the fine-grained control this technique allows.
use strict;
use warnings;

my $html = q|
<h1>blah</h1>
<p>blah<p>
<h2>blah</h2>
<p>blah</p>
<h3>blah</h3>
<p>blah</p>
<h2>blah blah blah</h2>
<p>blah</p>
<h1>blah</h1>
<h2>blah</h2>
<h2>blah</h2>|;

my $parser = Markdent_Parser->new();
$parser->parse($html);
$parser->eof;
print $parser->out;

package Markdent_Parser;
use parent qw(HTML::Parser);

# Called for every start tag; $text is the tag as it appeared in the source.
sub start {
    my ($self, $tag, $attr, $attrseq, $text) = @_;
    if ($tag eq 'h1' and $self->{'in_h2'}) {
        # An h1 ends the currently open h2 section.
        $self->{'out'} .= "</div>\n\n";
        $self->{'in_h2'} = 0;
    }
    elsif ($tag eq 'h2') {
        # Close the previous h2 section, if any, then open a new one.
        if ($self->{'in_h2'}) {
            $self->{'out'} .= "</div>\n";
        }
        $self->{'out'} .= "\n<div>\n";
        $self->{'in_h2'} = 1;
    }
    $self->{'out'} .= $text;
}

# Text between tags is copied through unchanged.
sub text {
    my ($self, $text) = @_;
    $self->{'out'} .= $text;
}

# End tags are copied through unchanged.
sub end {
    my ($self, $tag, $text) = @_;
    $self->{'out'} .= $text;
}

# Return the accumulated output, closing a section that is still open.
sub out {
    my ($self) = @_;
    if ($self->{'in_h2'}) {
        $self->{'out'} .= "\n</div>";
        $self->{'in_h2'} = 0;    # so that a second call does not add another </div>
    }
    return $self->{'out'};
}

1;
The output follows the algorithm you described rather than the example you showed (I also fixed the typo in your input):
<h1>blah</h1>
<p>blah<p>

<div>
<h2>blah</h2>
<p>blah</p>
<h3>blah</h3>
<p>blah</p>
</div>

<div>
<h2>blah blah blah</h2>
<p>blah</p>
</div>

<h1>blah</h1>

<div>
<h2>blah</h2>
</div>

<div>
<h2>blah</h2>
</div>

In reply to Re: Wrapping HTML "sections" with a div by tangent
in thread Wrapping HTML "sections" with a div by nysus
