Purists are cringing at your apparent belief that <p> marks the end of a paragraph. It marks the beginning of a paragraph, which is then terminated by </p>. Your confusion is widespread and pardonable, because the terminal </p> is optional, and your orphan line at the beginning will usually be rendered exactly like a paragraph.

So here's how to do what you are trying to do:

s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms

This assumes, as you do, that the opening <pre> is on a line of its own. I further assume that you start with no markup of any kind in your <pre> block. Each application of the substitution puts <p> at the end of the first line that doesn't yet contain a ">", so it has to be repeated until it no longer matches.
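
For illustration, here is a minimal sketch of the substitution run against a small block. The sample text in $html is made up for the example, and the "1 while" loop is my assumption about how you would drive it: each pass only tags one line, so the loop keeps going until nothing is left to tag.

```perl
use strict;
use warnings;

# Hypothetical sample input: <pre> on a line of its own, no markup inside.
my $html = "<pre>\nfirst line\nsecond line\nthird line\n</pre>\n";

# $1 = "<pre>" plus every line already ending in <p>
# $2 = the next line, which must not contain ">"
# $3 = everything from there up to and including </pre>
# Each pass tags one line, so repeat until the pattern stops matching.
1 while $html =~ s/(<pre>\n(?:[^\n]*<p>\n)*)([^>\n]*)\n(.*?<\/pre>)/$1$2<p>\n$3/ms;

print $html;
```

The loop stops on its own because the line holding </pre> contains a ">", so it can never be captured as $2.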

I think my attempt may be the kind of thing you're looking for, but you may find further problems with this approach. Before you spend too much more time on this regex, I'd advise you to either process the file line-by-line (as you're already thinking of doing), or better yet, drop regexes altogether and learn about parsers.
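
If you go the line-by-line route, something like the following might do. It is only a sketch: tag_pre_lines is a hypothetical helper name, it assumes <pre> sits on a line of its own, and unlike the regex above it skips any line containing either "<" or ">".

```perl
use strict;
use warnings;

# Hypothetical helper: append <p> to each markup-free line that sits
# between a "<pre>" line and the line containing "</pre>".
sub tag_pre_lines {
    my ($text) = @_;
    my $in_pre = 0;
    my @out;
    # The -1 limit keeps trailing empty fields, so a final newline survives.
    for my $line (split /\n/, $text, -1) {
        if ($line eq '<pre>') {
            $in_pre = 1;
        }
        elsif ($line =~ m{</pre>}) {
            $in_pre = 0;
        }
        elsif ($in_pre && $line !~ /[<>]/) {
            $line .= '<p>';
        }
        push @out, $line;
    }
    return join "\n", @out;
}

print tag_pre_lines("intro\n<pre>\nfirst line\nsecond line\n</pre>\noutro\n");
```

A flag variable like $in_pre is easier to extend than the regex when the input turns out to be messier than expected, though it is still no substitute for a real parser.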


In reply to Re: Substitution inside tags, as 1 line by Narveson
in thread Substitution inside tags, as 1 line by tel2
