Cilice has asked for the wisdom of the Perl Monks concerning the following question:

Suppose you have the following input file, example.txt:
a 123 456 789
b 123 456 789
c 123 456 789
d 123 456 789
<p>e 123 456 789
f 123 456 789</p>
<p>e 123 456 789
f 123 456 789

g 123 456 789
h 123 456 789</p>
<p>1 123 456 789
2 123 456 789
3 123 456 789
4 123 456 789</p>

The following command matches the last three paragraphs (all three <p> elements):
perl -p0e 's/(<p>(?:(?!<\/p>).)*<\/p>)/-->$1<--/migs' example.txt
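
How that pattern works: (?:(?!<\/p>).)* consumes one character at a time, but only where </p> does not start, so a match can never run past its own end tag, and /s lets . cross newlines. -0 with no digits sets the input record separator to the null character, so -p reads the whole file as one record as long as it contains no NUL byte. The sketch below runs the same substitution inside a script; the sample is a cut-down copy of example.txt in a heredoc, and the position of its empty line is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down copy of example.txt (assumption: the empty line sits inside
# the second <p> element).
my $text = <<'END';
d 123
<p>e 123
f 123</p>
<p>e 123
f 123

g 123</p>
<p>1 123
2 123</p>
END

# Tempered dot: take one character at a time, but never one that starts
# "</p>", so each match ends at the nearest end tag. /s lets . match "\n".
my $count = ($text =~ s{(<p>(?:(?!</p>).)*</p>)}{-->$1<--}gs);

print $text;
print "substitutions: $count\n";
```

All three <p> elements get marked, which is why the question needs an extra condition.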

You want to change the regex so that it matches only the third paragraph, because that is the one containing an empty line.

Hint: the following does not work:
perl -p0e 's/(<p>(?:(?!((<\/p>|\n\n))).)*<\/p>)/-->$1<--/migs' example.txt
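
Why the hint fails: adding \n\n to the negative lookahead stops the tempered dot at an empty line, so the paragraph that contains one is exactly the paragraph this pattern cannot match, while the other two are still marked. A sketch checking that on a cut-down copy of the sample data (the empty line's position is an assumption; the hint's redundant capture groups are dropped, which does not change what matches):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down copy of example.txt (assumption: the empty line sits inside
# the second <p> element).
my $text = <<'END';
d 123
<p>e 123
f 123</p>
<p>e 123
f 123

g 123</p>
<p>1 123
2 123</p>
END

# The hint's pattern: the tempered dot now also refuses to start "\n\n",
# so it can never cross an empty line.
my $count = ($text =~ s{(<p>(?:(?!</p>|\n\n).)*</p>)}{-->$1<--}gs);

print $text;
print "substitutions: $count\n";
```

So the hint selects the opposite set: the elements without an empty line.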

Re: regex exercise
by choroba (Cardinal) on Aug 02, 2016 at 22:42 UTC
    /m isn't needed, as you don't use ^ or $ in the regex. /i isn't needed, either, as <p> is always lowercase in the sample data.

    I also used s%%% instead of s/// to avoid the need to backslash the slashes in end tags.

    s%(<p>(?=(?:(?!</p>).)*\n\n)(?:(?!</p>).)*</p>)%-->$1<--%sg

    Explanation:

    The opening <p> tag must be followed by two consecutive newlines (an empty line) with no </p> in between, i.e. the empty line must occur before the element's end tag.
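
A quick check of that lookahead on a cut-down copy of the sample data (the position of the empty line is an assumption). The lookahead runs right after <p> and consumes nothing; it succeeds only if "\n\n" turns up before the next </p>, and the rest of the pattern then matches the element exactly as the question's one-liner does:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down copy of example.txt (assumption: the empty line sits inside
# the second <p> element).
my $text = <<'END';
d 123
<p>e 123
f 123</p>
<p>e 123
f 123

g 123</p>
<p>1 123
2 123</p>
END

# Lookahead after <p>: an empty line must occur before the next </p>.
# Then the usual tempered dot takes the element up to its end tag.
my $count = ($text =~ s%(<p>(?=(?:(?!</p>).)*\n\n)(?:(?!</p>).)*</p>)%-->$1<--%sg);

print $text;
print "substitutions: $count\n";
```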

    You can make the regex more readable with /x:

    s{
        (<p>
            (?=(?:(?!</p>).)*\n\n)   # Followed by two newlines not preceded by </p>
            (?:(?!</p>).)*</p>)      # Followed by </p> not preceded by </p>
    }{-->$1<--}sgx;

    Update: or even

    my $not_followed_by_end_p = qr{(?:(?!</p>).)*}s;
    local $/;    # slurp mode: a single line can never contain "\n\n"
    while (<>) {
        s{ (<p> (?=$not_followed_by_end_p\n\n) $not_followed_by_end_p</p>) }{-->$1<--}gx;
        print;
    }
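
A while (<>) loop with the default $/ reads one line per record, so a record can never contain "\n\n"; the pattern only finds the empty line when the input is slurped, e.g. with local $/ or by running the script under perl -0 (assuming the file has no NUL byte). A small sketch of the difference, reading a made-up one-element string through an in-memory filehandle:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one <p> element that contains an empty line.
my $data = "<p>e 123\nf 123\n\ng 123</p>\n";

# Default $/ ("\n"): every record ends at the first newline, so the
# blank line arrives as a record of its own.
open my $fh, '<', \$data or die "open: $!";
my @records = <$fh>;
close $fh;
my $line_hits = grep { /\n\n/ } @records;

# Slurp mode: with $/ undefined, the whole input is a single record.
my $slurped = do {
    open my $in, '<', \$data or die "open: $!";
    local $/;
    <$in>;
};
my $slurp_hit = $slurped =~ /\n\n/ ? 1 : 0;

print "records: ", scalar(@records), ", containing \\n\\n: $line_hits\n";
print "slurped record contains \\n\\n: $slurp_hit\n";
```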

Re: regex exercise
by hippo (Archbishop) on Aug 03, 2016 at 08:43 UTC

    TIMTOWTDI:

    perl -p0e 's/>(\S[^<]*\n\n[^<]*)</-->$1<--/sg;' example.txt
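
A sketch of what this pattern does to a cut-down copy of the sample data (the empty line's position is an assumption). [^<]* cannot cross any tag, and \S right after > skips the newline that separates one </p> from the next <p>, so only text inside an element is considered. Because the > and < around the text are consumed but not captured, the result reads <p-->e 123 ... g 123<--/p>, with the replacement's own > and < standing in for the tag brackets, rather than --><p>...</p><-- as in the other replies. The second substitution is a variant not taken from the thread: lookarounds check the brackets without consuming them, so both tags stay intact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down copy of example.txt (assumption: the empty line sits inside
# the second <p> element).
my $text = <<'END';
d 123
<p>e 123
f 123</p>
<p>e 123
f 123

g 123</p>
<p>1 123
2 123</p>
END

# hippo's substitution: the '>' and '<' around the text are consumed,
# so the '>' of "-->" and the '<' of "<--" replace the tag brackets.
(my $consumed = $text) =~ s/>(\S[^<]*\n\n[^<]*)</-->$1<--/sg;

# Variant (not from the thread): lookbehind/lookahead leave both tags whole.
(my $kept = $text) =~ s/(?<=>)(\S[^<]*\n\n[^<]*)(?=<)/-->$1<--/g;

print $consumed, "----\n", $kept;
```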
Re: regex exercise
by Anonymous Monk on Aug 02, 2016 at 22:45 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = join '', <DATA>;
    s/(<p>(?:(?!<\/p>).)*<\/p>)(??{-1 == index $1, "\n\n" and '(*FAIL)'})/-->$1<--/gs;
    print;

    __DATA__
    a 123 456 789
    b 123 456 789
    c 123 456 789
    d 123 456 789
    <p>e 123 456 789
    f 123 456 789</p>
    <p>e 123 456 789
    f 123 456 789

    g 123 456 789
    h 123 456 789</p>
    <p>1 123 456 789
    2 123 456 789
    3 123 456 789
    4 123 456 789</p>
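
For comparison, a sketch of a different technique (not from the thread): instead of rejecting the match from inside the pattern with (??{ ... }) and (*FAIL), match every <p> element and make the decision in the replacement with /e. The trade-off is that s/// now performs a substitution for every element, including the ones it puts back unchanged, so its return value no longer counts the marked paragraphs. The sample is a cut-down copy of the data, and the empty line's position is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down copy of the sample data (assumption: the empty line sits
# inside the second <p> element).
my $text = <<'END';
d 123
<p>e 123
f 123</p>
<p>e 123
f 123

g 123</p>
<p>1 123
2 123</p>
END

# Match every <p> element, then let the /e replacement decide: wrap it
# only if index() finds an empty line, otherwise put it back unchanged.
$text =~ s{(<p>(?:(?!</p>).)*</p>)}
          {index($1, "\n\n") >= 0 ? "-->$1<--" : $1}gse;

print $text;
```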

      Even this works (they finally fixed the problem of running a regex match inside a regex code block):

      s/(<p>(?:(?!<\/p>).)*<\/p>)(??{$1 !~ m#\n\n# and '(*FAIL)'})/-->$1<--/gs;