regex exercise

Cilice has asked for the wisdom of the Perl Monks concerning the following question:

Suppose you have the following input file:

a 123 456 789
b 123 456 789

c 123 456 789
d 123 456 789

<p>e 123 456 789
f 123 456 789</p>

<p>e 123 456 789
f 123 456 789
g 123 456 789
h 123 456 789</p>

<p>1 123 456 789
2 123 456 789

3 123 456 789
4 123 456 789</p>

The following matches the last three paragraphs:
perl -p0e 's/(<p>(?:(?!<\/p>).)*<\/p>)/-->$1<--/migs' example.txt

You want to change the regex so that it matches only the third of these paragraphs, because it contains an empty line.

Hint: the following does not work:
perl -p0e 's/(<p>(?:(?!((<\/p>|\n\n))).)*<\/p>)/-->$1<--/migs' example.txt
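
A quick check of what the hint actually matches may help. This is only a sketch: it uses an inline copy of the three `<p>` paragraphs instead of example.txt, the names `$data` and `@hits` are illustrative, and the inner groups are made non-capturing so that each match yields exactly one list element. Adding `\n\n` to the negative lookahead forbids a blank line inside the paragraph, which is the opposite of what the exercise asks for.

```perl
use strict;
use warnings;

# Inline copy of the three <p> paragraphs from the input file (assumed data).
my $data = join "\n\n",
    "<p>e 123 456 789\nf 123 456 789</p>",
    "<p>e 123 456 789\nf 123 456 789\ng 123 456 789\nh 123 456 789</p>",
    "<p>1 123 456 789\n2 123 456 789\n\n3 123 456 789\n4 123 456 789</p>";

# The pattern from the hint: the body may step over neither </p> nor \n\n.
my @hits = $data =~ m{(<p>(?:(?!(?:</p>|\n\n)).)*</p>)}gs;

printf "%d paragraph(s) matched\n", scalar @hits;
print "none of the matches contains a blank line\n"
    unless grep { /\n\n/ } @hits;
```

The paragraph with the empty line is the one that drops out, because the tempered dot stops at `\n\n` and can never reach `</p>`.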


Re: regex exercise by choroba (Cardinal) on Aug 02, 2016 at 22:42 UTC
`/m` isn't needed, as you don't use `^` or `$` in the regex. `/i` isn't needed either, as `<p>` is always lowercase in the sample data. I also used `s%%%` instead of `s///` to avoid the need to backslash the slashes in the end tags.

`s%(<p>(?=(?:(?!</p>).)*\n\n)(?:(?!</p>).)*</p>)%-->$1<--%sg`

Explanation: The `<p>` tag must be followed by two newlines that aren't preceded by its end tag.

You can make the regex more readable via `/x`:

    s{ (<p> (?=(?:(?!</p>).)*\n\n)   # Followed by two newlines not preceded by </p>
       (?:(?!</p>).)*</p>)           # Followed by </p> not preceded by </p>
     }{-->$1<--}sgx;

Update: or even

    my $not_followed_by_end_p = qr{(?:(?!</p>).)*}s;
    local $/;    # slurp whole files, as the match spans several lines
    while (<>) {
        s{ (<p> (?=$not_followed_by_end_p\n\n)
           $not_followed_by_end_p</p>)
         }{-->$1<--}gx;
        print;
    }
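
A self-contained sketch of the lookahead approach in this reply, run against an inline copy of the sample paragraphs rather than example.txt. The names `$data`, `$inner` and `$marked` are illustrative and not taken from the reply.

```perl
use strict;
use warnings;

# Inline copy of the three <p> paragraphs from the question (assumed data).
my $data = <<'END';
<p>e 123 456 789
f 123 456 789</p>

<p>e 123 456 789
f 123 456 789
g 123 456 789
h 123 456 789</p>

<p>1 123 456 789
2 123 456 789

3 123 456 789
4 123 456 789</p>
END

# Any run of characters that never steps over an end tag.
# The /s on qr// stays in effect when the pattern is interpolated.
my $inner = qr{(?:(?!</p>).)*}s;

# Mark a paragraph only if a blank line occurs before its end tag.
(my $marked = $data) =~ s{(<p>(?=$inner\n\n)$inner</p>)}{-->$1<--}g;

print $marked;
```

Only the last paragraph should come out wrapped in `-->` and `<--`. The blank lines between paragraphs do not count, because `$inner` cannot cross a `</p>`.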
Re: regex exercise by hippo (Archbishop) on Aug 03, 2016 at 08:43 UTC
TIMTOWTDI: `perl -p0e 's/>(\S[^<]*\n\n[^<]*)</-->$1<--/sg;' example.txt`
Re: regex exercise by Anonymous Monk on Aug 02, 2016 at 22:45 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = join '', <DATA>;
    s/(<p>(?:(?!<\/p>).)*<\/p>)(??{-1 == index $1, "\n\n" and '(*FAIL)'})/-->$1<--/gs;
    print;

    __DATA__
    a 123 456 789
    b 123 456 789

    c 123 456 789
    d 123 456 789

    <p>e 123 456 789
    f 123 456 789</p>

    <p>e 123 456 789
    f 123 456 789
    g 123 456 789
    h 123 456 789</p>

    <p>1 123 456 789
    2 123 456 789

    3 123 456 789
    4 123 456 789</p>
Re^2: regex exercise by Anonymous Monk on Aug 02, 2016 at 23:26 UTC
Even this works (they finally fixed the nested regex problem): `s/(<p>(?:(?!<\/p>).)*<\/p>)(??{$1 !~ m#\n\n# and '(*FAIL)'})/-->$1<--/gs;`
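
The same "match every paragraph, then test it" idea can also be written without `(??{...})`. The sketch below is not the method from these replies: it swaps in the `/e` modifier so that the blank-line test runs as ordinary Perl in the replacement. The helper name `mark_blank_line_paragraphs` and the tiny `$sample` string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap only those <p>...</p> blocks that contain a blank line.
sub mark_blank_line_paragraphs {
    my ($text) = @_;
    # /e evaluates the replacement as code; paragraphs without a
    # blank line are put back unchanged.
    $text =~ s{(<p>.*?</p>)}{ index($1, "\n\n") >= 0 ? "-->$1<--" : $1 }gse;
    return $text;
}

# Two small paragraphs: only the second one has a blank line inside.
my $sample = "<p>a\nb</p>\n\n<p>c\n\nd</p>\n";
print mark_blank_line_paragraphs($sample);
```

The trade-off is that every paragraph is matched and rewritten, even the ones that end up unchanged, whereas the `(*FAIL)` variants reject them during matching.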