G'day Xevven,
Welcome to the monastery.
"I think, this section is too complicated to match with RegExp, do you agree?"
No, I don't agree. On the basis of the data you've shown, this regex works just fine:
    my $re = qr{
        <div \s+ class="sectionHeading">.*?</div>\s+
        <div \s+ class="sectionContent">.*?</div>\s+
    }msx;
Here's my test:
    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $re = qr{
        <div \s+ class="sectionHeading">.*?</div>\s+
        <div \s+ class="sectionContent">.*?</div>\s+
    }msx;

    my $html = do { local $/; <DATA> };

    $html =~ s/$re//;

    print $html;

    __DATA__
    <!-- KEEP -->
    <div class="sectionHeading">REMOVE_THIS</div>
    <div class="sectionContent">
    <table class="sectionTable" ...
    ...
    </table>
    </div>
    <!-- KEEP -->
I added the <!-- KEEP --> comments as markers. I used all the <table>...</table> data exactly as you posted; I saw no reason to repeat it here.
Here's the output:
    <!-- KEEP -->
    <!-- KEEP -->
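If your real file might hold more than one heading/content pair, adding the /g modifier to the substitution should remove them all. Here's a rough sketch of that. The sample HTML in $html is made up purely for illustration (it is not your data), and the whole approach assumes the sectionContent div contains no nested <div> elements, because the non-greedy .*? stops at the first </div> it finds.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Same pattern as in the test script. Under /x, whitespace in the pattern is
# ignored; under /s, the dot also matches newlines, so .*? can span lines.
my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

# Hypothetical input with two sections to remove (invented for this example).
my $html = <<'HTML';
<!-- KEEP 1 -->
<div class="sectionHeading">FIRST</div>
<div class="sectionContent">
<table class="sectionTable"><tr><td>a</td></tr></table>
</div>
<!-- KEEP 2 -->
<div class="sectionHeading">SECOND</div>
<div class="sectionContent">
<table class="sectionTable"><tr><td>b</td></tr></table>
</div>
<!-- KEEP 3 -->
HTML

# Copy first, then strip every match from the copy; $html stays untouched.
(my $cleaned = $html) =~ s/$re//g;

print $cleaned;
```

With that sample input, only the three KEEP comments are printed, one per line. If the content div can contain nested divs, a regex like this is likely to stop too early and an HTML parser would be the safer tool.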
-- Ken