in reply to Remove section from a HTML file
G'day Xevven,
Welcome to the monastery.
"I think, this section is too complicated to match with RegExp, do you agree?"
No, I don't agree. On the basis of the data you've shown, this regex works just fine:
my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;
Here's my test:
#!/usr/bin/env perl
use strict;
use warnings;

my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

my $html = do { local $/; <DATA> };
$html =~ s/$re//;
print $html;

__DATA__
<!-- KEEP -->
<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
<table class="sectionTable" ...
...
</table>
</div>
<!-- KEEP -->
I added the <!-- KEEP --> comments as markers. In my actual test I used the full <table>...</table> data exactly as you posted; I've elided it in the listing because I saw no reason to repeat it all here.
Here's the output:
<!-- KEEP -->
<!-- KEEP -->
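If you want to reuse this outside a one-off test, here is a minimal sketch that wraps the same regex in a subroutine. The name remove_sections, the /g modifier (to strip every matching heading/content pair, not just the first) and the sample input are my own assumptions for illustration; none of them come from your post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same pattern as in the test script: /s lets "." match newlines,
# /x lets the pattern be laid out with insignificant whitespace.
my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

# Hypothetical helper (not part of the original test): returns a copy
# of the HTML with every matching heading/content pair removed.
sub remove_sections {
    my ($html) = @_;
    $html =~ s/$re//g;
    return $html;
}

# Made-up sample input standing in for the real file contents.
my $sample = join "\n",
    '<!-- KEEP -->',
    '<div class="sectionHeading">REMOVE_THIS</div>',
    '<div class="sectionContent">',
    '<table class="sectionTable"><tr><td>x</td></tr></table>',
    '</div>',
    '<!-- KEEP -->',
    '';

print remove_sections($sample);
```

Bear in mind that .*? stops at the first </div> it meets, so this assumes the sectionContent block holds no nested <div> elements. That looks to be the case for the table data shown, but it is worth checking against your real file.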
-- Ken
Re^2: Remove section from a HTML file
by Xevven (Initiate) on Oct 24, 2013 at 16:52 UTC