in reply to Remove section from a HTML file

G'day Xevven,

Welcome to the monastery.

"I think, this section is too complicated to match with RegExp, do you agree?"

No, I don't agree. On the basis of the data you've shown, this regex works just fine:

my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;
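As a quick illustration of why the /s modifier matters for this pattern, here is a minimal sketch. The $sample string and the variable names are made up for the demonstration; they are not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical sample: a div whose content spans several lines.
my $sample = qq{<div class="sectionContent">\nline 1\nline 2\n</div>\n};

# Without /s, "." does not match a newline, so ".*?" cannot reach "</div>".
my $without_s = qr{ <div \s+ class="sectionContent">.*?</div> }x;

# With /s, "." also matches newlines, so the whole div is matched.
my $with_s = qr{ <div \s+ class="sectionContent">.*?</div> }sx;

my $matched_without_s = $sample =~ $without_s ? 1 : 0;
my $matched_with_s    = $sample =~ $with_s    ? 1 : 0;

print "without /s: $matched_without_s\n";
print "with /s: $matched_with_s\n";
```

The /x modifier is what lets the pattern be spread out with whitespace for readability; literal whitespace in the data is matched explicitly with \s+.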

Here's my test:

#!/usr/bin/env perl

use strict;
use warnings;

my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

my $html = do { local $/; <DATA> };

$html =~ s/$re//;

print $html;

__DATA__
<!-- KEEP -->
<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
<table class="sectionTable" ...
...
</table>
</div>
<!-- KEEP -->

I added the <!-- KEEP --> comments as markers. I used all the <table>...</table> data exactly as you posted; I saw no reason to repeat it here.

Here's the output:

<!-- KEEP -->
<!-- KEEP -->

-- Ken

Re^2: Remove section from a HTML file
by Xevven (Initiate) on Oct 24, 2013 at 16:52 UTC
    Thank you very much, this is indeed working as expected, even if I put a complete real-world file in the __DATA__ section ;-)

    I tried to alter the script so that it modifies all of the appropriate files. For testing purposes, I tried to match the files and output their modified content. It seems that this approach eliminates all line breaks: the output is all on a single line. Can someone help me find where my error is? ;-)

    Cheers, Xevven

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $re = qr{
        <div \s+ class="sectionHeading">REMOVE_THIS.*?</div>\s+
        <div \s+ class="sectionContent">.*?</div>\s+
    }msx;

    #my $html = do { local $/; <DATA> };
    #$html =~ s/$re//;

    opendir(my $dh, ".") or die "$!";
    my @files = grep { s/\././g < 2 } <*.html>;
    closedir $dh;

    for my $file (@files) {
        local $/ = undef;
        open my $fh, "<", $file or die "$!";
        my $content = <$fh>;
        $content =~ s/$re//;
        print $content;
        close $fh;
    }
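One way to narrow this down, as a minimal sketch only: run the same slurp-and-substitute steps on an in-memory file handle and count the newlines before and after. The $sample text and the counter variables are hypothetical stand-ins for one of the real *.html files; the regex is the one from the script above.

```perl
use strict;
use warnings;

my $re = qr{
    <div \s+ class="sectionHeading">REMOVE_THIS.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

# Hypothetical sample standing in for the content of one *.html file.
my $sample = join "\n",
    '<!-- KEEP 1 -->',
    '<div class="sectionHeading">REMOVE_THIS</div>',
    '<div class="sectionContent">',
    'some content',
    '</div>',
    '<!-- KEEP 2 -->',
    '';

# Read from an in-memory file handle, slurping the whole "file" at once.
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $content = do { local $/; <$fh> };
close $fh;

# Count newlines before and after the substitution.
my $newlines_before = () = $content =~ /\n/g;
$content =~ s/$re//;
my $newlines_after  = () = $content =~ /\n/g;

print "newlines before: $newlines_before, after: $newlines_after\n";
print $content;
```

In this sketch only the newlines inside the removed section disappear; the lines around it stay on separate lines. If the same check gives sensible counts on a real file, the cause may lie elsewhere, for example in the line endings of the files or in how the output is being viewed, but that is an assumption rather than something the posted code shows.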