Re: Remove section from a HTML file

Welcome to the monastery.

"I think, this section is too complicated to match with RegExp, do you agree?"

No, I don't agree. On the basis of the data you've shown, this regex works just fine:

my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;
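
As a small illustration of why the modifiers matter, here is a sketch using a made-up sample string (the variable names are mine, not from the original post). `/x` lets the pattern be laid out with insignificant whitespace, and `/s` lets `.` match newlines, so `.*?` can span a multi-line div. `/m` only changes how `^` and `$` behave, and this pattern uses neither.

```perl
use strict;
use warnings;

# Hypothetical sample: a div whose content spans several lines.
my $sample = qq{<div class="sectionContent">\nline 1\nline 2\n</div>\n};

# Without /s, '.' does not match "\n", so .*? cannot reach </div>.
my $without_s = $sample =~ m{<div \s+ class="sectionContent">.*?</div>}x  ? 1 : 0;

# With /s, '.' matches any character, including "\n".
my $with_s    = $sample =~ m{<div \s+ class="sectionContent">.*?</div>}sx ? 1 : 0;

print "without /s: $without_s\n";   # prints "without /s: 0"
print "with /s:    $with_s\n";      # prints "with /s:    1"
```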

Here's my test:

#!/usr/bin/env perl

use strict;
use warnings;

my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

my $html = do { local $/; <DATA> };

$html =~ s/$re//;

print $html;

__DATA__
<!-- KEEP -->
<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
<table class="sectionTable" ...
...
</table>
</div>
<!-- KEEP -->

I added the <!-- KEEP --> comments as markers. I used all the <table>...</table> data exactly as you posted; I saw no reason to repeat it all here.

Here's the output:

<!-- KEEP -->
<!-- KEEP -->
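
The qualifier "on the basis of the data you've shown" matters. The non-greedy `.*?</div>` stops at the first closing `</div>`, so the pattern relies on the sectionContent div containing no nested `<div>`. A minimal sketch with made-up input (the "inner" div is an assumption for illustration, not part of the posted data):

```perl
use strict;
use warnings;

my $re = qr{
    <div \s+ class="sectionHeading">.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

# Made-up input: the content div contains a nested <div>.
my $nested = <<'HTML';
<!-- KEEP -->
<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
<div class="inner">nested</div>
</div>
<!-- KEEP -->
HTML

# Work on a copy so the original string stays available for comparison.
(my $result = $nested) =~ s/$re//;

# The match ends at the first </div>, so the outer closing tag is left behind.
print $result;
```

With this input, a stray `</div>` remains between the two KEEP markers. If the real files can contain nested divs, an HTML parser is likely the more robust choice.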

-- Ken


Replies are listed 'Best First'.
Re^2: Remove section from a HTML file by Xevven (Initiate) on Oct 24, 2013 at 16:52 UTC
Thank you very much, this is indeed working as expected, even if I put a complete real-world file in the __DATA__ section ;-)

I tried to alter the script so that it modifies all of the appropriate files. For testing purposes, I tried to match the files and output their modified content. It seems that this approach eliminates all line breaks: the output is all on a single line. Can someone help me find where my error is? ;-)

Cheers, Xevven

#!/usr/bin/env perl

use strict;
use warnings;

my $re = qr{
    <div \s+ class="sectionHeading">REMOVE_THIS.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

#my $html = do { local $/; <DATA> };

#$html =~ s/$re//;

opendir(my $dh, ".") or die "$!";
my @files = grep { s/\././g < 2 } <*.html>;
closedir $dh;

for my $file (@files) {
    local $/ = undef;
    open my $fh, "<", $file or die "$!";
    my $content = <$fh>;
    $content =~ s/$re//;
    print $content;
    close $fh;
}
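
One way to narrow down where the line breaks go is to count them before and after the substitution. The following is only a diagnostic sketch, not a fix: `count_line_endings` is a hypothetical helper, and the idea that the files might use something other than "\n" as a line ending is an assumption to be checked, not a diagnosis.

```perl
use strict;
use warnings;

# Hypothetical helper: count LF and CR characters in a string.
sub count_line_endings {
    my ($text) = @_;
    my $lf = () = $text =~ /\n/g;
    my $cr = () = $text =~ /\r/g;
    return ($lf, $cr);
}

# Same pattern as in the script above.
my $re = qr{
    <div \s+ class="sectionHeading">REMOVE_THIS.*?</div>\s+
    <div \s+ class="sectionContent">.*?</div>\s+
}msx;

for my $file (glob '*.html') {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $content = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    my ($lf_before, $cr_before) = count_line_endings($content);
    $content =~ s/$re//;
    my ($lf_after, $cr_after) = count_line_endings($content);

    # Report on STDERR so the counts do not mix with any printed content.
    print STDERR "$file: LF $lf_before -> $lf_after, CR $cr_before -> $cr_after\n";
}
```

If the LF count is already zero before the substitution, the regex is not what removes the line breaks, and the input files themselves are worth inspecting.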