in reply to Delete multiple lines of text from a file?
First, welcome! You've set yourself a real challenge... and come to a good place for ideas.
Here's one (others will likely offer better ways, particularly those founded on the advice that one should use a well-tested HTML parser rather than regexen).
Nonetheless, in this limited case, one might consider using a match with non-greedy (minimal) matching (see perlretut), followed by a substitution that removes whatever was matched. But do this ONLY IF YOU ARE CERTAIN that none of the <div class="topsearchbar"> elements will contain any other <div>...</div> that you might want to retain.
    #!/usr/bin/perl
    use strict;
    use warnings;
    #824315

    my $html;
    {
        local $/;    # slurp mode: read all the data at once (a file, in your application)
        $html = <DATA>;
    }

    # .*? is non-greedy: it stops at the first </div> after the opening tag
    my $delete = "<div class=\"topsearchbar\">.*?</div>";
    my $tobedeleted;

    if ( $html =~ m|($delete)|s ) {
        $tobedeleted = $1;
        print "\n---------\n \$tobedeleted: $tobedeleted\n--------\n\n";
        # above for info only; remove for production

        # \Q...\E makes the captured text match literally, not as a regex
        $html =~ s|\Q$tobedeleted\E| |;
    } else {
        print "\n No match for class topsearchbar \n";
    }
    print $html;

    __DATA__
    <html>
    <head>
    <title>something</title>
    </head>
    <body>
    <h1>Headline above search bar</h1>
    <div class="topsearchbar">
    <a href="foo.htm">foo</a>
    <a href="bar.shtml">bar</a>
    <img src="logo.png" width="240" height="110" alt="logo for xyz corp">
    <a href="baz.htm">baz</a>
    </div>
    <p>more stuff</p>
    <div class="somethingelse">
    <p>stuff</p>
    </div>
    </body>
    </html>

Output:

    ---------
     $tobedeleted: <div class="topsearchbar">
    <a href="foo.htm">foo</a>
    <a href="bar.shtml">bar</a>
    <img src="logo.png" width="240" height="110" alt="logo for xyz corp">
    <a href="baz.htm">baz</a>
    </div>
    --------

    <html>
    <head>
    <title>something</title>
    </head>
    <body>
    <h1>Headline above search bar</h1>

    <p>more stuff</p>
    <div class="somethingelse">
    <p>stuff</p>
    </div>
    </body>
    </html>
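To see why the caveat about nested <div>s matters, here is a minimal sketch. The input string is made up for illustration (the "inner" class is hypothetical); the pattern is the same non-greedy one used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the search bar contains a nested <div>.
my $html = join '',
    '<div class="topsearchbar">',
    '<div class="inner">keep?</div>',
    '<a href="foo.htm">foo</a>',
    '</div>',
    '<p>more stuff</p>';

# The non-greedy .*? stops at the FIRST </div>, which here closes
# the inner <div>, not the search bar itself.
( my $result = $html ) =~ s|<div class="topsearchbar">.*?</div>| |s;

print "$result\n";
```

The link and a stray </div> are left behind, which is exactly the situation where a real HTML parser earns its keep.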
In your application, you'll want to put the names of the files you want to modify in an array and loop over it, rather than reading from __DATA__ as this example does. And, of course, rename each original to ".bak" before saving the output under the original name, though you seem to have that well under control.
The print statement that produces the info section (between the dashed lines) is there solely for illustration; remove it for production.
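For the file-handling side (loop over the files, keep a ".bak" copy, write the result under the original name), something along these lines might serve as a starting point. It is only a sketch under assumptions: the sub names strip_searchbar and process_file are my own, the file names are taken from the command line, and only the first matching <div> in each file is removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove the first topsearchbar <div> from a string of HTML.
sub strip_searchbar {
    my ($html) = @_;
    $html =~ s|<div class="topsearchbar">.*?</div>| |s;
    return $html;
}

# Rewrite one file, keeping the original as "$file.bak".
sub process_file {
    my ($file) = @_;

    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    rename $file, "$file.bak"
        or die "Cannot rename $file to $file.bak: $!";

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} strip_searchbar($html);
    close $out or die "Cannot close $file: $!";
}

# Assumption: the files to modify are named on the command line.
process_file($_) for @ARGV;
```

Run it as, say, perl stripbar.pl page1.html page2.html (the script name is arbitrary), and try it on copies first.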