First, welcome! You've set yourself a real challenge... and come to a good place for ideas.

Here's one approach. Others will likely offer better ones, particularly those based on the standard advice to use a well-tested HTML parser rather than regexen.

Nonetheless, in this limited case, you might use a match with a non-greedy (minimal) quantifier such as .*? (see perlretut), then run a substitution on whatever was matched. But do this ONLY IF YOU ARE CERTAIN that no <div class="topsearchbar"> contains a nested <div>...</div>. The non-greedy match stops at the first </div> it reaches, so a nested div ends the deletion early and leaves the rest of the search bar, plus a stray </div>, in your file.
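To see why that caveat matters, here is a small sketch that runs the same non-greedy pattern against a made-up input (the inner div and the "kept" paragraph are hypothetical) whose topsearchbar div contains a nested div:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the topsearchbar div contains a nested div
my $nested = <<'HTML';
<div class="topsearchbar">
<div class="inner">x</div>
<a href="baz.htm">baz</a>
</div>
<p>kept</p>
HTML

# Same non-greedy pattern as the example below: .*? stops at the FIRST </div>,
# which here is the inner div's closing tag, not the search bar's
( my $result = $nested ) =~ s|<div class="topsearchbar">.*?</div>| |s;

print $result;
```

Only the opening tag and the inner div are removed; the baz link and the outer </div> survive, which is exactly the partial deletion the caveat warns about.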

```perl
#!/usr/bin/perl
use strict;
use warnings;
#824315

my $html;
{
    local $/;          # undefine the record separator to slurp all the data
    $html = <DATA>;    # (read from a file for your application)
}

# Non-greedy .*? stops at the FIRST </div> after the opening tag
my $delete = "<div class=\"topsearchbar\">.*?</div>";
my $tobedeleted;

if ( $html =~ m|($delete)|s ) {    # /s lets . match newlines
    $tobedeleted = $1;
    print "\n---------\n \$tobedeleted: $tobedeleted\n--------\n\n";
    # above for info only; remove for production

    # \Q...\E quotes any regex metacharacters (., ?, etc.) in the captured text
    $html =~ s|\Q$tobedeleted\E| |;
}
else {
    print "\n No match for class topsearchbar \n";
}

print $html;

__DATA__
<html>
<head>
<title>something</title>
</head>
<body>
<h1>Headline above search bar</h1>
<div class="topsearchbar">
<a href="foo.htm">foo</a>
<a href="bar.shtml">bar</a>
<img src="logo.png width="240" height="110" alt="logo for xyz corp">
<a href="baz.htm">baz</a>
</div>
<p>more stuff</p>
<div class="somethingelse">
<p>stuff</p>
</div>
</body>
</html>
```

Output:

```
---------
 $tobedeleted: <div class="topsearchbar">
<a href="foo.htm">foo</a>
<a href="bar.shtml">bar</a>
<img src="logo.png width="240" height="110" alt="logo for xyz corp">
<a href="baz.htm">baz</a>
</div>
--------

<html>
<head>
<title>something</title>
</head>
<body>
<h1>Headline above search bar</h1>
 
<p>more stuff</p>
<div class="somethingelse">
<p>stuff</p>
</div>
</body>
</html>
```
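If you don't need the info printout, the match-then-substitute pair above can likely be collapsed into a single s///. Adding /g also removes every topsearchbar block in a page rather than just the first. A minimal sketch, assuming the same no-nested-div constraint; the two-block sample input is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with two topsearchbar blocks
my $html = <<'HTML';
<h1>Top</h1>
<div class="topsearchbar">
<a href="foo.htm">foo</a>
</div>
<p>middle</p>
<div class="topsearchbar"><a href="bar.shtml">bar</a></div>
<p>end</p>
HTML

# One pass: /s lets . cross newlines, /g removes every occurrence,
# and s/// returns the number of substitutions (false if none, hence || 0)
my $count = ( $html =~ s|<div class="topsearchbar">.*?</div>| |sg ) || 0;

print "removed $count block(s)\n";
print $html;
```

The return value of s///g doubles as a cheap "did anything change?" check, which is handy when deciding whether to rewrite a file at all.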

Rather than reading from __DATA__ as this example does, you'll want to put the names of the files to modify in an array and loop over it. And, of course, rename each original to ".bak" before saving the output under the original name, but you seem to have that well under control.
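Putting those pieces together might look something like the sketch below. It assumes the file names arrive on the command line, and strip_file() is a hypothetical helper name, not something from the original thread. It slurps each file, removes the topsearchbar blocks, and only if something changed, renames the original to .bak and writes the result under the original name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip topsearchbar divs from one file, keeping a .bak copy.
# Returns the number of blocks removed (0 means the file was left untouched).
sub strip_file {
    my ($file) = @_;

    # Slurp the whole file
    open my $in, '<', $file or die "Can't read $file: $!";
    my $html = do { local $/; <$in> };
    close $in;

    my $count = ( $html =~ s|<div class="topsearchbar">.*?</div>| |sg ) || 0;
    return 0 unless $count;    # nothing to remove, so no backup and no rewrite

    # Keep the original as file.bak, then write the edited text under the old name
    rename $file, "$file.bak" or die "Can't rename $file: $!";
    open my $out, '>', $file or die "Can't write $file: $!";
    print {$out} $html;
    close $out or die "Can't close $file: $!";

    return $count;
}

# Assumption: file names come from the command line, e.g. perl strip.pl a.html b.html
my @files = @ARGV;
for my $file (@files) {
    my $n = strip_file($file);
    print "$file: removed $n block(s)\n";
}
```

Perl's -i.bak switch (in-place editing) is another common way to get the backup-then-overwrite behaviour; the explicit rename above just keeps each step visible.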

And, of course, the print statement that produces the info section (between the dashed lines) is for illustration only; remove it for production.


In reply to Re: Delete multiple lines of text from a file? by ww
in thread Delete multiple lines of text from a file? by Erika
