Hi Monks,

I'm struggling with a task: removing a section from a set of HTML files. I have never worked with additional modules in Perl, but I guess I can't avoid it this time ;-)

What I want to achieve:
* Remove a section from an HTML file, but only from files whose name contains exactly one dot (match abc.html, but not abc.html_aaa.html or abc.html_bbb.html in the same folder)

* The section looks like this:

<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
<table class="sectionTable" border="0" cellspacing="0" cellpadding="0" title="Properties" summary="Properties">
<tr valign="top">
<th class="sectionTableHeading" scope="row" id="property_aaa bbb" abbr="aaa bbb">aaa bbb</th><td class="sectionTableCell" align="left" headers="property_aaa bbb"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
</tr>
<tr valign="top">
<th class="sectionTableHeading" scope="row" id="property_ccc ddd" abbr="ccc ddd">ccc ddd</th><td class="sectionTableCell" align="left" headers="property_ccc ddd"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
</tr>
<tr valign="top">
<th class="sectionTableHeading" scope="row" id="property_eee" abbr="eee">eee</th><td class="sectionTableCell" align="left" headers="property_eee"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
</tr>
<tr valign="top">
<th class="sectionTableHeading" scope="row" id="property_fff" abbr="fff">fff</th><td class="sectionTableCell" align="left" headers="property_fff"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
</tr>
<tr valign="top">
<th class="sectionTableHeading" scope="row" id="property_ggg" abbr="ggg">ggg</th><td class="sectionTableCell" align="left" headers="property_ggg"><img width="20" height="15" alt="Yes" title="Yes" src="./../../images/true.gif"></td>
</tr>
<tr valign="top">
<th class="sectionTableHeading" scope="row" id="property_hhh" abbr="hhh">hhh</th><td class="sectionTableCell" align="left" headers="property_hhh"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
</tr>
</table>
</div>


* The section always starts with the two div containers, but the content of the inner table may vary (to be precise: only the referenced image files, './../../images/indent.gif' and './../../images/true.gif', vary).
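
To make the filename rule from the first point concrete, here is a minimal sketch using only core Perl. The helper name has_single_dot_html_name is made up for illustration, and the rule assumed here is "exactly one dot, and the name ends in .html":

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): true when the file name
# contains exactly one dot and ends in ".html".
# Assumption: "one dot" refers to the bare file name, not the full path.
sub has_single_dot_html_name {
    my ($name) = @_;
    # [^.]+ allows no dot before the single literal dot in ".html"
    return $name =~ /\A[^.]+\.html\z/ ? 1 : 0;
}

my @names   = ('abc.html', 'abc.html_aaa.html', 'abc.html_bbb.html');
my @matches = grep { has_single_dot_html_name($_) } @names;
print "$_\n" for @matches;
```

With the three example names, only abc.html passes the filter.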

I think this section is too complicated to match with a regular expression; do you agree?
Can I expect help from HTML::TokeParser or something similar?
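
For comparison, this is roughly what a regex-only attempt might look like. It is a sketch under strong assumptions: the section holds exactly one table, there is no nested table or div inside it, and the attribute quoting is exactly as in the sample above. The helper name remove_section is made up for illustration; a real parser would be more robust if the markup ever deviates.

```perl
use strict;
use warnings;

# Hypothetical helper: strips the REMOVE_THIS section from an HTML string.
# Assumes one table per section and no nested <table> or <div> inside it.
sub remove_section {
    my ($html) = @_;
    $html =~ s{
        <div \s+ class="sectionHeading"> REMOVE_THIS </div> \s*
        <div \s+ class="sectionContent"> \s*
        <table \b .*? </table> \s*
        </div>
    }{}xs;    # /x: readable layout, /s: let . match newlines
    return $html;
}

# Shortened stand-in for the real section, for demonstration only.
my $sample = join '',
    '<p>keep</p>',
    '<div class="sectionHeading">REMOVE_THIS</div> ',
    '<div class="sectionContent"> <table class="sectionTable"> ',
    '<tr><td>x</td></tr> </table> </div>',
    '<div class="sectionHeading">OTHER</div>';

print remove_section($sample), "\n";
```

The non-greedy .*? stops at the first closing table tag, which is why the no-nested-table assumption matters. To apply this to a file, the whole file would have to be read into one string first, since the section spans several lines.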

Thanks for any helping hand :]
Cheers,
Xevven


In reply to Remove section from a HTML file by Xevven
