Hi Monks,
I'm struggling with a task: removing a section from a batch of HTML files. I have never worked with additional modules in Perl, but I guess this time I can't avoid it ;-)
What I want to achieve:
* Remove a section from an HTML file, but only from files that have exactly one dot in their filename (match abc.html, but not abc.html_aaa.html or abc.html_bbb.html in the same folder)
* The section looks like this:
<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
  <table class="sectionTable" border="0" cellspacing="0" cellpadding="0" title="Properties" summary="Properties">
    <tr valign="top">
      <th class="sectionTableHeading" scope="row" id="property_aaa bbb" abbr="aaa bbb">aaa bbb</th><td class="sectionTableCell" align="left" headers="property_aaa bbb"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
    </tr>
    <tr valign="top">
      <th class="sectionTableHeading" scope="row" id="property_ccc ddd" abbr="ccc ddd">ccc ddd</th><td class="sectionTableCell" align="left" headers="property_ccc ddd"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
    </tr>
    <tr valign="top">
      <th class="sectionTableHeading" scope="row" id="property_eee" abbr="eee">eee</th><td class="sectionTableCell" align="left" headers="property_eee"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
    </tr>
    <tr valign="top">
      <th class="sectionTableHeading" scope="row" id="property_fff" abbr="fff">fff</th><td class="sectionTableCell" align="left" headers="property_fff"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
    </tr>
    <tr valign="top">
      <th class="sectionTableHeading" scope="row" id="property_ggg" abbr="ggg">ggg</th><td class="sectionTableCell" align="left" headers="property_ggg"><img width="20" height="15" alt="Yes" title="Yes" src="./../../images/true.gif"></td>
    </tr>
    <tr valign="top">
      <th class="sectionTableHeading" scope="row" id="property_hhh" abbr="hhh">hhh</th><td class="sectionTableCell" align="left" headers="property_hhh"><img width="20" height="15" alt="" title="" src="./../../images/indent.gif"></td>
    </tr>
  </table>
</div>
* The section always starts with the two div containers, but the content of the inner table may vary (to be precise: only the referenced image files, './../../images/indent.gif' and './../../images/true.gif', vary).
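To make the filename rule concrete, here is a minimal sketch using only core Perl. The helper name is_target_file and the sample names are hypothetical; the rule it encodes (exactly one dot, ending in .html) is my reading of the requirement above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the name has exactly one dot and ends in .html.
sub is_target_file {
    my ($name) = @_;
    # tr/.// with an empty replacement list counts the dots without changing $name.
    my $dots = ( $name =~ tr/.// );
    return ( $dots == 1 && $name =~ /\.html\z/ ) ? 1 : 0;
}

# Sample names taken from the description above.
my @names   = ( 'abc.html', 'abc.html_aaa.html', 'abc.html_bbb.html' );
my @targets = grep { is_target_file($_) } @names;

print "$_\n" for @targets;
```

With the three sample names, only abc.html passes the filter, because the other two contain two dots each.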
I think this section is too complicated to match with a regular expression. Do you agree?
Can I expect help from HTML::TokeParser or something similar?
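For comparison, here is a rough sketch of what the regex route might look like. It is only an illustration under a strong assumption: the section contains exactly one table with no nested tables, so the first closing table tag followed by a closing div ends the section. The sub name remove_section and the shortened sample document are hypothetical. If that assumption does not hold, a real HTML parser is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the REMOVE_THIS section from an HTML string.
# Assumption: one table inside the section and no nested <table> elements,
# so a non-greedy match up to the first </table> ... </div> is enough.
sub remove_section {
    my ($html) = @_;
    $html =~ s{
        <div \s+ class="sectionHeading"> \s* REMOVE_THIS \s* </div> \s*
        <div \s+ class="sectionContent"> .*? </table> \s* </div>
    }{}xs;
    return $html;
}

# Shortened sample document (not the real files).
my $sample = <<'HTML';
<p>before</p>
<div class="sectionHeading">REMOVE_THIS</div>
<div class="sectionContent">
<table class="sectionTable"><tr valign="top"><td>x</td></tr></table>
</div>
<p>after</p>
HTML

print remove_section($sample);
```

The /s flag lets the dot match newlines, and /x allows the pattern to be spread over two lines for readability. Content before and after the section is left untouched.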
Thanks for any helping hand :]
Cheers,
Xevven
In reply to Remove section from a HTML file by Xevven