Re: How do we remove specific HTML tag

I don't know of a completely general HTML solution, because I am not an HTML expert. However, something simple might work well enough. Here is some code that prints the first <nav> section and suppresses every <nav> section after it. You could adapt this to your desired "nth" parameter functionality.

use strict;
use warnings;

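my $nav_seen = 0;   # number of complete <nav> sections seen so far

while (<DATA>)
{
   # The flip-flop is true from a line matching <nav through the next
   # line matching </nav. Lines inside a <nav> section are printed only
   # while no complete section has been seen; later sections are dropped.

   if (my $status = /<nav/ ... /<\/nav/)
   {
      print unless $nav_seen;
      # the flip-flop's value ends in "E0" on the last line of the range
      $nav_seen++ if $status =~ /E/;
   }
   else {print}   # outside any <nav> section: always print
}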
=PRINTOUT
<body>   

<nav a=b>   

 <div>   
 </div>   
</nav>   
  
<div>   
</div>   


</body> 

=cut

__DATA__
<body>   

<nav a=b>   

 <div>   
 </div>   
</nav>   
  
<div>   
</div>   

<nav c=d>    
  <li> </li>      
</nav>   

</body>

To understand how this works, I direct you to the node "Flipin good, or a total flop?", which discusses the flip-flop (range) operator used in the if condition above.
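For the "nth" adaptation mentioned above, here is a minimal sketch using the same flip-flop idea. The helper name drop_nth_nav and its argument order are my own invention for illustration, not something from the original question, and it carries the same limitations as the code above (line-oriented input, no nested or commented-out <nav> tags).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the given
# lines with the $n-th <nav>...</nav> section removed, counting from 1.
# Assumes every <nav> is closed; the flip-flop keeps its state between
# calls, so unbalanced input would leak into the next call.
sub drop_nth_nav {
    my ($n, @lines) = @_;
    my $nav_count = 0;
    my @kept;
    for (@lines) {
        if (my $status = /<nav\b/ ... /<\/nav/) {
            # the flip-flop returns 1 on the first line of each range
            $nav_count++ if $status == 1;
            push @kept, $_ unless $nav_count == $n;
        }
        else {
            push @kept, $_;   # outside any <nav> section: always keep
        }
    }
    return @kept;
}
```

With the __DATA__ section from the post, something like "print drop_nth_nav(2, <DATA>);" should drop only the second <nav> section.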


Re^2: How do we remove specific HTML tag by haukex (Archbishop) on Nov 07, 2021 at 05:36 UTC
"What'd be reliable perl lib / module ..."
"... it could be that something simple would work ok?"
No. See: Why a regex really isn't good enough for HTML and XML, even for "simple" tasks.
Re^3: How do we remove specific HTML tag by Marshall (Canon) on Nov 07, 2021 at 07:16 UTC
We don't really have any idea how general-purpose the OP's function needs to be. The OP's test input is very simple and doesn't demo anything complex. It would be appropriate for the OP to post an extended test case. I like your link+ and the discussion therein. I certainly don't propose my simple code to be anything other than perhaps a "hack" to deal with one particular webpage.
Re^4: How do we remove specific HTML tag by Fletch (Bishop) on Nov 07, 2021 at 09:38 UTC
That's kind of the point. He might have matching cruft in a CDATA section, or (more likely) inside a comment because the web designer decided to move something around but left the old location in place for reference and never cleaned up afterwards. You've handed him a ticking bomb, possibly starting him off with a bad habit, and sooner rather than later that's going to go boom.[1] You don't know what his actual data is, so the best answer is the most generally correct one: don't try to wing it handling HTML with regexen, use a proper parser.

[1] "No boom today. Boom Tomorrow. There's always a boom tomorrow."

The cake is a lie.
The cake is a lie.
The cake is a lie.
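For comparison, a minimal sketch of what the parser route could look like. This assumes Mojo::DOM (part of the Mojolicious distribution, not core Perl) is installed; nobody in this thread named a specific module, so treat the choice as one option among several. The helper name remove_nth_nav is hypothetical.

```perl
use strict;
use warnings;
use Mojo::DOM;   # assumption: Mojolicious is installed (non-core)

# Hypothetical helper: removes the $n-th <nav> element (counting from 1,
# in document order) and returns the re-serialized markup.
sub remove_nth_nav {
    my ($html, $n) = @_;
    my $dom    = Mojo::DOM->new($html);
    my $navs   = $dom->find('nav');    # only real elements, not text in comments
    my $target = $navs->[$n - 1];
    $target->remove if $target;        # leave the document alone if there is no such element
    return $dom->to_string;
}
```

Because the parser only reports real elements, a <nav> that was left inside an HTML comment is not counted. Note that the markup is re-serialized, so details such as attribute quoting may differ from the input.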
Re^5: How do we remove specific HTML tag by Bod (Parson) on Nov 07, 2021 at 12:58 UTC
Re^6: How do we remove specific HTML tag by marto (Cardinal) on Nov 07, 2021 at 13:55 UTC

