abdan has asked for the wisdom of the Perl Monks concerning the following question:

What would be a reliable Perl library or module for removing a particular HTML element, identified by its position (nth occurrence) counted from the start of the file? For example:
<body> <nav a=b> <div> </div> </nav> <div> </div> <nav c=d> <li> </li> </nav> </body>
Given the input 1,nav the function should remove only the first <nav> ... </nav> element in its entirety; given 2,nav it should remove only the second <nav> ... </nav> element, and so on.

Re: How do we remove specific HTML tag
by haukex (Archbishop) on Nov 07, 2021 at 05:35 UTC
    a certain html element located by its nth order from the top/start of file

    The CSS :nth-of-type() selector might be what you are looking for, which is supported by Mojo::DOM::CSS; try changing the 1 to a 2 in the following to see the effect:

    use Mojo::DOM;
    my $dom = Mojo::DOM->new($html);
    $dom->at('nav:nth-of-type(1)')->remove;
    print $dom->to_string;
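    One caveat: :nth-of-type() counts elements among siblings that share a parent. That fits the sample in the question, where both <nav> elements are children of <body>, but it need not match document order when the elements sit under different parents. If counting strictly from the top of the file is what is wanted, a minimal sketch along these lines might do. The helper name remove_nth_mojo and its argument order are made up for illustration and are not part of Mojo::DOM:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper (not part of Mojo::DOM): remove the $n-th element
# matching $tag, counting from 1 in document order, and return the markup.
sub remove_nth_mojo {
    my ($html, $n, $tag) = @_;
    die "n must be a positive integer\n" unless $n =~ /\A[1-9][0-9]*\z/;

    my $dom  = Mojo::DOM->new($html);
    # find() returns all matches as a collection that can be indexed
    my $node = $dom->find($tag)->[$n - 1];
    $node->remove if $node;    # do nothing when there are fewer than $n matches
    return $dom->to_string;
}

my $sample = '<body> <nav a=b> <div> </div> </nav> <div> </div> '
           . '<nav c=d> <li> </li> </nav> </body>';
print remove_nth_mojo($sample, 1, 'nav'), "\n";
```

    Unlike calling ->remove directly on the result of at(), this version returns the document unchanged when there is no nth match instead of dying on an undefined value.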
Re: How do we remove specific HTML tag
by choroba (Cardinal) on Nov 07, 2021 at 21:40 UTC
    XML::LibXML can open HTML.
    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $html = 'XML::LibXML'->load_html(location => 'file.html', recover => 1);
    my $nav2 = $html->find('(//nav)[2]')->[0];
    $nav2->parentNode->removeChild($nav2);
    print $html->toString;  # c="d" is gone.

    Or, more succinctly in XML::XSH2:

    open :r :F html file.html ;
    rm (//nav)[2] ;
    ls / ;
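    To get the "n, tag" interface the question asks for, the XML::LibXML code above could be wrapped roughly as follows. This is only a sketch: the helper name remove_nth_libxml is invented here, the input is parsed from a string rather than from file.html, and the tag name is assumed to be given in lower case, since the HTML parser lower-cases element names. With recover => 1 the parser may still print warnings about tags it does not know.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): remove the $n-th <$tag>
# element, counting from 1 in document order, and return the document.
sub remove_nth_libxml {
    my ($html, $n, $tag) = @_;
    # Validate both values because they are interpolated into the XPath
    die "bad tag name\n" unless $tag =~ /\A[a-z][a-z0-9]*\z/;
    die "n must be a positive integer\n" unless $n =~ /\A[1-9][0-9]*\z/;

    my $doc = XML::LibXML->load_html(string => $html, recover => 1);
    # (//tag)[n] selects the n-th match in the whole document
    my ($node) = $doc->findnodes("(//$tag)[$n]");
    $node->parentNode->removeChild($node) if $node;
    return $doc->toString;
}

my $sample = '<body> <nav a=b> <div> </div> </nav> <div> </div> '
           . '<nav c=d> <li> </li> </nav> </body>';
print remove_nth_libxml($sample, 2, 'nav');
```

    The parentheses in (//nav)[2] matter: //nav[2] would instead select every <nav> that is the second <nav> child of its own parent.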

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: How do we remove specific HTML tag
by Marshall (Canon) on Nov 07, 2021 at 03:16 UTC
    I can't offer a completely general HTML solution because I am not an HTML expert. However, something simple might work well enough. Here is some code that stops printing <nav> sections after it has seen the first one. You could adapt it to take your desired nth parameter.
    use strict;
    use warnings;

    my $nav_seen = 0;
    while (<DATA>)
    {
        # if inside of <nav> section, print it
        # unless we have seen a <nav> section before
        if (my $status = /<nav/ ... /<\/nav/)
        {
            print unless $nav_seen;
            $nav_seen++ if $status =~ /E/;
        }
        else {print}
    }

    =PRINTOUT
    <body>
    <nav a=b>
    <div>
    </div>
    </nav>
    <div>
    </div>
    </body>
    =cut

    __DATA__
    <body>
    <nav a=b>
    <div>
    </div>
    </nav>
    <div>
    </div>
    <nav c=d>
    <li>
    </li>
    </nav>
    </body>
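    The code relies on Perl's range ("flip-flop") operator in scalar context, whose return value ends in "E0" on the last line of a range; that is what the test against /E/ detects. To understand how this works, I direct you to "Flipin good, or a total flop?".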
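    As a rough illustration of the nth-parameter adaptation mentioned above, here is a sketch that drops only the nth block. It swaps the flip-flop operator for an explicit $inside flag, which is easier to parameterize inside a sub, and the helper name remove_nth_block is made up. It carries the same assumptions as the line-based code: the opening and closing tags sit on their own lines and blocks of the same tag are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: return @lines without the $n-th <$tag>...</$tag>
# block (counting from 1). Line-based, so it is not a real HTML parser.
sub remove_nth_block {
    my ($n, $tag, @lines) = @_;
    my $seen   = 0;    # number of opening tags seen so far
    my $inside = 0;    # true while between <tag ...> and </tag>
    my @kept;
    for my $line (@lines) {
        if (!$inside && $line =~ /<\Q$tag\E\b/) {
            $inside = 1;
            $seen++;
        }
        # keep every line except those belonging to the n-th block
        push @kept, $line unless $inside && $seen == $n;
        $inside = 0 if $inside && $line =~ m{</\Q$tag\E\s*>};
    }
    return @kept;
}

my @lines = map { "$_\n" } (
    '<body>', '<nav a=b>', '<div>', '</div>', '</nav>', '<div>', '</div>',
    '<nav c=d>', '<li>', '</li>', '</nav>', '</body>',
);
print remove_nth_block(2, 'nav', @lines);
```

    If a line holds the opening tag together with other markup, the whole line is dropped, so this remains a hack for input with a known layout.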
        We don't really have any idea of how general-purpose the OP's function needs to be.
        The OP's test input is very simple and doesn't demo anything complex.
        It would be appropriate for the OP to post an extended test case.
        I like your link+ and the discussion therein.
        I certainly don't propose my simple code to be anything other than perhaps a "hack" to deal with one particular webpage.