Replace empty alt tag on <img> tag

by vlearner (Initiate)
on Jan 20, 2022 at 07:52 UTC ( #11140628=perlquestion )

vlearner has asked for the wisdom of the Perl Monks concerning the following question:

Sorry for completely updating my question again. The code below removes all the <alt> tags in my .dita files (which I have not attached here). I need the script to offer a choice of whether or not to remove the <alt> tag. Kindly guide me on how I should update the code below to add this functionality.

sub RemoveAltTag($) {
    my $doc = shift;
    ##############################
    my $cnt = 0;
    my $nodes = $doc->getElementsByTagName("image");
    for(my $i = 0;$i < $nodes->getLength(); $i++) {
        my $kids = $nodes->item($i)->getChildNodes();
        for(my $k =0; $k < $kids->getLength(); $k++) {
            if($kids->item($k)->toString() =~ /<alt>/i) {
                $nodes->item($i)->removeChild($kids->item($k));
                print "\n Removed <alt> tag" if $VERBOSE;
                $cnt++
            }
        }
    }
    return $cnt;

Replies are listed 'Best First'.
Re: Replace empty alt tag on <img> tag
by haukex (Bishop) on Jan 20, 2022 at 09:21 UTC
    <image href = "images/cron1.png"><<alt>></alt></image>

    More than a single example would be better. Note that <image> is "... an ancient and poorly supported precursor to the <img> element. It should not be used." whereas <img> is an empty element, meaning it can't have content. <img src="images/cron1.png" alt="" /> would be a valid HTML example.

    But assuming your example is accurate, i.e. it is not actually HTML (and ignoring the likely erroneous <<alt>>), you can use Mojo::DOM in XML mode to parse your data. I've made a guess that by "empty" you mean that it does not contain other tags or non-whitespace text, but please clarify this as well.

    use warnings;
    use strict;
    use Mojo::DOM;

    my $html = <<'END_HTML';
    <image href = "images/cron1.png"></image>
    <image href = "images/cron2.png"> </image>
    <image href = "images/cron3.png">abc</image>
    <image href = "images/cron4.png"><alt></alt></image>
    <image href = "images/cron5.png"><alt><br/></alt></image>
    <image href = "images/cron6.png"><i><alt> </alt></i></image>
    <image href = "images/cron7.png"><alt>xyz</alt></image>
    <image href = "images/cron8.png"><alt><b>xyz</b>ijk</alt></image>
    <image href = "images/cron9.png">abc <alt><!-- foo --></alt> def</image>
    END_HTML

    my $dom = Mojo::DOM->new->xml(1)->parse($html);
    $dom->find('image alt')->grep(sub {  # find all <alt> tags inside <image>
        not $_->child_nodes->grep(sub {  # check whether they have content
            $_->type eq 'text' || $_->type eq 'cdata'
                ? $_->content=~/\S/ : $_->type ne 'comment'
        })->size;
    })->map('remove');
    print $dom;

    __END__
    <image href="images/cron1.png" />
    <image href="images/cron2.png"> </image>
    <image href="images/cron3.png">abc</image>
    <image href="images/cron4.png" />
    <image href="images/cron5.png"><alt><br /></alt></image>
    <image href="images/cron6.png"><i /></image>
    <image href="images/cron7.png"><alt>xyz</alt></image>
    <image href="images/cron8.png"><alt><b>xyz</b>ijk</alt></image>
    <image href="images/cron9.png">abc def</image>

    Minor edits; changed selector from 'image > alt' to 'image alt' to find all descendants. Also: Comments are now not considered as content.

Re: Replace empty alt tag on <img> tag
by choroba (Archbishop) on Jan 20, 2022 at 08:56 UTC
    You can use XML::LibXML:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $input = '<image href = "images/cron1.png"><alt></alt></image>';
    my $dom = 'XML::LibXML'->load_xml(string => $input);
    for my $alt ($dom->findnodes('//image/alt[not(text())]')) {
        $alt->parentNode->removeChild($alt);
    }
    print $dom;

    or the less verbose wrapper XML::XSH2:

    open file.xml ; rm //image/alt[not(text())] ; save :b ;
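
Note that `not(text())` only checks for text node children, so an `<alt>` holding just whitespace is kept while one holding only a child element is removed. If "empty" is meant the way the Mojo::DOM reply guesses (no child elements and no non-whitespace text), the XPath predicate could be tightened roughly as follows. This is a sketch under that assumption; the sample input is made up for illustration and is not from the original .dita files.

```perl
use warnings;
use strict;
use XML::LibXML;

# Hypothetical sample input, invented for this illustration.
my $input = <<'END_XML';
<root>
<image href="a.png"><alt></alt></image>
<image href="b.png"><alt> </alt></image>
<image href="c.png"><alt><b/></alt></image>
<image href="d.png"><alt>text</alt></image>
</root>
END_XML

my $dom = XML::LibXML->load_xml(string => $input);

# not(*)                 : the <alt> has no child elements
# not(normalize-space()) : its text is empty or whitespace only
my $xpath = '//image/alt[not(*) and not(normalize-space())]';
for my $alt ($dom->findnodes($xpath)) {
    $alt->parentNode->removeChild($alt);
}
print $dom->toString;
```

With this predicate the `<alt>` elements in a.png and b.png are removed, while those in c.png and d.png are kept.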

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Can I do the same with a regular expression?

        Generally, no.
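
To illustrate why, here is a small sketch showing a naive pattern for an "empty `<alt>`" missing several equivalent spellings that an XML parser handles without special cases. The sample strings and the pattern are invented for this illustration.

```perl
use warnings;
use strict;

# Hypothetical inputs: every one contains an <alt> element with no content.
my @samples = (
    '<image href="a.png"><alt></alt></image>',
    '<image href="b.png"><alt/></image>',
    '<image href="c.png"><alt ></alt></image>',
    '<image href="d.png"><alt><!-- note --></alt></image>',
);

# A naive pattern for "empty <alt>".
my $naive = qr{<alt>\s*</alt>};

# Collect the samples the pattern fails to recognize.
my @missed = grep { $_ !~ $naive } @samples;
printf "naive regex matched %d of %d samples\n",
    @samples - @missed, scalar @samples;
print "missed: $_\n" for @missed;
```

Only the first sample matches. Each further case (self-closing tags, whitespace inside the tag, comments, attributes, CDATA) would need its own addition to the pattern.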
Re: Replace empty alt tag on <img> tag (updated)
by haukex (Bishop) on Jan 21, 2022 at 06:22 UTC

    Please see "It is uncool to update a node in a way that renders replies confusing or meaningless" and mark your updates as such. Your update is the part starting with:

    Here is my tried implementation. what changes do I need to make to get the desired output?

    Your example is not complete (see SSCCE): it does not compile, it does not show any input data, and, most importantly, it does not show which XML/HTML module you are using. My best guess at the moment, based on the method names, is XML::DOM. If that is the case, then the minimum change needed to your code to get it to work* is to change if($kids->item($k)->toString() =~ /<alt>/i) to if ( $kids->item($k)->getNodeType==ELEMENT_NODE && $kids->item($k)->getTagName eq 'alt' ).

    * Update: "work" in this case meaning, to actually have an effect on the single example you provided. This doesn't address the question of only removing empty tags, but your approach of iterating over the list of children would work there as well, inspecting each one to see whether you consider it "content" or not, similar to what I do in my Mojo example.

    Update 2: You edited the root node again. As I said above, don't do that, post a new question.
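
For readers following along, the XML::DOM change described above, combined with the "inspect each child" idea, might look roughly like the sketch below. It assumes the module really is XML::DOM, which the thread never confirms. The names `is_content` and `remove_alt_tags`, the `$only_empty` flag, and the sample input are all hypothetical.

```perl
use warnings;
use strict;
use XML::DOM;

# True if the node counts as "content": any element, or text/CDATA
# containing non-whitespace characters. Comments do not count.
sub is_content {
    my $node = shift;
    my $type = $node->getNodeType;
    if ($type == TEXT_NODE || $type == CDATA_SECTION_NODE) {
        return $node->getData =~ /\S/ ? 1 : 0;
    }
    return 0 if $type == COMMENT_NODE;
    return 1;
}

# $only_empty is a hypothetical flag: true removes only empty <alt>
# elements, false removes every <alt> that is a child of an <image>.
sub remove_alt_tags {
    my ($doc, $only_empty) = @_;
    my $cnt = 0;
    for my $image ($doc->getElementsByTagName('image')) {
        # In list context getChildNodes returns a plain Perl list,
        # so removing children inside the loop does not skip any.
        for my $kid ($image->getChildNodes) {
            next unless $kid->getNodeType == ELEMENT_NODE
                     && $kid->getTagName eq 'alt';
            next if $only_empty
                 && grep { is_content($_) } $kid->getChildNodes;
            $image->removeChild($kid);
            $cnt++;
        }
    }
    return $cnt;
}

# Hypothetical sample input.
my $doc = XML::DOM::Parser->new->parse(
    '<root><image href="a.png"><alt></alt></image>'
  . '<image href="b.png"><alt>text</alt></image></root>');
print remove_alt_tags($doc, 1), " <alt> element(s) removed\n";
print $doc->toString, "\n";
```

Passing a false second argument removes every `<alt>` under an `<image>`, which gives the "remove or not" switch asked for in the updated question.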
