Re: Replace empty alt tag on <img> tag
by haukex (Archbishop) on Jan 20, 2022 at 09:21 UTC
<image href = "images/cron1.png"><<alt>></alt></image>
More than a single example would be better. Note that <image> is "... an ancient and poorly supported precursor to the <img> element. It should not be used.", whereas <img> is an empty element, meaning it can't have content. <img src="images/cron1.png" alt="" /> would be a valid HTML example.
But assuming your example is accurate, i.e. it is not actually HTML (and ignoring the likely erroneous <<alt>>), you can use Mojo::DOM in XML mode to parse your data. I've made a guess that by "empty" you mean that it does not contain other tags or non-whitespace text, but please clarify this as well.
use warnings;
use strict;
use Mojo::DOM;
my $html = <<'END_HTML';
<image href = "images/cron1.png"></image>
<image href = "images/cron2.png"> </image>
<image href = "images/cron3.png">abc</image>
<image href = "images/cron4.png"><alt></alt></image>
<image href = "images/cron5.png"><alt><br/></alt></image>
<image href = "images/cron6.png"><i><alt> </alt></i></image>
<image href = "images/cron7.png"><alt>xyz</alt></image>
<image href = "images/cron8.png"><alt><b>xyz</b>ijk</alt></image>
<image href = "images/cron9.png">abc <alt><!-- foo --></alt> def</image>
END_HTML
my $dom = Mojo::DOM->new->xml(1)->parse($html);
$dom->find('image alt')->grep(sub { # find all <alt> tags inside <image>
    not $_->child_nodes->grep(sub { # check whether they have content
        $_->type eq 'text' || $_->type eq 'cdata'
            ? $_->content=~/\S/ : $_->type ne 'comment'
    })->size;
})->map('remove');
print $dom;
__END__
<image href="images/cron1.png" />
<image href="images/cron2.png"> </image>
<image href="images/cron3.png">abc</image>
<image href="images/cron4.png" />
<image href="images/cron5.png"><alt><br /></alt></image>
<image href="images/cron6.png"><i /></image>
<image href="images/cron7.png"><alt>xyz</alt></image>
<image href="images/cron8.png"><alt><b>xyz</b>ijk</alt></image>
<image href="images/cron9.png">abc def</image>
Minor edits; changed selector from 'image > alt' to 'image alt' to find all descendants. Also: comments are now not considered content.
Re: Replace empty alt tag on <img> tag
by choroba (Cardinal) on Jan 20, 2022 at 08:56 UTC
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $input = '<image href = "images/cron1.png"><alt></alt></image>';
my $dom = 'XML::LibXML'->load_xml(string => $input);
for my $alt ($dom->findnodes('//image/alt[not(text())]')) {
    $alt->parentNode->removeChild($alt);
}
print $dom;
or the less verbose wrapper XML::XSH2:
open file.xml ;
rm //image/alt[not(text())] ;
save :b ;
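Note that not(text()) only checks for the absence of text nodes, so an <alt> containing just whitespace is kept, and one containing only another element is removed. If "empty" is meant in the sense used in the Mojo::DOM reply (no child elements and no non-whitespace text, with comments ignored), a minimal sketch with a stricter XPath predicate could look like the following. The sample input and the <root> wrapper are assumptions added so the string parses as a single XML document; unlike the Mojo::DOM version, processing instructions are not treated as content here.

```perl
use warnings;
use strict;
use XML::LibXML;

# Sample input (assumed); <root> is only there to give the document one root element
my $xml = <<'END_XML';
<root>
<image href="images/cron1.png"><alt></alt></image>
<image href="images/cron2.png"><alt> </alt></image>
<image href="images/cron3.png"><alt><br/></alt></image>
<image href="images/cron4.png"><alt>xyz</alt></image>
<image href="images/cron5.png"><i><alt><!-- foo --></alt></i></image>
</root>
END_XML

my $doc = XML::LibXML->load_xml(string => $xml);

# not(*)                 : no child elements
# not(normalize-space()) : string value is empty or whitespace only
#                          (comments do not contribute to the string value)
for my $alt ($doc->findnodes('//image//alt[not(*) and not(normalize-space())]')) {
    $alt->parentNode->removeChild($alt);
}

my $result = $doc->toString;
print $result;
```

With this input, only the <alt> elements in cron3 and cron4 should remain.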
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Generally, no.
Re: Replace empty alt tag on <img> tag (updated)
by haukex (Archbishop) on Jan 21, 2022 at 06:22 UTC
Please see "It is uncool to update a node in a way that renders replies confusing or meaningless" and mark your updates as such. It's the part starting with:
Here is my tried implementation. what changes do I need to make to get the desired output?
Your example is not complete (SSCCE): it does not compile, it does not show any input data, and, most importantly, it does not show which XML/HTML module you are using. My best guess at the moment, based on the method names, is XML::DOM. If that is the case, then the minimum change needed to your code to get it to work* is to change if($kids->item($k)->toString() =~ /<alt>/i) to if ( $kids->item($k)->getNodeType==ELEMENT_NODE && $kids->item($k)->getTagName eq 'alt' ).
* Update: "work" in this case meaning, to actually have an effect on the single example you provided. This doesn't address the question of only removing empty tags, but your approach of iterating over the list of children would work there as well, inspecting each one to see whether you consider it "content" or not, similar to what I do in my Mojo example.
Update 2: You edited the root node again. As I said above, don't do that, post a new question.
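To illustrate the "iterate over the children and inspect each one" idea, here is a rough sketch, still assuming the module really is XML::DOM. The input string, the <root> wrapper, and the helper name is_content are made up for illustration and are not taken from the original code. Like the approach in the question, it only looks at direct children of <image>.

```perl
use warnings;
use strict;
use XML::DOM;

# Hypothetical helper: decide whether a child node counts as "content".
sub is_content {
    my ($node) = @_;
    my $type = $node->getNodeType;
    return 0 if $type == COMMENT_NODE;          # comments are ignored
    if ( $type == TEXT_NODE || $type == CDATA_SECTION_NODE ) {
        return $node->getData =~ /\S/ ? 1 : 0;  # whitespace-only text is ignored
    }
    return 1;                                   # elements etc. count as content
}

# Sample input (assumed)
my $xml = '<root>'
    . '<image href="images/cron1.png"><alt></alt></image>'
    . '<image href="images/cron2.png"><alt> </alt></image>'
    . '<image href="images/cron3.png"><alt>xyz</alt></image>'
    . '</root>';

my $doc = XML::DOM::Parser->new->parse($xml);

for my $image ( $doc->getElementsByTagName('image') ) {
    # list context returns a plain Perl list, so removing nodes
    # inside the loop does not disturb the iteration
    for my $kid ( $image->getChildNodes ) {
        next unless $kid->getNodeType == ELEMENT_NODE
            && $kid->getTagName eq 'alt';
        next if grep { is_content($_) } $kid->getChildNodes;
        $image->removeChild($kid);
    }
}

my $out = $doc->toString;
print $out, "\n";
$doc->dispose;    # XML::DOM documents contain circular references
```

Only the <alt> containing "xyz" should survive; what exactly counts as "content" is up to you, so adjust is_content as needed.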