Man oh Man! How many times will I have to explain that if you want to do XML processing you have to use XML tools... NOT REGULAR EXPRESSIONS!
And if you want to do XML processing you have to start by learning what XML is.
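Specifically, an XML tool has to deal with comments, CDATA sections, optional whitespace around the = in attributes, single- or double-quoted attribute values, case-sensitive element and attribute names, and characters like > inside attribute values.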
Overall, it's not that this does a terribly bad job at pseudo-parsing XML; it's just that there is no point in writing your own broken code when you could re-use existing, correct code. Especially in this case, where a pretty simple SAX filter (or XML::Twig script of course, see below ;--) would work.
Try perl tgrep -attr show elt tgrep.xml on this (well-formed) XML file:
    <?xml version="1.0"?>
    <doc>
      <!-- <elt show="NOK1"> -->                    <!-- comment -->
      <elt>text</elt>                               <!-- should not be found -->
      <elt show="ok1"/>                             <!-- regular -->
      <elt><![CDATA[<elt show="NOK2">text]]></elt>  <!-- CDATA section -->
      <elt show = "ok2" />                          <!-- spaces around = -->
      <elt  show = "ok3" />                         <!-- 2 spaces before att -->
      <elt show='ok4' />                            <!-- use ' instead of " -->
      <elt SHOW='NOK3' />                           <!-- upper case attribute name -->
      <ELT show='NOK4' />                           <!-- upper case element name -->
      <elt att=" show NOK5"/>                       <!-- use attribute name in value -->
      <elt odd=">" att="ok5"/>                      <!-- use > in attribute value -->
    </doc>
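To make it concrete why cases like these hurt, here is a minimal sketch of a naive regexp-based matcher. This is NOT the code from tgrep: naive_grep and its pattern are hypothetical, made up for illustration, and the sample is only a subset of the file above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naive matcher (not tgrep's actual code): it looks for
# <tag ... att="..."> as plain text, knowing nothing about XML.
sub naive_grep {
    my( $xml, $tag, $att)= @_;
    my @found;
    while( $xml=~ m{(<\Q$tag\E\s[^>]*\b\Q$att\E="[^"]*"[^>]*>)}g) {
        push @found, $1;
    }
    return @found;
}

# a few lines taken from the test file
my $xml= <<'XML';
<doc>
  <!-- <elt show="NOK1"> -->
  <elt show="ok1"/>
  <elt show = "ok2" />
  <elt show='ok4' />
</doc>
XML

print "$_\n" for naive_grep( $xml, 'elt', 'show');
```

On this sample the pattern reports the commented-out NOK1 tag and ok1, and misses both ok2 (spaces around =) and ok4 (single quotes). You can keep patching the regexp for each case, but at that point you are writing an XML parser.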
So here is a very simple XML::Twig script that does just the same as tgrep, except that it works on the test file:
    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $tag= shift @ARGV;
    my $att= shift @ARGV;

    # without the sprintf the expression looks real ugly
    # because of the interferences between XPath and Perl
    # syntaxes: "$tag\[\@$att]"
    my $path= sprintf( "%s[@%s]", $tag, $att);

    my $t= XML::Twig->new( start_tag_handlers =>
             { $path => sub { print $_[0]->original_string, "\n"; } });
    $t->parsefile( shift @ARGV);
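For comparison, the SAX filter mentioned earlier could look roughly like the sketch below. It assumes XML::SAX is installed; the AttrGrep package, its tag and att options and the found list are names invented for this example. SAX does not hand you the original text of the tag, so the handler rebuilds an equivalent start tag (attributes sorted by name, values double-quoted) instead of printing the source as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package AttrGrep;    # hypothetical handler name, for illustration only
use base qw(XML::SAX::Base);

# escape the characters that cannot appear raw in a double-quoted value
sub _escape {
    my $v= shift;
    $v=~ s/&/&amp;/g;
    $v=~ s/</&lt;/g;
    $v=~ s/"/&quot;/g;
    return $v;
}

# called by the parser for every real start tag: comments and CDATA
# content never get here, and names are compared case-sensitively
sub start_element {
    my( $self, $el)= @_;
    if( $el->{Name} eq $self->{tag}) {
        my @atts= sort { $a->{Name} cmp $b->{Name} } values %{ $el->{Attributes} };
        if( grep { $_->{Name} eq $self->{att} } @atts) {
            my $tag= "<$el->{Name}"
                   . join( '', map { qq{ $_->{Name}="} . _escape( $_->{Value}) . '"' } @atts)
                   . ">";
            push @{ $self->{found} }, $tag;
            print "$tag\n";
        }
    }
    $self->SUPER::start_element( $el);    # pass the event on, as a filter should
}

package main;
use XML::SAX::ParserFactory;

# usage (hypothetical script name): perl saxgrep elt show tgrep.xml
if( @ARGV == 3) {
    my( $tag, $att, $file)= @ARGV;
    my $handler= AttrGrep->new( tag => $tag, att => $att);
    XML::SAX::ParserFactory->parser( Handler => $handler)->parse_uri( $file);
}
```

It is longer than the XML::Twig version and loses the original formatting of the tag, which is why I would still reach for the twig script here.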
In reply to "Re: tgrep - A grep for XML/HTML tags" by mirod, in thread "tgrep - A grep for HTML tags" by adrianh