Man oh Man! How many times will I have to explain that if you want to do XML processing you have to use XML tools... NOT REGULAR EXPRESSIONS!

And if you want to do XML processing you have to start by learning what XML is.

Specifically:

Overall, it's not that this does a terribly bad job of pseudo-parsing XML; it's just that there is no reason to write your own broken code when you could re-use existing, correct code, especially in this case, where a pretty simple SAX filter (or an XML::Twig script of course, see below ;--) would work.

Try perl tgrep -attr show elt tgrep.xml on this (well-formed) XML file:

    <?xml version="1.0"?>
    <doc>
      <!-- <elt show="NOK1"> -->                    <!-- comment -->
      <elt>text</elt>                               <!-- should not be found -->
      <elt show="ok1"/>                             <!-- regular -->
      <elt><![CDATA[<elt show="NOK2">text]]></elt>  <!-- CDATA section -->
      <elt show = "ok2" />                          <!-- spaces around = -->
      <elt  show = "ok3" />                         <!-- 2 spaces before att -->
      <elt show='ok4' />                            <!-- use ' instead of " -->
      <elt SHOW='NOK3' />                           <!-- upper case attribute name -->
      <ELT show='NOK4' />                           <!-- upper case element name -->
      <elt att=" show NOK5"/>                       <!-- use attribute name in value -->
      <elt odd=">" att="ok5"/>                      <!-- use > in attribute value -->
    </doc>
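To make the failure modes concrete, here is a minimal sketch that runs a naive regular expression over a few lines of the test file. The pattern is a hypothetical one written for illustration, not the one tgrep actually uses, and the helper name naive_match is made up for this example. The expected verdicts are what a real XML parser would report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naive pattern (an assumption, NOT taken from tgrep):
# an <elt ...> start tag carrying show="..." with double quotes
# and no spaces around the '='.
my $naive = qr/<elt\b[^>]*\bshow="[^"]*"[^>]*>/;

# A few lines from the test file, each with the verdict a real XML
# parser would give (1 = a genuine elt element with a show attribute).
my @cases = (
    [ '<!-- <elt show="NOK1"> -->',                   0 ],  # inside a comment
    [ '<elt show="ok1"/>',                            1 ],  # regular
    [ '<elt><![CDATA[<elt show="NOK2">text]]></elt>', 0 ],  # inside CDATA
    [ '<elt show = "ok2" />',                         1 ],  # spaces around =
    [ "<elt show='ok4' />",                           1 ],  # single quotes
);

# Returns 1 if the naive pattern matches the line, 0 otherwise.
sub naive_match {
    my ($line) = @_;
    return $line =~ $naive ? 1 : 0;
}

for my $case (@cases) {
    my ($line, $expected) = @$case;
    my $got = naive_match($line);
    printf "%-5s %s\n", ($got == $expected ? 'ok' : 'WRONG'), $line;
}
```

With this particular pattern only the "regular" line is handled correctly: the comment and the CDATA section produce false positives, while the spaced and single-quoted attributes are missed. You can patch the regexp for each case, but every patch is one more step towards rewriting an XML parser.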

So here is a very simple XML::Twig script that does the same job, except that it works on the test file:

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $tag= shift @ARGV;
    my $att= shift @ARGV;

    # without the sprintf the expression looks real ugly
    # because of the interferences between XPath and Perl
    # syntaxes: "$tag\[\@$att]"
    my $path= sprintf( "%s[@%s]", $tag, $att);

    my $t= XML::Twig->new(
        start_tag_handlers =>
            { $path => sub { print $_[0]->original_string, "\n"; } });
    $t->parsefile( shift @ARGV);

In reply to Re: tgrep - A grep for XML/HTML tags by mirod
in thread tgrep - A grep for HTML tags by adrianh
