I guess I'm interested to know what the general consensus on the use of .* is. Is it something to be avoided at all costs, or is it a powerful, oft-misused tool that can be useful and beneficial in carefully controlled circumstances?
My rule of thumb for .* versus .*? is that the former is for grabbing everything after a certain point (I can't be bothered with $') and the latter is for grabbing data between two points. So I guess it's a 'powerful, oft-misused tool', but mostly because people aren't aware of how quantifier greediness works.
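A minimal sketch of that rule of thumb, run against a hypothetical input string: greedy .* grabs the rest of the line (shown next to the $' postmatch variable it stands in for), and then greedy versus non-greedy between two markers shows where greediness bites.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data with two tagged values on one line.
my $line = 'id: <b>A1</b> and <b>B2</b>';

# Greedy .* : grab everything after a certain point.
my ($rest) = $line =~ /id: (.*)/;
print "rest:   $rest\n";    # rest:   <b>A1</b> and <b>B2</b>

# The same thing via $' (the postmatch variable).
my $post = '';
$post = $' if $line =~ /id: /;
print "post:   $post\n";    # post:   <b>A1</b> and <b>B2</b>

# Greedy .* between two points overshoots to the LAST </b> ...
my ($greedy) = $line =~ /<b>(.*)<\/b>/;
print "greedy: $greedy\n";  # greedy: A1</b> and <b>B2

# ... while non-greedy .*? stops at the FIRST </b>.
my ($lazy) = $line =~ /<b>(.*?)<\/b>/;
print "lazy:   $lazy\n";    # lazy:   A1
```

The greedy "between two points" match is the classic misuse: it still succeeds, so the bug only shows up when a second tag appears on the same line.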
While I'm at it *grin*, does anyone have a "better idea" for pulling the data out of the tags?
Given the nature of XML, it might be a good idea to build the regex up in layers, e.g.:
    use strict;
    use warnings;

    ## *very* simplistic stuff (e.g. doesn't deal with nested tags)
    my $token     = qr{ (?: \b [A-Z]\w+ \b ) }xi;               # a tag or attribute name
    my $attrib    = qr{ (?: $token \s* = \s* "[^"]+" \s* ) }x;  # name="value"
    my $begin_tag = qr{ < ( $token ) \s* ( $attrib* ) > }x;     # captures name and attributes
    my $end_tag   = qr{ </$token> }x;                           # any closing tag

    my $example = q[<ClientID type="String">A1234BX</ClientID>];

    # .*? stops at the first closing tag after the opening one
    my ($tag, $attribs, $data) = $example =~ m{ $begin_tag (.*?) $end_tag }x;

    print "tag - $tag\n";
    print "attribs - $attribs\n";
    print "data - $data\n";

Output:

    tag - ClientID
    attribs - type="String"
    data - A1234BX
That could be collapsed into a single regex, but like most complex things, it's much easier to digest when broken down into smaller components.
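For comparison, here is a minimal sketch of the single-regex form, assuming Perl 5.10 or later for named captures (%+) and the \k<tag> backreference. The input string is hypothetical. Unlike the layered $end_tag, which accepts any closing tag, this version insists the closing tag matches the opening one, and a while //g loop walks every element in the string. Like the original, it does not handle nested tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two simple, non-nested elements on one line.
my $xml = q[<ClientID type="String">A1234BX</ClientID><Name>Fred</Name>];

my $element = qr{
    < (?<tag> [A-Z]\w+ ) \s*                                  # opening tag name
      (?<attribs> (?: [A-Z]\w+ \s* = \s* "[^"]+" \s* )* )     # zero or more name="value"
    >
    (?<data> .*? )                                            # non-greedy element body
    </ \k<tag> >                                              # closing tag with the same name
}xi;

# Each successful //g match resumes where the previous one ended.
my @found;
while ($xml =~ /$element/g) {
    push @found, [ $+{tag}, $+{attribs}, $+{data} ];
    print "tag - $+{tag}\n";
    print "attribs - $+{attribs}\n";
    print "data - $+{data}\n";
}
```

Whether this reads better than the layered version is a matter of taste; the /x comments recover some of the readability that splitting into $token, $attrib and friends gives you.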
HTH

_________
broquaint


In reply to Re: An "ethical" use of dot-star ..? by broquaint
in thread An "ethical" use of dot-star ..? by Tanalis
