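AFAIK, names beginning with "xml" (in any mix of upper and lower case) are reserved by the XML specification, so a tag name should not start with that string. So I doubt your document really qualifies as being an "XML file".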

Anyway, matching the version attribute in a tightly controlled xml-ish file, where the attributes are in a fixed order, can be as simple as

my($version, $int) = /<xml\sversion="((\d+)\.?\d*)"/;
where $version gets the value "24.0" and $int the value "24". (Note that I'm matching against the text in $_, not in $text, because the code is somewhat simpler this way; binding to $text would only draw attention away from the important part of the code: the regex.) This can be reduced if you only need one of the two, thus
my($version) = /<xml\sversion="(\d+\.?\d*)"/;
for the floating point and
my($version) = /<xml\sversion="(\d+)\.?\d*"/;
for the integer representation.
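
For completeness, here is a minimal sketch of the same three patterns bound to a variable with =~ instead of matching against $_. The name $text comes from the question; the sample header line and the names $float_only and $int_only are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line; the real file's header may differ.
my $text = '<xml version="24.0" IP="1.1.2.3">';

# Both forms at once: the outer group is the full number,
# the inner group is the integer part.
my ($version, $int) = $text =~ /<xml\sversion="((\d+)\.?\d*)"/;

# Only the floating point form.
my ($float_only) = $text =~ /<xml\sversion="(\d+\.?\d*)"/;

# Only the integer part.
my ($int_only) = $text =~ /<xml\sversion="(\d+)\.?\d*"/;

print "version=$version int=$int float=$float_only integer=$int_only\n";
```

The list assignment puts the match in list context, so the captures are returned directly and there is no need to touch $1 or $2.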

p.s. If the tag layout isn't that fixed, that is, when attributes can appear in any order, there are somewhat more complex ways to do it with regular expressions too, but I'll come back to that later when I have some more time to test it. Watch this space for updates.

Update: As promised, here's a more complex regular expression which can match several variations on this string, complete with some test cases.

#!/usr/bin/perl -w
foreach (
    '<xml version= "24.0" IP="1.1.2.3" baseVersion="beta_3" lastUpdate="22-Apr-06" >',
    '<xml IP="1.1.2.3" baseVersion="beta_3" version= "24.0" lastUpdate="22-Apr-06" >',
    q<<xml IP='1.1.2.3' baseVersion="beta_3" lastUpdate="22-Apr-06" version= '24.0'>>,
) {
    if(/<xml
         (?> \s+ [a-zA-Z][^\s\/=>'"]* \s* = \s* (?: " [^"]* " | ' [^']* ' ) )*?
         \s+ version \s* = \s* (?:"([^"]*)"|'([^']*)')
       /x) {
        print "Match '$+' in $_\n";
    } else {
        print "No match in $_\n";
    }
}
Result:
Match '24.0' in <xml version= "24.0" IP="1.1.2.3" baseVersion="beta_3" lastUpdate="22-Apr-06" >
Match '24.0' in <xml IP="1.1.2.3" baseVersion="beta_3" version= "24.0" lastUpdate="22-Apr-06" >
Match '24.0' in <xml IP='1.1.2.3' baseVersion="beta_3" lastUpdate="22-Apr-06" version= '24.0'>

I'm trying to match as many attribute="value" items as I can (single quotes are allowed too), each preceded by whitespace, but without matching the "version" attribute yet, using nongreedy matching (PATTERN*?). I'm quite liberal in what I accept in an attribute name: I just exclude some obviously unacceptable characters. When finally matching the version attribute, again I'm accepting either single or double quotes, and I'm using $+ to select the subpattern that actually matched.
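
To see the $+ trick in isolation, here is a small sketch. The helper name quoted_value and the two sample strings are hypothetical; only the alternation itself is taken from the regex above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of a version attribute,
# whichever kind of quote was used around it.
sub quoted_value {
    my ($str) = @_;
    if ($str =~ /version \s* = \s* (?: "([^"]*)" | '([^']*)' )/x) {
        # Only one of $1 and $2 is defined after the match;
        # $+ is the highest-numbered group that took part in it.
        return $+;
    }
    return undef;
}

my $double = quoted_value(q{version="24.0"});
my $single = quoted_value(q{version = '24.0'});
print "double: $double, single: $single\n";
```

Without $+ you would have to write something like defined $1 ? $1 : $2, which gets clumsy as soon as there are more than two alternatives.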

And in a regex of this complexity, use of /x is strongly advised, which results in whitespace (when not preceded by a backslash) being ignored, so I can show the subpatterns in logical groups.
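
As a rough illustration of what /x buys you, the sketch below writes one small pattern twice, once compact and once spread out with comments. The sample line and all variable names are assumptions for the sake of the example.

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = q{<xml version = "24.0">};

# Compact form: it works, but the pieces run together.
my ($compact) = $line =~ /version\s*=\s*"([^"]*)"/;

# Same pattern under /x: unescaped whitespace and comments are ignored.
my ($spaced) = $line =~ /
    version \s* = \s*   # attribute name and equals sign
    " ( [^"]* ) "       # double-quoted value, captured
/x;

# A space you really want to match has to be escaped (or written as \s).
my $escaped_space = ('a b' =~ /a\ b/x) ? 1 : 0;
my $ignored_space = ('ab'  =~ /a b/x)  ? 1 : 0;

print "compact=$compact spaced=$spaced escaped=$escaped_space ignored=$ignored_space\n";
```

One thing to watch for: a comment inside the regex must not contain the regex delimiter, because Perl looks for the closing delimiter before it knows anything about /x.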

Finally, I'm using the cut operator ((?>pattern)), which has two effects: (1) I can group without capturing, just as with (?:pattern), and (2) it prevents useless backtracking into the group, which can always happen when you stack repetition quantifiers. You never know, and it doesn't hurt.
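
The difference between (?:pattern) and (?>pattern) is easiest to see on a toy case. The string and patterns below are made up for illustration and are not part of the version-matching regex.

```perl
use strict;
use warnings;

my $str = 'aaab';

# Plain group: a+ first grabs 'aaa', then gives one 'a' back
# so that the trailing 'ab' can still match.
my $plain = ($str =~ /(?:a+)ab/) ? 'match' : 'no match';

# Atomic group: whatever a+ has grabbed stays grabbed, so there is
# never an 'a' left over for the trailing 'ab', at any start position.
my $atomic = ($str =~ /(?>a+)ab/) ? 'match' : 'no match';

print "plain: $plain, atomic: $atomic\n";
```

In the version regex the atomic group wraps a single attribute, so once one attribute="value" pair has been consumed the engine will not try other ways of splitting up that same pair. The nongreedy *? outside the group can still add more pairs one at a time.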


In reply to Re: Extract version attribute value from xml header line by bart
in thread Extract version attribute value from xml header line by just dave
