I, too, very much like JavaFan's approach in Re: Using a regex to extract a version from a Un*x path. However, since I've already composed this reply, you might as well see it.

To avoid the confusion introduced by the presence of 'V2' in some paths, I rely on the presence of the magical 'V2DepCheck' substring. (JavaFan neatly avoids this issue by parsing right-to-left, but the regex approach I use must parse left-to-right.) This approach defines many more regexes than the others do, but I find that it sometimes pays to be painfully explicit when the problem set is ill-defined and mutable, and maintenance may be an issue.
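
A minimal sketch of the 'V2' confusion, separate from the script below. The names $naive, $guarded and first_match are made up for this illustration. It compares a left-to-right "V followed by digits" match with and without a negative look-ahead for the 'V2DepCheck' tag:

```perl
use strict;
use warnings;

my $pathsep = qr{ [/\\] }xms;

# naive: any path component made of V or v followed by digits.
my $naive   = qr{ (?<= $pathsep [Vv]) \d+ (?= \z | $pathsep) }xms;

# guarded: same, but rejected when the next component starts with V2DepCheck.
my $guarded = qr{ (?<= $pathsep [Vv]) \d+ (?! $pathsep V2DepCheck)
                  (?= \z | $pathsep) }xms;

# hypothetical helper: return the first match of $re in $path, or undef.
sub first_match {
    my ($re, $path) = @_;
    return $path =~ m{ ($re) }xms ? $1 : undef;
}

for my $path ('/home/me/cvs/V2/V2DepCheck.pm',
              '/tool/a/r/cadence/itk/itkvd/v007') {
    for my $pair ([naive => $naive], [guarded => $guarded]) {
        my ($name, $re) = @$pair;
        my $ver = first_match($re, $path);
        printf "%-8s %-34s %s\n",
            $name, $path, defined $ver ? "'$ver'" : 'no match';
    }
}
```

On the first path the naive pattern picks up the '2' of the 'V2' directory, while the guarded pattern finds nothing. Both find '007' on the second path.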

Code:

use warnings;
use strict;

my ($dotted, $digits, $v_num) = do {
    # - defining regex components and regexes in a do-block
    #   avoids propagation of a bunch of extraneous lexicals.
    # - NONE OF THESE REGEXES MAY HAVE CAPTURE GROUPS.
    #   capture groups in any of the 'private' regexes defined in
    #   this do-block will tend to confuse capture group counting
    #   in the regex in which they are ultimately used.
    #   (assumes perl 5.8.  regex enhancements of 5.10+ ease this
    #   restriction considerably.)
    # - except as noted, NONE OF THESE REGEXES MAY HAVE ELEMENTS
    #   THAT ARE CAPTURED, and each should either be, or be used
    #   within, a zero-width look-around assertion.
    my $pathsep               = qr{                [/\\]  }xms;
    my $v_tag                 = qr{           V2DepCheck  }xms;
    my $after_pathsep         = qr{ (?<=        $pathsep) }xms;
    my $before_pathsep        = qr{ (?=         $pathsep) }xms;
    my $before_eos_or_pathsep = qr{ (?=    \z | $pathsep) }xms;
    my $after_v_tag           = qr{ (?<= $v_tag $pathsep) }xms;
    my $no_v_tag              = qr{ (?!  $pathsep $v_tag) }xms;

    # validation assertions for various types of version numbers.
    my $ok_pre_dotted  = qr{ $after_pathsep         }xms;
    my $ok_post_dotted = qr{ $before_eos_or_pathsep }xms;
    my $ok_pre_digits  = qr{ $after_v_tag    }xms;
    my $ok_post_digits = qr{ $before_pathsep }xms;
    my $ok_pre_v_num   = qr{ (?<= $pathsep [Vv])              }xms;
    my $ok_post_v_num  = qr{ $no_v_tag $before_eos_or_pathsep }xms;

    # any of the regexes that follow may have captured elements.
    my $digits = qr{ \d+ }xms;
    my $dotted = qr{ $digits (?: \. $digits)+ }xms;

    # define regexes returned by do-block.
    qr{ $ok_pre_dotted  $dotted  $ok_post_dotted }xms,  # $dotted
    qr{ $ok_pre_digits  $digits  $ok_post_digits }xms,  # $digits
    qr{ $ok_pre_v_num   $digits  $ok_post_v_num  }xms;  # $v_num
    };

while (<DATA>) {
    chomp;
    my $ver    = '?????';
    my $indent = '';
    if (m{ ($dotted | $digits | $v_num) }xms) {
        $ver    = "'$1'";
        $indent = ' ' x $-[1];
        }
    print "str: '$_' \n";
    print "ver: $indent$ver \n";
    }

__DATA__
/tool/a/r/V2/V2DepCheck/1.109.2.1/V2DepCheck.pm
/tool/a/r/p4/r/main/V2/V2DepCheck/169441/V2DepCheck.pm
/tool/a/r/p4/r/branches/bd32b/V2/V2DepCheck/175507/V2DepCheck.pm
/home/me/cvs/V2/V2DepCheck.pm
/tool/a/r/boost/1.36.0
/tool/a/r/cadence/itk/itkvd/v007

Output:

>perl extract_ver_1.pl
str: '/tool/a/r/V2/V2DepCheck/1.109.2.1/V2DepCheck.pm'
ver:                         '1.109.2.1'
str: '/tool/a/r/p4/r/main/V2/V2DepCheck/169441/V2DepCheck.pm'
ver:                                   '169441'
str: '/tool/a/r/p4/r/branches/bd32b/V2/V2DepCheck/175507/V2DepCheck.pm'
ver:                                             '175507'
str: '/home/me/cvs/V2/V2DepCheck.pm'
ver: ?????
str: '/tool/a/r/boost/1.36.0'
ver:                 '1.36.0'
str: '/tool/a/r/cadence/itk/itkvd/v007'
ver:                              '007'
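
The comments in the script note that the regex enhancements of 5.10+ ease the no-capture-groups restriction. Here is a minimal sketch of one way that might look, assuming perl 5.10 or later and using named captures with the %+ hash. The sub name extract_version and the capture names are made up for this illustration, and the three patterns are flattened versions of the ones above, not a drop-in replacement:

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need perl 5.10 or later

my $pathsep = qr{ [/\\] }xms;

# each component regex carries its own named capture, so group
# counting in the combined regex no longer matters.
my $dotted = qr{ (?<= $pathsep)
                 (?<dotted> \d+ (?: \. \d+)+ )
                 (?= \z | $pathsep) }xms;
my $digits = qr{ (?<= V2DepCheck $pathsep)
                 (?<digits> \d+ )
                 (?= $pathsep) }xms;
my $v_num  = qr{ (?<= $pathsep [Vv])
                 (?<v_num> \d+ )
                 (?! $pathsep V2DepCheck) (?= \z | $pathsep) }xms;

# hypothetical helper: returns (kind, version), or an empty list.
sub extract_version {
    my ($path) = @_;
    return unless $path =~ m{ $dotted | $digits | $v_num }xms;
    # %+ lists only the named groups that took part in the match,
    # and exactly one alternative can match here.
    my ($kind) = keys %+;
    return ($kind, $+{$kind});
}

while (my $path = <DATA>) {
    chomp $path;
    my ($kind, $ver) = extract_version($path);
    say defined $ver ? "$path: $kind '$ver'" : "$path: ?????";
}

__DATA__
/tool/a/r/V2/V2DepCheck/1.109.2.1/V2DepCheck.pm
/home/me/cvs/V2/V2DepCheck.pm
/tool/a/r/cadence/itk/itkvd/v007
```

Because each version type is read back by name, the caller also learns which of the three patterns matched, which the numbered $1 in the script above does not reveal.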

In reply to Re^3: Using a regex to extract a version from a Un*x path by AnomalousMonk
in thread Using a regex to extract a version from a Un*x path by gcmandrake
