ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

Before trying to explain my problem, I would like to say that I'm not sure it's possible to achieve. I'm a bit confused about the whole idea and wanted to hear your thoughts.
Consider the following paths:
/ax/disks/xyz.prdenv.1/tool_utils/asda/15.2.0.5/asda.csh
/tmp_log/site/disks/sad/tool_utr/zensxi/12.01.001a/log/script.pl
I'm trying to build a script which iterates over a disk that is given as input and tries to guess which regex I need to capture the group and the version.
For the first path, I can see that the group is "asda" and the version is "15.2.0.5", and for the second path I can see that the group is "zensxi" and the version is "12.01.001a".
So the regex for the first path, could be: /ax/disks/xyz.prdenv.1/tool_utils/(.*?)/(.*?)/.*
The regex for the second path could be: /tmp_log/site/disks/sad/tool_utr/(.*?)/(.*?)/.*
Is this possible to achieve? The main problem here is that the version does not have to be all digits.
The reason I need those regexes is to create an example file which the user can give to our tool, and the tool will use those regexes to filter the wanted paths.
The script I'm trying to build will suggest all possible combinations of regexes, so the user will just have to look through them and choose the best one.

Replies are listed 'Best First'.
Re: Regex creation
by Corion (Patriarch) on Aug 29, 2019 at 14:52 UTC

    What makes a "version"? Is it simply something that sits between two slashes and starts with a number?

    Update: Also, is the list of your groups limited? Then use maybe something like /(asda|zensxi)/. Otherwise, how do you recognize the group? Is it the thing that comes before the version?
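
    If the list of groups really is limited, the alternation idea can be sketched roughly as follows. This is only an illustration: the @known_groups list and the group_and_version helper are hypothetical names, and it assumes the version is always the path component directly after the group.

```perl
use strict;
use warnings;

# Hypothetical list of known groups; in practice it would come from your own data.
my @known_groups = qw(asda zensxi);

# quotemeta protects against group names that contain regex metacharacters.
my $alternation = join '|', map { quotemeta } @known_groups;

# A known group as a whole path component, then the next component as the version.
my $group_re = qr{/($alternation)/([^/]+)/};

# Returns (group, version), or an empty list when no known group is found.
sub group_and_version {
    my ($path) = @_;
    return $path =~ $group_re ? ( $1, $2 ) : ();
}

for my $path (
    '/ax/disks/xyz.prdenv.1/tool_utils/asda/15.2.0.5/asda.csh',
    '/tmp_log/site/disks/sad/tool_utr/zensxi/12.01.001a/log/script.pl',
) {
    my ( $group, $version ) = group_and_version($path) or next;
    print "group: $group, version: $version\n";
}
```

    The obvious limitation is that a group missing from the list is silently skipped, so this only helps when the set of groups is known in advance.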

Re: Regex creation
by Fletch (Bishop) on Aug 29, 2019 at 14:52 UTC

    Handwaving vague idea, but . . . . First thing, come up with a regex for what you consider a "version number"; e.g. qr{^( \d+ (?:\.\d+)? \w* )$}x or whatever.

    • Since you're dealing with paths, go ahead and split on the path separator
    • Walk the split path components looking for the index that matches your version regex
    • Create your candidate by rejoining everything up to just before that index back with the path sep, then your qr{([^/]+) / ([^/]+) / .*}x

    Update: Fish!

    #!/usr/bin/env perl

    use 5.018;

    my $version_number_re = qr{^( \d+ (?:\. \d+)* (?:\w+)? )$}x;
    my $target_component_re = qr{([^/]+?)}x;
    my $rest_re = qr{.*};

    while( <DATA> ) {
      chomp;
      my @path_components = split( qr{/}, $_ );
      my $version_idx;
     FIND_VERSION:
      for my $idx ( 0 .. $#path_components ) {
        if( $path_components[ $idx ] =~ $version_number_re ) {
          $version_idx = $idx;
          last FIND_VERSION;
        }
      }

      if( $version_idx ) {
        my $candidate_re = join( q{/},
          ( $version_idx-2 >= 0 ? @path_components[0..$version_idx-2] : (q{NO ROOM}) ),
          ($target_component_re) x 2,
          ($version_idx < $#path_components ? $rest_re : ())
        );
        my( $group, $version ) = m{$candidate_re};
        say qq{Original:\n$_\nCandidate regex:\n$candidate_re\n\tgroup: $group\n\tversion: $version\n};
      } else {
        say qq{No candidate; orig:\n$_\n}
      }
    }

    exit 0;

    __END__
    /ax/disks/xyz.prdenv.1/tool_utils/asda/15.2.0.5/asda.csh
    /tmp_log/site/disks/sad/tool_utr/zensxi/12.01.001a/log/script.pl
    /foo/bar/867.5309jenny/blonk.el
    /I/reject/your/reality/and/substitute/my/own

    Original:
    /ax/disks/xyz.prdenv.1/tool_utils/asda/15.2.0.5/asda.csh
    Candidate regex:
    /ax/disks/xyz.prdenv.1/tool_utils/(?^ux:([^/]+?))/(?^ux:([^/]+?))/(?^u:.*)
        group: asda
        version: 15.2.0.5

    Original:
    /tmp_log/site/disks/sad/tool_utr/zensxi/12.01.001a/log/script.pl
    Candidate regex:
    /tmp_log/site/disks/sad/tool_utr/(?^ux:([^/]+?))/(?^ux:([^/]+?))/(?^u:.*)
        group: zensxi
        version: 12.01.001a

    Original:
    /foo/bar/867.5309jenny/blonk.el
    Candidate regex:
    /foo/(?^ux:([^/]+?))/(?^ux:([^/]+?))/(?^u:.*)
        group: bar
        version: 867.5309jenny

    No candidate; orig:
    /I/reject/your/reality/and/substitute/my/own

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Regex creation
by jcb (Parson) on Aug 29, 2019 at 23:16 UTC

    Your example paths both put "groups" under a directory that starts with "tool_". If this pattern is actually present in your data, and we can assume POSIX, a single regex can extract all of them: m[/tool_[^/]*/([^/]+)/([^/]+)/] will put the "group" in $1 and the "version" in $2.
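
    As a quick check, here is a small sketch that applies that regex to the two paths from the question, plus one path without a "tool_" directory (borrowed from Fletch's sample data) to show what a non-match looks like. The extract_group_version helper is a name made up for this example.

```perl
use strict;
use warnings;

# The regex from above: a directory starting with "tool_", then group, then version.
my $tool_re = qr{/tool_[^/]*/([^/]+)/([^/]+)/};

# Returns (group, version), or an empty list when the path has no "tool_" directory.
sub extract_group_version {
    my ($path) = @_;
    return $path =~ $tool_re ? ( $1, $2 ) : ();
}

for my $path (
    '/ax/disks/xyz.prdenv.1/tool_utils/asda/15.2.0.5/asda.csh',
    '/tmp_log/site/disks/sad/tool_utr/zensxi/12.01.001a/log/script.pl',
    '/foo/bar/867.5309jenny/blonk.el',
) {
    if ( my ( $group, $version ) = extract_group_version($path) ) {
        print "$path\n\tgroup: $group\n\tversion: $version\n";
    }
    else {
        print "$path\n\tno match\n";
    }
}
```

    Note that the regex requires a slash after the version, so a path that ends at the version directory itself would not match.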