thirtySeven has asked for the wisdom of the Perl Monks concerning the following question:

I have a bunch of file paths that look like $path. I want to get the yyyy_mm_dd_hh_mm out of the end of the path. I currently have a working solution ... but I want to know if it is possible to combine the entire thing into one expression if only for learning purposes. I am wondering if it can't work because of the way I am mixing in a list context.
# This works
my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';
my $end_of_path = $1 if $path =~ m/([^\/]+$)/;
my @nums = $end_of_path =~ m/([0-9]{2,4})/g;
I have tried to quite literally combine them but it did not work:
# This does not work
my @nums = m/([0-9]+)/g =~ $1 if $path =~ m/([^\/]+$)/;
I would appreciate insight.

Replies are listed 'Best First'.
Re: How can I combine these two regular expressions?
by eyepopslikeamosquito (Archbishop) on Mar 05, 2021 at 03:49 UTC

    Your requirements are not crystal clear to me. I see that your original program (turned into a Short, Self-Contained, Correct Example):

    use strict;
    use warnings;
    use Data::Dumper;

    my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';
    my $end_of_path;
    $end_of_path = $1 if $path =~ m/([^\/]+$)/;
    my @nums = $end_of_path =~ m/([0-9]{2,4})/g;
    print Dumper( \@nums );
    prints:
    $VAR1 = [
              '2021',
              '03',
              '04',
              '21',
              '20'
            ];

    ... as does my simpler version:

    use strict;
    use warnings;
    use Data::Dumper;

    my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';
    my @nums = $path =~ m{/day=(\d\d\d\d)_(\d\d)_(\d\d),time=(\d\d):(\d\d)$};
    print Dumper( \@nums );
    Does my version satisfy your requirements?

      Thanks, this is great because it's so simple.
Re: How can I combine these two regular expressions?
by tybalt89 (Monsignor) on Mar 05, 2021 at 05:49 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';
    my @nums = $path =~ m/([0-9]{2,4})(?!.*\/)/g;
    print "@nums\n";

    Outputs:

    2021 03 04 21 20
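
A note on why this works, as a minimal sketch: the negative lookahead `(?!.*\/)` rejects any run of digits that still has a slash somewhere after it, so only digits in the last path component survive. The second path is the near-miss from kcott's reply in this thread (`.snapshots_23`), included only for comparison. The helper name `last_component_nums` is made up for illustration and is not part of tybalt89's post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 2-4 digit runs that have no '/' after them.
sub last_component_nums {
    my ($path) = @_;
    # In list context, m//g returns every captured group from every match.
    return $path =~ m/([0-9]{2,4})(?!.*\/)/g;
}

my @first  = last_component_nums('/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20');
my @second = last_component_nums('/.snapshots_23/yabsm/root/hourly/day=2021_03_04,time=21:20');

print "@first\n";   # 2021 03 04 21 20
print "@second\n";  # 2021 03 04 21 20
```

Digits in earlier components (the `123` or `23` after `.snapshots`) are skipped because a `/` follows them later in the string.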
Re: How can I combine these two regular expressions?
by kcott (Archbishop) on Mar 05, 2021 at 06:09 UTC

    G'day thirtySeven,

    Assuming that's a truly representative $path, you can use this:

    $path =~ m/((?<=[=_:])\d+)/g

    If you're unfamiliar with (?<=pattern), see "perlre: zero-width positive lookbehind assertion".

    Here's an example with your assumed, representative $path:

    $ perl -E '
        my $path = q{/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20};
        say for $path =~ m/((?<=[=_:])\d+)/g;
    '
    2021
    03
    04
    21
    20

    Here's an example where that fails, using a $path that's almost the same as the assumed, representative $path:

    $ perl -E '
        my $path = q{/.snapshots_23/yabsm/root/hourly/day=2021_03_04,time=21:20};
        say for $path =~ m/((?<=[=_:])\d+)/g;
    '
    23
    2021
    03
    04
    21
    20

    "only for learning purposes"

    Understood. In a real-world application, I'd probably lose the extra text (e.g. day=) and use a simpler format, e.g. YYYY_MM_DD_hh_mm or even just YYYYMMDDhhmm.

    — Ken

Re: How can I combine these two regular expressions?
by Fletch (Bishop) on Mar 05, 2021 at 14:59 UTC

    If you need the filename separately, then given the domain you might use Path::Tiny (or File::Basename) to split the path before applying your regex. That makes it semantically clearer what you're trying to do, and it also isolates you from platform differences in directory separators.

    my $filename = path( $path )->basename;
    my( $date, $time ) = $filename =~ m{## as others have shown}x;

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.
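
For anyone who wants to try Fletch's suggestion without installing anything, here is a rough sketch using the core File::Basename module instead of Path::Tiny. The regex that fills in the placeholder is my own assumption about the filename layout (`day=...,time=...`), and the sketch assumes a Unix-like system, since File::Basename treats `:` as a separator on some other platforms.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';

# Split off the last path component first ...
my $filename = basename($path);

# ... then match only against that component (assumed layout: day=...,time=...).
my ($date, $time) = $filename =~ m{^day=(\d{4}_\d{2}_\d{2}),time=(\d{2}:\d{2})$};

print "$filename\n";
print "$date $time\n";
```

Because the regex is anchored to the filename alone, digits in the directory part of the path cannot leak into the result.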

Re: How can I combine these two regular expressions?
by jcb (Parson) on Mar 05, 2021 at 03:41 UTC

    The straightforward solution is to actually combine the regular expressions and eliminate the $end_of_path variable:

    my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';
    my @nums = $path =~ m|/day=(\d{4})_(\d{2})_(\d{2}),time=(\d{2}):(\d{2})|;

    If you do need the $end_of_path variable, there is a more interesting trick:

    my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';
    my ($end_of_path, @nums) = $path =~ m|/(day=(\d{4})_(\d{2})_(\d{2}),time=(\d{2}):(\d{2}))|;

    In both of these, I have also used the \d regex char-class escape (see perlre for more) instead of writing out [0-9] repeatedly. The second example uses Perl's list assignment syntax to "peel off" the first value from the returned list, which will contain the contents of each capturing group, in the order in which their left parentheses appear in the expression.

    Lastly, for my own testing, I added a few lines to produce some output:

    use Data::Dump;
    dd $end_of_path;
    dd @nums;

    Quick update: The reason that your my @nums = m/([0-9]+)/g =~ $1 if $path =~ m/([^\/]+$)/; does not work is that you have the operands to the =~ operator backwards. Try my @nums = $1 =~ m/([0-9]+)/g if $path =~ m/([^\/]+$)/; instead; the =~ operator is always VALUE =~ PATTERN.
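
The corrected one-liner from the update can be tried as the sketch below. One change is mine and not part of the suggestion above: the array is declared on its own line, because perlsyn notes that the behaviour of `my ... if ...` (a `my` combined with a statement modifier) is undefined.

```perl
use strict;
use warnings;

my $path = '/.snapshots123/yabsm/root/hourly/day=2021_03_04,time=21:20';

# Declare first, then assign conditionally, to avoid "my ... if ...".
my @nums;

# VALUE =~ PATTERN: $1 (the last path component) is the value being matched.
@nums = $1 =~ m/([0-9]+)/g if $path =~ m/([^\/]+$)/;

print "@nums\n";  # 2021 03 04 21 20
```

If the outer match fails, `@nums` simply stays empty, which is easy to test for afterwards.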