Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi There Monks!

I am trying to match file names that contain similar words, like these:

File name format samples:
    20220401_note.txt
    20220303_page.txt
    20220101_page_with_blanks.txt
    20220111_page_blanks.txt
These are the words I am trying to match in the file names:

    ...
    my @get = qw(note page page_with_blanks page_blanks);

    # I have a sub to process them as they get read from a directory; the code
    # is not complete, but it shows where the issue is:
    sub get_count {
        my $dir   = 'data';
        my $match = join '|', @get;

        # Doing a for loop here to match according to the values in $match:
        my @files = glob "$dir/*.txt";
        for my $file (sort @files) {
            # $match only gets the file names with "page" and ignores
            # "page_with_blanks" and "page_blanks".
            # How could I match all the file names: the ones that have "page"
            # only, and also the other variations?
            if ($file =~ /^\bdata\b\/($year\d{4})_($match)/g) {
                my $date = $1;
                my $name = $2;
                print "D:$date - N:$name\n";
                # It would print:
                #D: 20220401 - N:note.txt
                #D: 20220303 - N:page.txt
                #D: 20220101 - N:page_with_blanks.txt
                #D: 20220111 - N:page_blanks.txt
            }
        }
    }

I hope this explains the issue clearly!
Thanks for looking!

Re: Match similar words in file name
by haukex (Archbishop) on Apr 06, 2022 at 18:33 UTC
Re: Match similar words in file name
by choroba (Cardinal) on Apr 06, 2022 at 18:06 UTC
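    The common trick is to sort the words by length, descending, so that the longer alternatives are tried first. Perl tries the branches of an alternation from left to right and takes the first one that matches, so with "page" listed before "page_with_blanks" the shorter word wins.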
    #! /usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    my @files = qw(
        20220401_note.txt
        20200101_no_match.txt
        20220303_page.txt
        20220101_page_with_blanks.txt
        20220111_page_blanks.txt
    );
    my @get = qw( note page page_with_blanks page_blanks );

    # Longest words first, so "page_with_blanks" is tried before "page".
    my $match = join '|', sort { length $b <=> length $a } map quotemeta, @get;

    for my $file (@files) {
        if (my ($date, $name) = $file =~ /(202[0-9]{5})_($match)/) {
            say "D:$date - N:$name";
        }
    }
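
    To see why the order matters, here is a minimal sketch (not part of the code above) that runs the same sample file names through three patterns: the unsorted alternation, the unsorted alternation anchored on the extension, and the longest-first alternation. The variable names ($loose_re, $anchored_re, $longest_re) and the assumption that every file ends in ".txt" are mine, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @get   = qw( note page page_with_blanks page_blanks );
my @files = qw(
    20220401_note.txt
    20220303_page.txt
    20220101_page_with_blanks.txt
    20220111_page_blanks.txt
);

# Alternation in the original order: "page" precedes its longer variants.
my $unsorted = join '|', map { quotemeta($_) } @get;

# Same words, longest first.
my $sorted = join '|', sort { length $b <=> length $a } map { quotemeta($_) } @get;

# Unanchored: the first alternative that matches wins.
my $loose_re = qr/^\d{8}_($unsorted)/;

# Anchored on the extension (assumes every name ends in ".txt"): "page" alone
# cannot be followed by ".txt" in the longer names, so the engine backtracks
# into the later alternatives.
my $anchored_re = qr/^\d{8}_($unsorted)\.txt\z/;

# Longest-first, unanchored.
my $longest_re = qr/^\d{8}_($sorted)/;

for my $file (@files) {
    my ($loose)    = $file =~ $loose_re;
    my ($anchored) = $file =~ $anchored_re;
    my ($longest)  = $file =~ $longest_re;
    print "$file: loose=$loose anchored=$anchored longest=$longest\n";
}
```

    For 20220101_page_with_blanks.txt the unanchored, unsorted pattern captures only "page", while the other two capture "page_with_blanks". Anchoring only helps when something fixed follows the word; sorting by length works either way.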

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]