Best way to match a file.

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


Re: Best way to match a file.
by Athanasius (Archbishop) on Jul 25, 2013 at 16:07 UTC

Three quick observations:

1. You can make the match more efficient by anchoring it to the end of the string: m/(.{10,}XYZQW.*\.csv$)/i
2. .{10,}.* says “match at least 10 characters, followed by 0 or more characters”. It is equivalent to .{10,} by itself, i.e., the additional .* is redundant.
3. The /g modifier is also redundant here.
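
A minimal sketch of these observations, using two made-up filenames (they are assumptions for illustration, not the original poster's data). Besides the efficiency point, it shows that the end anchor also changes what is accepted, that dropping the extra .* does not change the result here, and that /g in a boolean test makes a repeated match on the same string resume from pos():

```perl
#! perl
use strict;
use warnings;

# Hypothetical filenames, invented for illustration only.
my $good = 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv';
my $bak  = 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv.bak';

my $unanchored = qr/.{10,}XYZQW.*\.csv/i;      # .csv may appear mid-string
my $anchored   = qr/.{10,}XYZQW.*\.csv$/i;     # .csv must end the string
my $with_star  = qr/.{10,}.*XYZQW.*\.csv$/i;   # same, with the redundant .*

# Small helper (hypothetical name) that reports a match as yes/no.
sub hit {
    my ($re, $name) = @_;
    return $name =~ $re ? 'yes' : 'no';
}

for my $name ($good, $bak) {
    print "$name\n";
    print "  unanchored: ", hit($unanchored, $name), "\n";
    print "  anchored:   ", hit($anchored,   $name), "\n";
    print "  with .*:    ", hit($with_star,  $name), "\n";
}

# With /g in scalar context, the second match starts where the first ended.
my $text   = 'abc';
my $first  = $text =~ /b/g ? 'yes' : 'no';
my $second = $text =~ /b/g ? 'yes' : 'no';
print "first=$first second=$second\n";
```

Only the unanchored pattern accepts the .csv.bak name, and the second /g match on 'abc' fails because the search resumes after the 'b' that was already found.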

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: Best way to match a file.

by Anonymous Monk on Jul 25, 2013 at 17:02 UTC

print "\n  REMOVED $file\n\n" if $file =~ m/(.{10,}XYZQW|KMHYT.*\.csv|\.txt)/i;


Re^3: Best way to match a file.

by Athanasius (Archbishop) on Jul 26, 2013 at 02:07 UTC

Yes, but you need to group the alternations, and for efficiency, the grouping should be non-capturing:

#! perl
use strict;
use warnings;

for (
       'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv',
       'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt',
       'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat',
       'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv',
       'ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv',
    )
{
    if (/ .{10,} (?: XYZQW | KMHYT) .* \. (?: csv | txt) $ /ix)
    {
        print "Matched  $_\n";
    }
    else
    {
        print "Ignoring $_\n";
    }
}

Output:

11:57 >perl 673_SoPW.pl
Matched  ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv
Matched  ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt
Ignoring ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat
Matched  ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv
Ignoring ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv

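
To see why the grouping matters: alternation has the lowest precedence in a regex, so the ungrouped pattern from the previous reply is read as three independent alternatives (.{10,}XYZQW, or KMHYT.*\.csv, or \.txt). A small sketch comparing the two forms, where the first filename is invented for this illustration:

```perl
#! perl
use strict;
use warnings;

# Ungrouped: parsed as  .{10,}XYZQW  |  KMHYT.*\.csv  |  \.txt
my $ungrouped = qr/(.{10,}XYZQW|KMHYT.*\.csv|\.txt)/i;

# Grouped, non-capturing, and anchored to the end of the string.
my $grouped = qr/ .{10,} (?: XYZQW | KMHYT ) .* \. (?: csv | txt ) $ /ix;

my @files = (
    'notes.txt',                                  # hypothetical: no marker at all
    'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat',   # right marker, wrong extension
    'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv',   # intended match
);

for my $file (@files) {
    printf "%-42s ungrouped=%-3s grouped=%s\n",
        $file,
        ($file =~ $ungrouped ? 'yes' : 'no'),
        ($file =~ $grouped   ? 'yes' : 'no');
}
```

The ungrouped pattern accepts all three names, because any one alternative is enough: notes.txt matches on \.txt alone, and the .dat file matches on .{10,}XYZQW alone. The grouped pattern accepts only the last one.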

On grouping, see Regular Expressions:

WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.)

Note that I’ve also used /x for improved readability.
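
As a further sketch of what /x allows (the layout and comments here are my own, not part of the script above), the same pattern can be spread over several lines with a comment per piece. The capturing variant is included only to show the difference in what ends up in $1 and $2:

```perl
#! perl
use strict;
use warnings;

my $pattern = qr/
    .{10,}                # at least 10 leading characters
    (?: XYZQW | KMHYT )   # either marker, grouped but not captured
    .*                    # anything in between
    \.                    # a literal dot
    (?: csv | txt )       # either extension
    $                     # end of the string
/ix;

# Same structure with capturing parentheses, for comparison only.
my $capturing = qr/ .{10,} (XYZQW|KMHYT) .* \. (csv|txt) $ /ix;

my $file = 'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv';

if ($file =~ $pattern) {
    # No capturing groups, so this match leaves $1 undefined.
    print "non-capturing: \$1 is ", (defined $1 ? "'$1'" : 'undef'), "\n";
}

if ($file =~ $capturing) {
    print "capturing:     \$1 is '$1', \$2 is '$2'\n";
}
```

Under /x, whitespace and # comments inside the pattern are ignored, so a literal space or # would have to be escaped.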

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re: Best way to match a file.
by mtmcc (Hermit) on Jul 25, 2013 at 16:02 UTC

I presume from your code that it matters to you where in the filename the string occurs? Or do you just want to match the string, regardless of where it occurs in the filename?
