Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I am trying to match a file name if it contains a given string. I have a test script, but I am not sure whether the match is efficient.
use warnings;

my $file  = "ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv";
my $file2 = "ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv";

print "\n 6 - REMOVED $file\n\n" if $file =~ m/(.{10,}.*XYZQW.*\.csv)/ig;
print "\n 8 - REMOVED $file\n\n" if $file =~ m/(.{10,}.*(XYZQW|KMHYT)\.\w{8}\.csv)/i;
print "\n 9 - REMOVED $file2\n" if $file2 =~ m/(.{10,}.*KMHYT.*\.csv)/ig;

Thanks for looking!

Re: Best way to match a file.
by Athanasius (Archbishop) on Jul 25, 2013 at 16:07 UTC

    Three quick observations:

    1. You can make the match more efficient by anchoring it to the end of the string: m/(.{10,}XYZQW.*\.csv$)/i

    2. .{10,}.* says “match at least 10 characters, followed by 0 or more characters”. It is equivalent to .{10,} by itself, i.e., the additional .* is redundant.

    3. The /g modifier is also redundant here.
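A minimal sketch of points 2 and 3, using the file name from the question. The short name `AB.XYZQW.D072413.csv` is a made-up example, added only to show what the `.{10,}` requirement rejects.

```perl
use strict;
use warnings;

my $file = 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv';

# Point 2: the extra .* after .{10,} does not change whether the name matches.
my $with_star    = $file =~ m/(.{10,}.*XYZQW.*\.csv$)/i ? 1 : 0;
my $without_star = $file =~ m/(.{10,}XYZQW.*\.csv$)/i   ? 1 : 0;

# Point 3: a plain true/false test does not need /g.
my $plain = $file =~ m/XYZQW.*\.csv$/i ? 1 : 0;

# Hypothetical short name: fewer than 10 characters precede XYZQW,
# so the .{10,} requirement makes the match fail.
my $short = 'AB.XYZQW.D072413.csv' =~ m/(.{10,}XYZQW.*\.csv$)/i ? 1 : 0;

print "with .*: $with_star, without .*: $without_star, ",
      "no /g: $plain, short name: $short\n";
```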

    Hope that helps,

    Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

      That’s good, but in a different condition, could it match both file names, and different extensions as well, as in:
      print "\n REMOVED $file\n\n" if $file =~ m/(.{10,}XYZQW|KMHYT.*\.csv|\.txt)/i;
      Thanks

        Yes, but you need to group the alternations, and for efficiency, the grouping should be non-capturing:

        #! perl
        use strict;
        use warnings;

        for (
              'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv',
              'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt',
              'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat',
              'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv',
              'ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv',
            )
        {
            if (/ .{10,} (?: XYZQW | KMHYT) .* \. (?: csv | txt) $ /ix)
            {
                print "Matched $_\n";
            }
            else
            {
                print "Ignoring $_\n";
            }
        }

        Output:

        11:57 >perl 673_SoPW.pl
        Matched ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv
        Matched ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt
        Ignoring ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat
        Matched ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv
        Ignoring ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv

        11:57 >

        On grouping, see Regular Expressions:

        WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.)
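As a rough illustration of the difference the quoted passage describes, this sketch runs the same alternation once with capturing parentheses and once with `(?: ... )`, and looks at what the match returns in list context. It shows only the behaviour, not the speed difference; the variable names are illustrative.

```perl
use strict;
use warnings;

my $file = 'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv';

# Capturing group: in list context the match returns the captured text.
my @captured = $file =~ m/(XYZQW|KMHYT)/i;

# Non-capturing group: same grouping, but nothing is captured, so a
# successful match in list context returns the one-element list (1).
my @grouped_only = $file =~ m/(?:XYZQW|KMHYT)/i;

print "capturing:     @captured\n";
print "non-capturing: @grouped_only\n";
```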

        Note that I’ve also used /x for improved readability.
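To see why the grouping matters, here is a small sketch comparing the ungrouped pattern from the follow-up question with the grouped one above. Without grouping, `|` splits the whole pattern into three independent alternatives, so any name containing `.txt` matches. The name `notes.txt` is a hypothetical example, not one of the original files.

```perl
use strict;
use warnings;

# Hypothetical unrelated file, plus one of the names we do want to match.
my $unrelated = 'notes.txt';
my $wanted    = 'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.txt';

# Ungrouped: reads as  .{10,}XYZQW  OR  KMHYT.*\.csv  OR  \.txt
my $ungrouped = $unrelated =~ m/(.{10,}XYZQW|KMHYT.*\.csv|\.txt)/i ? 1 : 0;

# Grouped: each alternation is confined to its own (?: ... ) group.
my $re = qr/ .{10,} (?: XYZQW | KMHYT) .* \. (?: csv | txt) $ /ix;

my $grouped_unrelated = $unrelated =~ $re ? 1 : 0;
my $grouped_wanted    = $wanted    =~ $re ? 1 : 0;

print "ungrouped matches $unrelated: $ungrouped\n";
print "grouped matches $unrelated:   $grouped_unrelated\n";
print "grouped matches $wanted: $grouped_wanted\n";
```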

        Hope that helps,

        Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Best way to match a file.
by mtmcc (Hermit) on Jul 25, 2013 at 16:02 UTC
    I presume from your code that it matters to you where in the filename the string occurs? Or do you just want to match the string, regardless of where it occurs in the filename?