in reply to Best way to match a file.

Three quick observations:

  1. You can make the match more efficient by anchoring it to the end of the string: m/(.{10,}XYZQW.*\.csv$)/i

  2. .{10,}.* says “match at least 10 characters, followed by 0 or more characters”. It is equivalent to .{10,} by itself, i.e., the additional .* is redundant.

  3. The /g modifier is also redundant here.
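The three observations can be checked with a short script. This is only a sketch: the sample file name is borrowed from later in this thread, and the `.bak` name and the variable names are made up for illustration. Note that besides any efficiency gain, the `$` anchor also changes what is accepted, since an unanchored pattern matches a name that merely contains ".csv".

```perl
use strict;
use warnings;

# Sample name taken from later in the thread; the ".bak" variant is hypothetical.
my $file   = 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv';
my $backup = "$file.bak";

# 1. Anchoring: with $ the match must end at ".csv".
my $anchored   = $file   =~ m/.{10,}XYZQW.*\.csv$/i ? 1 : 0;
my $loose_hit  = $backup =~ m/.{10,}XYZQW.*\.csv/i  ? 1 : 0;   # no anchor
my $strict_hit = $backup =~ m/.{10,}XYZQW.*\.csv$/i ? 1 : 0;   # anchored

# 2. .{10,}.* accepts exactly the same strings as .{10,} alone.
my $same = 1;
for my $s ('short', 'exactly10!', 'longer than ten characters') {
    my $with_star    = $s =~ /^.{10,}.*$/ ? 1 : 0;
    my $without_star = $s =~ /^.{10,}$/   ? 1 : 0;
    $same = 0 if $with_star != $without_star;
}

# 3. For a plain yes/no test, /g makes no difference to the answer.
my $with_g    = $file =~ m/XYZQW/gi ? 1 : 0;
my $without_g = $file =~ m/XYZQW/i  ? 1 : 0;

print "anchored=$anchored loose_hit=$loose_hit strict_hit=$strict_hit\n";
print "same=$same with_g=$with_g without_g=$without_g\n";
```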

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Best way to match a file.
by Anonymous Monk on Jul 25, 2013 at 17:02 UTC
    That’s good, but under a different condition, could it match both file names, with different extensions, as in:
    print "\n REMOVED $file\n\n" if $file =~ m/(.{10,}XYZQW|KMHYT.*\.csv| +\.txt)/i;
    Thanks

      Yes, but you need to group the alternations, and for efficiency, the grouping should be non-capturing:

      #! perl
      use strict;
      use warnings;

      for ( 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv',
            'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt',
            'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat',
            'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv',
            'ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv',
          )
      {
          if (/ .{10,} (?: XYZQW | KMHYT) .* \. (?: csv | txt) $ /ix)
          {
              print "Matched $_\n";
          }
          else
          {
              print "Ignoring $_\n";
          }
      }

      Output:

      11:57 >perl 673_SoPW.pl
      Matched ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv
      Matched ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt
      Ignoring ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat
      Matched ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv
      Ignoring ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv

      11:57 >
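      To see why the grouping matters, it may help to run the ungrouped pattern from the question against two of the names above. Without inner groups, `|` splits the whole parenthesised expression into three independent alternatives: `.{10,}XYZQW`, `KMHYT.*\.csv`, and ` +\.txt`. A rough sketch (the variable names are mine, chosen for illustration):

```perl
use strict;
use warnings;

# The pattern as posted in the question: three separate alternatives.
my $ungrouped = qr/(.{10,}XYZQW|KMHYT.*\.csv| +\.txt)/i;

# The grouped, anchored version from the script above.
my $grouped = qr/ .{10,} (?: XYZQW | KMHYT ) .* \. (?: csv | txt ) $ /ix;

my $dat = 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat';   # should be ignored
my $txt = 'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.txt';   # should be matched

# The first alternative alone is satisfied, whatever the extension is.
my $dat_ungrouped = $dat =~ $ungrouped ? 1 : 0;
my $dat_capture   = $dat_ungrouped ? $1 : '';
my $dat_grouped   = $dat =~ $grouped ? 1 : 0;

# None of the three alternatives fits a KMHYT name ending in .txt.
my $txt_ungrouped = $txt =~ $ungrouped ? 1 : 0;
my $txt_grouped   = $txt =~ $grouped   ? 1 : 0;

print "dat: ungrouped=$dat_ungrouped (captured '$dat_capture') grouped=$dat_grouped\n";
print "txt: ungrouped=$txt_ungrouped grouped=$txt_grouped\n";
```

      The ungrouped pattern accepts the .dat file and rejects the KMHYT .txt file, which is the opposite of what was wanted.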

      On grouping, see Regular Expressions:

      WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.)

      Note that I’ve also used /x for improved readability.
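      As a small illustration of the difference between the two kinds of grouping, here is a sketch that matches the same name with capturing and with non-capturing parentheses. The variable names are hypothetical; use capturing groups only if you actually want the matched pieces afterwards.

```perl
use strict;
use warnings;

my $file = 'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv';

# Capturing groups: in list context the match returns the captured pieces.
my ($code, $ext) = $file =~ / .{10,} (XYZQW|KMHYT) .* \. (csv|txt) $ /ix;

# Non-capturing groups: the same yes/no answer, but nothing is saved,
# so $1 is undefined after this successful match.
my $matched     = $file =~ / .{10,} (?: XYZQW | KMHYT ) .* \. (?: csv | txt ) $ /ix ? 1 : 0;
my $has_capture = defined $1 ? 1 : 0;

print "code=$code ext=$ext\n";
print "matched=$matched has_capture=$has_capture\n";
```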

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,