Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am writing a script that can take the search pattern as an argument and list all the files that match that pattern in a specified directory.

The below script works fine as long as I provide the search pattern as words or numbers.

    use strict;
    my $wildcard = $ARGV[0];
    print "wildcard is $wildcard\n";
    my $filespath = '/path/to/files/';
    opendir DIR, $filespath or die "Couldn't open $filespath: $!\n";
    # Sort by asc order (time)
    # (The commented line below sorts all the files that end with ".cgi")
    #my @dirfiles = sort { -M $filespath.$a <=> -M $filespath.$b } grep /\.cgi$/, readdir DIR;
    my @dirfiles = sort { -M $filespath.$a <=> -M $filespath.$b } grep /$wildcard/, readdir DIR;
    foreach (@dirfiles){ print "$filespath$_\n" }

How do I make it work for patterns such as "a*sample*.cgi" (the string to match is within the quotes)?

Thanks, Vgn

Re: Wildcard search in a directory
by graff (Chancellor) on Jul 05, 2007 at 18:13 UTC
    "File glob" syntax is very different from regex syntax. If you want your script to handle file-glob patterns, use glob(), as suggested in the first reply (by Tux). If you want it to support the greater expressive power of regexes, then make it clear to script users that they should specify regexes instead of file-glob patterns.

    A few of the more prominent differences (assuming unix/linux file systems):

    magic character  regex meaning                                                      glob meaning
    .                match any single character                                         match a literal period
    ?                match zero or one occurrence of the previous character or group    match any single character
    *                match zero or more occurrences of the previous character or group  match zero or more characters
    +                match one or more occurrences of the previous character or group   match a literal plus-sign
    ^ $              anchor regex to start/end of string                                match literal caret/dollar-sign
    ( )              capture a group of characters                                      not supported

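    Square-bracketed character classes (e.g. [a-z]) work the same in both, but a pattern like [a-z]* will mean very different things for regex vs. glob: as a regex it matches zero or more lowercase letters, while as a glob it matches one lowercase letter followed by any characters.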

    It would be possible (probably easy and maybe even useful) to write a "glob-to-regex" converter, to produce a regex pattern that behaves the same as a given glob pattern, but if you're just looking up file names in a directory, you already have the glob() function to handle that, so why bother with the extra coding?
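For anyone who does want the converter, here is a minimal sketch of what it might look like. The name glob_to_regex is made up for illustration, and the sketch assumes only *, ?, and simple bracketed classes (including the [!...] negated form) need support. Brace expansion, backslash escapes, and glob's special treatment of leading dots are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): translate a simple
# file-glob pattern into an anchored Perl regex.
# Supports only *, ?, and [...] / [!...] character classes.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    # Walk the pattern one token at a time.
    while ($glob =~ /\G(?:(\*)|(\?)|(\[!?[^\]]+\])|(.))/gcs) {
        if (defined $1) {
            $re .= '.*';                 # glob * : zero or more characters
        }
        elsif (defined $2) {
            $re .= '.';                  # glob ? : exactly one character
        }
        elsif (defined $3) {
            (my $class = $3) =~ s/^\[!/[^/;   # glob [!abc] becomes regex [^abc]
            $re .= $class;
        }
        else {
            $re .= quotemeta $4;         # everything else is literal
        }
    }
    return qr/\A$re\z/s;                 # a glob must match the whole name
}

my $re = glob_to_regex('a*sample*.cgi');
print "matches\n" if 'a1sample2.cgi' =~ $re;
```

The result could be dropped into the original script's grep, for example grep { $_ =~ $re } readdir DIR. Note that unlike glob(), this sketch will also match names that start with a dot.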

      Thanks for the responses.

      I think glob() would suit my requirement better than regex. Will update after some more testing

Re: Wildcard search in a directory
by Tux (Canon) on Jul 05, 2007 at 17:26 UTC
    my @dirfiles = glob "$filespath/$wildcard.cgi";

    Note: not taint-safe.
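A slightly fuller sketch, in case it helps to see the glob approach combined with the original sort by modification time. The helper name list_matching is made up for this example, and it assumes the pattern already contains any extension (as in "a*sample*.cgi"). It uses bsd_glob from the core File::Glob module because plain glob splits its argument on whitespace, which matters if the directory or pattern contains spaces. Like the one-liner above, it is not taint-safe.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: return the plain files in $dir that match a
# shell-style pattern, most recently modified first (ascending -M).
sub list_matching {
    my ($dir, $pattern) = @_;
    $dir =~ s{/+\z}{};                     # avoid a doubled slash in the result
    return sort { -M $a <=> -M $b }
           grep { -f $_ }                  # keep only existing plain files
           bsd_glob("$dir/$pattern");      # bsd_glob does not split on spaces
}

# '/path/to/files/' is the placeholder path from the original script.
print "$_\n" for list_matching('/path/to/files/', $ARGV[0] // '*');
```

The pattern still has to be quoted on the command line, e.g. perl find_files.pl 'a*sample*.cgi', so that the shell passes it through unexpanded.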


    Enjoy, Have FUN! H.Merijn
Re: Wildcard search in a directory
by citromatik (Curate) on Jul 06, 2007 at 09:03 UTC
    How do I make it work for patterns such as "a*sample*.cgi" (the string to match is within the quotes)?

    First of all, are you sure that the regular expression in your example is what you want? Read as a regex, it matches zero or more "a"s followed by "sampl", zero or more "e"s, then any single character followed by "cgi".

    Although I agree with the previous suggestions of using the glob function instead of your own grep, your script should work as well.

    I guess that your problem is due to a premature expansion of your regular expression: if you don't enclose the regexp in quotes it will be expanded by the shell before passing the arguments to the script:

    Suppose that you have a directory like this:

    $ ls -1 /tmp/test/
    other.txt
    test10.txt
    test1.txt
    test2.txt
    test3.txt

    If you name your script "find_files.pl" and you call it with:

    $ perl find_files.pl test*.txt

    then the shell will try to expand test*.txt against the current directory and will pass the result of this expansion to the script, which only looks at the first argument. If you don't want this, you must put single quotes around the pattern and provide the script with a valid Perl regular expression, as in:

    $ perl find_files.pl 'test.*\.txt'
    wildcard is test.*\.txt
    /tmp/test/test10.txt
    /tmp/test/test3.txt
    /tmp/test/test2.txt
    /tmp/test/test1.txt
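If you want the script itself to catch this mistake, one possible guard is sketched below. The helper name pattern_from_args is invented for this example. It assumes the script expects exactly one argument, so more than one argument is taken as a sign that the shell expanded the pattern, and it compiles the pattern inside eval so that an invalid regex (such as a leading *) gives a readable error.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the script's arguments into a compiled regex,
# dying with a hint when the arguments look shell-expanded or invalid.
sub pattern_from_args {
    my @args = @_;
    die "Expected exactly one pattern; quote it so the shell does not expand it\n"
        if @args != 1;
    my $re = eval { qr/$args[0]/ };      # compile errors are caught by eval
    die "Invalid regex '$args[0]': $@" unless defined $re;
    return $re;
}

my $re = pattern_from_args('test.*\.txt');
print "matched\n" if 'test10.txt' =~ $re;
```

In the real script the call would be pattern_from_args(@ARGV), and the returned regex would replace /$wildcard/ in the grep.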

    citromatik