Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm after a regex that matches a file extension.
When I use a regex like this,
my $file_pattern ="*.txt";
$file =~ /$file_pattern/;
I match files that not only look like
test.txt
but also files that have txt in the middle of them
bigtxtfile.html
Any ideas?

Replies are listed 'Best First'.
Re: simple regex match question
by ikegami (Patriarch) on Nov 13, 2006 at 07:41 UTC

    People have already explained that regexps are different from shell matching.

    If you're trying to get a list of files matching a file spec, use glob. Otherwise, you'll have to use a regexp.
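
    As a minimal sketch of the glob route: the helper name files_matching and the three file names are made up for illustration, and the temporary directory exists only so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Hypothetical helper: base names of the files in $dir matching a shell-style spec.
sub files_matching {
    my ($dir, $spec) = @_;
    # glob() expands shell wildcards such as * and ?; no regex is involved.
    my @names = sort map { basename($_) } glob("$dir/$spec");
    return @names;
}

# Build a scratch directory holding the file names from the question.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(test.txt notes.txt bigtxtfile.html)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

print "$_\n" for files_matching($dir, '*.txt');
```

    This lists notes.txt and test.txt but not bigtxtfile.html. Note that the built-in glob splits its argument on whitespace, so a directory name containing spaces needs extra care.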

    If the pattern can be a regexp, the following will do fine:

    if ($file =~ /\.txt\z/) {
        print("Match\n");
    } else {
        print("No match\n");
    }

    If the pattern must be a file spec, the following will convert a file spec (rudimentary, as in just * and ?) into a regexp:

    my $file_pattern = '.*';
    my $re_pattern = '';
    for ($file_pattern) {
        /\G \* /gcx && do { $re_pattern .= '.*'; redo };
        /\G \? /gcx && do { $re_pattern .= '.'; redo };
        /\G ([^*?]+) /gcx && do { $re_pattern .= quotemeta("$1"); redo };
    }
    my $re = qr/^$re_pattern\z/;
    if ($file =~ $re) {
        print("Match\n");
    } else {
        print("No match\n");
    }

    Don't forget that regexps are case-sensitive by default. Use /\.txt\z/i (first snippet) or qr/^$re_pattern\z/i (second snippet) to make the match case-insensitive.
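
    To try the conversion against the file names from the question, here is a rough sketch that wraps the same idea in a function. The name spec_to_regex and its optional second argument are assumptions made for this example, and it walks the spec one character at a time instead of using \G, so it is a variation on the snippet above, not the same code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple file spec (only * and ? are special)
# into an anchored regex. A true second argument makes it case-insensitive.
sub spec_to_regex {
    my ($spec, $ignore_case) = @_;
    my $body = '';
    for my $char (split //, $spec) {
        $body .= $char eq '*' ? '.*'              # any run of characters
               : $char eq '?' ? '.'               # exactly one character
               :                quotemeta($char); # everything else is literal
    }
    return $ignore_case ? qr/^$body\z/i : qr/^$body\z/;
}

my $re = spec_to_regex('*.txt');
for my $file (qw(test.txt bigtxtfile.html TEST.TXT)) {
    printf "%-16s %s\n", $file, $file =~ $re ? 'Match' : 'No match';
}
```

    With the case-sensitive regex only test.txt matches; spec_to_regex('*.txt', 1) would accept TEST.TXT as well.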

Re: simple regex match question
by davido (Cardinal) on Nov 13, 2006 at 06:00 UTC

    * and . have special meaning within regexes. If the string you're passing as a pattern into your regular expression contains an asterisk, that is going to be interpreted as a quantifier. And '.' is interpreted to mean "any character except newline".

    You could probably do it like this:

    if ( $file =~ m/\Q$file_pattern\E$/ ) { # .......

    Also note the '$', which anchors the match to the end of the filename. Otherwise, you might find yourself matching "filename.txt.xml" by accident.


    Dave

      That doesn't seem to work...well, it works, but returns no files. Here is my test code:
      #!/usr/bin/perl
      use warnings;
      use Cwd;
      use File::Find;

      my $file_pattern ="*.txt";
      find(\&d, cwd);

      sub d {
          my $file = $File::Find::name;
          return unless -f $file;
          return unless $file =~ /\Q$file_pattern\E$/;
          print "text file:$file\n";
      }

        I didn't know what you were doing, but now I get it.

        The * character is not a wildcard in Perl. That's a shell wildcard. You cannot expect a regular expression to recognize shell-style wildcards; they're different "languages".


        Dave
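
        A small sketch of why the \Q...\E version finds no files when the pattern is "*.txt". The two sample names are made up for illustration; the second is a file literally named "*.txt".

```perl
use strict;
use warnings;

my $file_pattern = '*.txt';

# \Q...\E (the same as quotemeta) escapes every non-word character,
# so the regex looks for a literal asterisk followed by ".txt".
my $quoted = quotemeta($file_pattern);
print "quoted pattern: $quoted\n";    # \*\.txt

for my $file ('test.txt', '*.txt') {
    my $result = $file =~ /\Q$file_pattern\E$/ ? 'match' : 'no match';
    print "$file: $result\n";
}
```

        Only the name that really contains an asterisk matches, which is why the find() run above printed nothing.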

Re: simple regex match question
by QM (Parson) on Nov 13, 2006 at 06:06 UTC
    I would add that you seem to be confusing filename globs (wildcards) with regex wildcards. Superficially they look similar, but are different. See glob and perlre.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: simple regex match question
by madbombX (Hermit) on Nov 13, 2006 at 06:07 UTC
    Also, should you decide to write the pattern directly in the regex rather than interpolating a variable, don't forget to escape the metacharacters (here, the '.'):

    $file =~ /\.txt$/;

    This regex ensures that the txt falls at the end of the filename and is preceded by a '.'.
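
    A quick check of that regex against the names mentioned in this thread. The wrapper has_txt_extension is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the name ends in ".txt", 0 otherwise.
sub has_txt_extension {
    my ($file) = @_;
    return $file =~ /\.txt$/ ? 1 : 0;
}

for my $file (qw(test.txt bigtxtfile.html filename.txt.xml)) {
    print "$file: ", (has_txt_extension($file) ? 'match' : 'no match'), "\n";
}
```

    Only test.txt matches. One caveat: '$' also matches just before a trailing newline, so a name read from input without chomp would still pass; \z, as used in the first reply, matches only at the very end of the string.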

Re: simple regex match question
by sanPerl (Friar) on Nov 13, 2006 at 09:51 UTC
    You can use File::Basename
    use File::Basename;
    ($name,$path,$suffix) = fileparse($fullname,@suffixlist);
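
    A sketch of how that call might be used for the extension check. The path and the suffix list are made-up examples. fileparse treats each suffix as a pattern matched against the end of the file name, so the dot is escaped here with qr//.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical example path and suffix list.
my $fullname   = '/tmp/docs/test.txt';
my @suffixlist = (qr/\.txt/, qr/\.html/);

my ($name, $path, $suffix) = fileparse($fullname, @suffixlist);
print "name=$name path=$path suffix=$suffix\n";

# When no suffix in the list matches, the third return value is the empty string.
my (undef, undef, $other) = fileparse('/tmp/docs/bigtxtfile.html', qr/\.txt/);
print "bigtxtfile.html has no .txt suffix\n" if $other eq '';
```

    Testing whether $suffix is empty then answers the original question without a hand-written regex on the full name.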