arunmep has asked for the wisdom of the Perl Monks concerning the following question:

hi guys,

I am using a regular expression to check file extensions. I don't want .Dll files to get through the check, but .Doc files should. I can't figure out why .dll files get through whenever .doc files are allowed. The condition is:

if($filename=~/(\.[htm|html|txt|pdf|ppt|csv|doc]{3,4})\b/i) { }

please help me

Code tags added by GrandFather

Re: perl regular expression
by GrandFather (Saint) on Oct 03, 2006 at 10:48 UTC

    Now that code tags have been added to your node so that the code is clear, the problem is clear too. You are using [] in the regex, which means "match any one of the characters within the square brackets", so | is just another character in the set rather than an alternation. The fix is almost easy - use a non-capture group ((?:...)):

    if ($filename =~ /(\.(?:htm|html|txt|pdf|ppt|csv|doc))\b/i)

    Note that the repeat ({3,4}) is not needed.
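
To see why .Dll slips through the original pattern, here is a minimal sketch comparing the character class with the alternation. The file names and the passes() helper are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# The original pattern: [...] is a character class, so this matches a dot
# followed by any 3 or 4 characters drawn from the set h t m l x p d f c s v o |.
my $char_class = qr/(\.[htm|html|txt|pdf|ppt|csv|doc]{3,4})\b/i;

# Alternation inside a non-capturing group: only the listed extensions match.
my $alternation = qr/(\.(?:htm|html|txt|pdf|ppt|csv|doc))\b/i;

# Hypothetical helper: returns 1 if $name matches $re, otherwise 0.
sub passes {
    my ($re, $name) = @_;
    return $name =~ $re ? 1 : 0;
}

# Hypothetical file names used only for this demonstration.
for my $name (qw(report.Doc library.Dll)) {
    printf "%-12s char class: %d  alternation: %d\n",
        $name, passes($char_class, $name), passes($alternation, $name);
}
```

Both names pass the character class because d, l, o and c are all members of the set. Only report.Doc passes the alternation.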


    DWIM is Perl's answer to Gödel
      I know the OP had the word boundary match at the end, but I don't see its use here. Neither do I see the need for a non-capturing group. Either way, the OP should probably positively match for the end of the file name, otherwise one might end up passing a file by virtue of a match part way through the filename.

      The following won't print "match"...

      my $filename = "/home/root/this_is_a_valid_name.html.gz";
      if ($filename =~ m/(\.(htm|html|txt|pdf|ppt|csv|doc))$/i) {
          print "match\n";    # this won't print
      }

      ... however, swap in this one and it passes just fine...

      my $filename = "/home/root/this_is_a_valid_name.html.gz";
      if ($filename =~ /(\.(?:htm|html|txt|pdf|ppt|csv|doc))\b/i) {
          print "match\n";    # this will print
      }
      ---
      my name's not Keith, and I'm not reasonable.

        What happens when the filename is (arbitrarily contrived, but): magical.html_parser.dll?

        Though I suppose it matters what the OP wants to do with the file(s), and whether or not a .gz'd html/txt/etc file is desired.
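
A quick sketch of how the two patterns from this sub-thread treat that contrived name. The hits() helper and the two extra file names are assumptions added for comparison. One detail worth knowing: the underscore counts as a word character, so \b does not match between "html" and "_parser".

```perl
use strict;
use warnings;

my $boundary = qr/\.(?:htm|html|txt|pdf|ppt|csv|doc)\b/i;   # \b version from the thread
my $anchored = qr/\.(?:htm|html|txt|pdf|ppt|csv|doc)$/i;    # end-anchored version

# Hypothetical helper: returns 1 if $name matches $re, otherwise 0.
sub hits {
    my ($re, $name) = @_;
    return $name =~ $re ? 1 : 0;
}

# The first name is from the post above; the other two are made up for contrast.
for my $name (qw(magical.html_parser.dll magical.html.dll page.html)) {
    printf "%-26s boundary: %d  anchored: %d\n",
        $name, hits($boundary, $name), hits($anchored, $name);
}
```

Under these patterns magical.html_parser.dll is rejected by both, because of the underscore. A name such as magical.html.dll does get through the \b version, since a dot follows "html", and that is the case the end anchor guards against.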



        --chargrill
        s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
Re: perl regular expression
by rminner (Chaplain) on Oct 03, 2006 at 07:08 UTC
    Hi,
    doc{3,4} matches doccc and docccc, because the repeat applies only to the c. Another thing is that you should move the \. outside the parentheses.
    Update: I somehow missed the square brackets. It's just as GrandFather said (Re: perl regular expression). Next time I'll try to focus a bit more while looking at the regex ...
    Well, at least the example I gave was correct :).
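
A small sketch of the quantifier point, assuming a pattern anchored at both ends so that only whole-string matches count. The test strings are made up.

```perl
use strict;
use warnings;

# {3,4} binds to the single preceding atom, here the letter "c",
# so the pattern means "d", "o", then three or four "c" characters.
my $re = qr/^doc{3,4}$/;

for my $s (qw(doc docc doccc docccc doccccc)) {
    printf "%-8s %s\n", $s, ($s =~ $re ? "match" : "no match");
}
```

Only doccc and docccc match. The plain doc does not, which is why the repeat has to go once the extensions are written out in full.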

    I'll give you an example program:

    use strict;
    use warnings;

    my $dirname = "/home/hfob";
    opendir my $DH, $dirname or die "failed to open $dirname ($!)\n";
    my @files = grep { /\.(html?|txt|pdf|ppt|csv|doc)$/i } readdir $DH;
    print "$_\n" for @files;
    closedir $DH;

    If you also want to match files which have a format .txt.<some_other_extension>, you could adapt the grep as follows:

    my @files = grep { /\.(html?|txt|pdf|ppt|csv|doc)(\.|$)/i } readdir $DH;
    Then it will accept either the end of the string (in your case, the end of the filename) or a dot after the extension. An example of such a case could be myfile.txt.gz.
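
To compare the two grep patterns without touching a real directory, here is a sketch that runs them over an in-memory list. The file names are hypothetical stand-ins for what readdir would return.

```perl
use strict;
use warnings;

# Hypothetical file names standing in for the output of readdir.
my @names = qw(index.html notes.txt.gz data.csv library.dll notes.txt.dll README);

my $strict  = qr/\.(html?|txt|pdf|ppt|csv|doc)$/i;        # extension must end the name
my $relaxed = qr/\.(html?|txt|pdf|ppt|csv|doc)(\.|$)/i;   # extension may be followed by a dot

my @strict_hits  = grep { $_ =~ $strict }  @names;
my @relaxed_hits = grep { $_ =~ $relaxed } @names;

print "strict:  @strict_hits\n";
print "relaxed: @relaxed_hits\n";
```

The strict pattern keeps index.html and data.csv. The relaxed one also keeps notes.txt.gz, and notes.txt.dll along with it, so if .dll files must never get through, the strict form is probably the safer choice.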