aeqr has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to print the files in the current directory that are not perl scripts. For this, I tried something like this:
sub list_files {
    opendir(my $dir, ".") or die "Could not open current directory";
    while (my $file = readdir($dir)) {
        print "$file\n" if ($file =~ m/(?!\.pl$)/);
    }
    closedir $dir;
}
What I am trying to express here is "all files that don't match something with .pl at the end", but it still prints everything. So I tried this instead:
sub list_files {
    opendir(my $dir, ".") or die "Could not open current directory";
    while (my $file = readdir($dir)) {
        print "$file\n" if (!($file =~ m/(?=\.pl$)/));
    }
    closedir $dir;
}
The second one does what I want. The two look similar, yet they don't behave the same. Why?

Additional question: the script also prints "." and "..". Why are they there, and can that be avoided? I want to make some changes to the file names in the current directory, but not to "." and "..".

Replies are listed 'Best First'.
Re: Regex negative lookahead
by NetWallah (Canon) on Apr 25, 2014 at 04:20 UTC
    The negative lookahead used in your first case, m/(?!\.pl$)/, says:

    "The NEXT thing cannot be ".pl" at the end of the string." Even in text ending in ".pl", there are positions (the very start of "xyz.pl", or the end of line AFTER the ".pl") where the next thing is NOT ".pl". So the match succeeds.

    The problem is that it is a ZERO WIDTH assertion, and does not require any text to match.
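If the lookahead is to stay, one way to make it do the job is to anchor it at the start of the string and let it scan ahead, so that there is only one position to try. A minimal sketch (the helper name not_perl_name is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the name does NOT end in ".pl".
# ^ pins the match to position 0, and .* inside the lookahead lets it
# look all the way to the end of the string from there.
sub not_perl_name {
    my ($name) = @_;
    return $name =~ /^(?!.*\.pl$)/ ? 1 : 0;
}

printf "%s: %d\n", $_, not_perl_name($_) for qw(xyz.pl .pl abc.x abc);
```

With the anchor in place the regex can no longer slide to some other position where the assertion happens to hold.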

    I would write the expression as :

    print "$file\n" if $file !~ /\.pl$/;
    Which works fine.

    If you want to avoid the "." and "..", you need to check for those :

    next if $file =~/^\.\.?$|\.pl$/;
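Put together, the filter might be used like this. This is only a sketch: wanted_names is a hypothetical helper name, and it takes the names as arguments so that it can be tried without touching a real directory.

```perl
use strict;
use warnings;

# Hypothetical helper: drop ".", ".." and anything ending in ".pl".
sub wanted_names {
    return grep { !/^\.\.?$|\.pl$/ } @_;
}

opendir(my $dir, '.') or die "Could not open current directory: $!";
# readdir in list context returns every remaining entry at once
print "$_\n" for wanted_names(readdir $dir);
closedir $dir;
```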

            What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
                  -Larry Wall, 1992

      Thank you for your answer, I think I understand now. But about these "." and "..", are they files? Is "." the same "." as the current directory?
        Yes: "." is the current directory, and ".." is one level above.

        These are not regular files; they are directories.

        It is traditional for them to appear in a directory listing.


Re: Regex negative lookahead
by thargas (Deacon) on Apr 25, 2014 at 13:54 UTC

    If you want to make changes to files, you might want to check that the entry readdir returned is in fact a file (see "-f"). This has the side effect of ignoring "." and "..", as those are directories. readdir returns all entries in the directory (files, directories, named pipes, ...).
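A rough sketch of that check follows. The helper name plain_non_perl_files is hypothetical. Note that readdir returns bare names, so -f needs the directory prefixed whenever the directory is not the current one.

```perl
use strict;
use warnings;

# Hypothetical helper: names of plain files in $path, minus *.pl.
sub plain_non_perl_files {
    my ($path) = @_;
    opendir(my $dir, $path) or die "Could not open $path: $!";
    # -f is false for ".", ".." and any other directory
    my @files = grep { -f "$path/$_" && !/\.pl$/ } readdir $dir;
    closedir $dir;
    return sort @files;
}

print "$_\n" for plain_non_perl_files('.');
```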

    Speaking of dot-files, in unix (and others), files whose names begin with a "." are considered to be "hidden" and, by default, most programs won't show them. Since readdir shows you everything (like you get from ls -a), you get to ignore them yourself in your code.
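If hidden entries should be skipped as well, a single test on the leading dot covers ".", ".." and every other dot-file. A small sketch, with the hypothetical helper name visible_names:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only names that do not start with a dot.
sub visible_names {
    return grep { !/^\./ } @_;
}

print "$_\n" for visible_names('.', '..', '.bashrc', 'notes.txt', 'run.pl');
```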

Re: Regex negative lookahead
by mr_mischief (Monsignor) on Apr 25, 2014 at 14:57 UTC

    You're providing a lookahead without anything to match before it. A negated match like the one NetWallah provided is quite useful when you want to make a decision based on what doesn't match. A lookahead is for when you do want to match something, based in part on what is or isn't next to it, without including what's in the lookahead in the match.

    As an example, let's say you did want to match the Perl files but not other files, but that you don't want to print the '.pl' part of the filename.

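    sub list_perl_files_without_extension {
        opendir my $dir, '.' or die "Could not open current directory";
        while ( my $file = readdir $dir ) {
            # $& holds the matched text; the lookahead keeps '.pl' out of it.
            # The $ anchor stops names such as 'foo.plx' from matching.
            print "$&\n" if $file =~ m/.*(?=\.pl$)/;
        }
        closedir $dir;
    }
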
Re: Regex negative lookahead
by AnomalousMonk (Archbishop) on Apr 25, 2014 at 17:23 UTC

    When you match a regex against a string, you're essentially asking the question "is there any position in the string at which the pattern matches?" The regex engine (RE) is looking for any match, and normally will be satisfied by the first match (it's possible to force the RE to go on looking). The rule is often summarized as "the leftmost, longest match", but "leftmost" is the part that always holds: the RE takes the earliest position at which any match succeeds, and only then do greedy quantifiers make that match as long as they can.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "for my $f ('xyz.pl', '.pl', 'abc.x', 'abc') {
       printf qq{%10s matches at position $-[0] \n}, qq{'$f'}
         if $f =~ m{ (?! [.]pl \z) }xms;
       }
    "
      'xyz.pl' matches at position 0
         '.pl' matches at position 1
       'abc.x' matches at position 0
         'abc' matches at position 0

    In most of these examples, a match can be found at the very start of the string, the first position "not followed by .pl and the end of the string". Even in the case of  '.pl' a match can be found, even though the RE has to go all the way to the second character position to find it.

Re: Regex negative lookahead
by pvaldes (Chaplain) on Apr 26, 2014 at 10:02 UTC

    I am trying to print the files in the current directory that are not perl scripts

    Well, in fact not... you are trying to skip the files in your directory whose names end in .pl, and that is a very different story. Your starting design is poor.

    What about .pm, or .pl~, or im-a-perlscript?

    If you want to do this seriously, you need to look at the contents of the file: either use the file program (or a derivative of it), or read the first characters of the file and look for Perl code.
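As a very rough illustration of the second option, the first line of each file could be checked for a #! line that mentions perl. This is a heuristic sketch only: looks_like_perl_script is a hypothetical helper, and it will miss modules and scripts without a #! line.

```perl
use strict;
use warnings;

# Hypothetical helper: guess whether $path is a Perl script by reading
# its first line and looking for "perl" on a #! line.
sub looks_like_perl_script {
    my ($path) = @_;
    open(my $fh, '<', $path) or return 0;   # unreadable: treat as "no"
    my $first = <$fh>;
    close $fh;
    return 0 unless defined $first;         # empty file
    return $first =~ /^#!.*\bperl/ ? 1 : 0;
}

# File names are taken from the command line
for my $file (@ARGV) {
    printf "%s: %s\n", $file,
        looks_like_perl_script($file) ? 'perl' : 'not perl';
}
```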