patgas has asked for the wisdom of the Perl Monks concerning the following question:

On a whim, I decided to sift through all my code and see which modules and pragmas I used/required the most. So I came up with this regex:

m,(use|require)[\s\+]+([\w:]+)\s*[^;]*;,igs

But I can't figure out how to ignore commented lines. I've been trying to figure out negative lookbehinds, cause that seems to be the thing to do.

m,(?<!- )(use|require)[\s\+]+([\w:]+)\s*[^;]*;,igs

That's not working for me, though. How would I go about doing this? Already this is by far the toughest regex I've ever had to write...

"We're experiencing some Godzilla-related turbulence..."

Re: Is this the time for a negative lookbehind?
by dws (Chancellor) on Dec 20, 2001 at 00:31 UTC
    I can't figure out how to ignore commented lines. I've been trying to figure out negative lookbehinds, cause that seems to be the thing to do.

    The lookbehind you show is for the string "!-". Looks like you might be confusing HTML comments for Perl ones.

    I don't think you need a lookbehind. m,^[^#]*(use|require)\s+([\w:]+)\s*[^;]*;,igm should do the trick. I added the 'm' modifier so that ^ would match at the start of each line inside the string, and removed the 's' modifier so that '.' wouldn't match \n.

    Note that this won't correctly handle require 5.003; if that matters, the change is left as an exercise.
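
    A minimal sketch of trying the pattern above against a small sample. The sample text and the @found array are made up for illustration. One deviation worth flagging: [^#] also matches a newline, so the sketch uses [^#\n] to keep the prefix on a single line; that tweak is an assumption of this sketch, not part of the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical sample source; not taken from the thread.
my $code = <<'END';
use strict;
# use Commented::Out;
require Foo::Bar;
my $x = 1; # use Trailing::Comment;
END

my @found;
# /m lets ^ match at the start of every line; [^#\n]* refuses to cross
# a '#' or a newline, so text after a '#' on a line is never reached.
while ( $code =~ m,^[^#\n]*(use|require)\s+([\w:]+)\s*[^;]*;,gm ) {
    push @found, "$1 $2";
}
print "$_\n" for @found;
```

    Only the two uncommented statements should be reported; both the fully commented line and the trailing comment are skipped.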

(tye)Re: Is this the time for a negative lookbehind?
by tye (Sage) on Dec 20, 2001 at 03:42 UTC

    Negative look-behinds have to be fixed-width so you can't use them here because the number of characters in front of "use|require" before you hit the "#" that you don't want to match isn't going to be constant.
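
    A small sketch of that limitation, assuming nothing beyond core Perl. The variable names are made up for illustration. The patterns are held in strings so they compile at run time and eval can trap the failure: a fixed-width lookbehind compiles, while one that would have to look back arbitrarily far is rejected.

```perl
use strict;
use warnings;

# Patterns are kept in strings so they are compiled at run time,
# which lets eval trap a compile failure instead of aborting the script.
my $fixed     = '(?<!# )use';     # looks back exactly two characters
my $unbounded = '(?<!#.*)use';    # would need to look back arbitrarily far

my $fixed_ok     = eval { my $re = qr/$fixed/;     1 } ? 1 : 0;
my $unbounded_ok = eval { my $re = qr/$unbounded/; 1 } ? 1 : 0;

print "fixed-width lookbehind compiles: $fixed_ok\n";
print "unbounded lookbehind compiles:   $unbounded_ok\n";

# The fixed-width form only helps when '# ' sits immediately before 'use'.
my $hit_direct = ( '# use Foo;'        =~ /$fixed/ ) ? 1 : 0;
my $hit_far    = ( '# please use Foo;' =~ /$fixed/ ) ? 1 : 0;

print "matches '# use Foo;':        $hit_direct\n";
print "matches '# please use Foo;': $hit_far\n";
```

    The second pair of checks shows why the fixed-width form is not enough here: as soon as anything sits between the "#" and the keyword, the commented text matches anyway.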

    I don't understand why you want to match "use++this" so perhaps you want to replace [\s\+]+ with simply \s+ ?

    I don't see a lot of point in matching \s*[^;]*;. You can certainly shorten it to [^;]*; or .*?; (the second form can be a problem if you later add to the end of the regex). But I'd probably just drop it and allow for the ";" being dropped for various reasons.

    The suggestion of /^[^#]*(use|require)... is pretty good but I'd at least add \b and prohibit quoted strings in front: /^[^#'"q]*\b(use|require)... I might even just assume nothing but whitespace in front /^\s*(use|require)... but worry that someone might write BEGIN { require... for some reason.

    Note also that you can require "config.pl" or even require $x, but those probably aren't the type of stuff you want to report anyway. (:
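
    A rough sketch of the stricter prefix, applied one line at a time so the character class never has a chance to run across a newline. The sample lines and the @found array are hypothetical, and the line-by-line loop is a choice made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sample lines; not taken from the thread.
my @lines = (
    'use strict;',
    'BEGIN { require Foo::Bar; }',
    'print "please use caution";',
    '# use Commented::Out;',
    'require "config.pl";',
);

my @found;
for my $line (@lines) {
    # The prefix may not contain '#', a quote character, or 'q';
    # \b keeps the keyword from matching inside a longer word.
    if ( $line =~ /^[^#'"q]*\b(use|require)\s+([\w:]+)/ ) {
        push @found, "$1 $2";
    }
}
print "$_\n" for @found;
```

    The BEGIN { require ... } line is still found, while the quoted string, the comment, and require "config.pl" are all passed over. The cost is that any line with a "q" or a quote before the keyword is skipped as well.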

            - tye (but my friends call me "Tye")
Re: Is this the time for a negative lookbehind?
by Juerd (Abbot) on Dec 20, 2001 at 00:45 UTC
    dws' answer is sufficient for solving your problem, but I'd like to add that you could increase speed a little by taking out the /i: use and require will have to be lower cased, and \w already is [A-Za-z0-9_].
    If you don't use $1, you can write (?:use|require) instead of (use|require), for a little extra speed-up. Note that the current $2 will be $1 if you do so.
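
    A short sketch of how the capture numbering shifts. The sample line and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'require Foo::Bar;';   # hypothetical sample line

# Capturing group: keyword in $1, module name in $2.
my ( $keyword, $module ) = $line =~ /(use|require)\s+([\w:]+)/;

# Non-capturing group: the module name moves up to $1.
my ($only_module) = $line =~ /(?:use|require)\s+([\w:]+)/;

print "capturing:     keyword=$keyword module=$module\n";
print "non-capturing: module=$only_module\n";
```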

    2;0 juerd@ouranos:~$ perl -e'undef christmas'
    Segmentation fault
    2;139 juerd@ouranos:~$

Re: Is this the time for a negative lookbehind?
by merlyn (Sage) on Dec 20, 2001 at 06:42 UTC
Re: Is this the time for a negative lookbehind?
by patgas (Friar) on Dec 20, 2001 at 00:59 UTC

    If it helps, here's my test program. Still haven't found a solution to get all of them, but your comments are helping... Thanks!

    Update: I added some more test cases...

    #!/usr/bin/perl -w
    use strict;

    local $/ = undef;
    my $file = <DATA>;

    while ( $file =~
        # m,(use|require)[\s\+]+([\w:]+)\s*[^;]*;,igs
        # m,(?:use|require)\s+([\w:]+)\s*[^;]*;,igs
        # m,^[^#]*(use|require)\s+([\w:]+)\s*[^;]*;,igm
        m,^(?:use|require)\s+([\w:.]+)\s*[^;]*;,gms
    ) {
        printf( "%-20s %-20s %-20s\n", $1 || '*', $2 || '*', $3 || '*' );
    }

    __DATA__
    use one;
    require two;
    use three;
    # use four;
    # this is why you should use five.
    use Six::Maybe;
    require Seven.As.Well;
    require Eight ; # haha!
    use # nine; just kidding
    Nine;
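
    One possible way to get from matches to the tally the original question was after: strip comments first, then count. This is only a sketch. The sample source, $stripped, and %count are hypothetical, and the comment stripping is deliberately naive (it assumes "#" never appears inside a string, regex, or here-doc).

```perl
use strict;
use warnings;

# Hypothetical sample source; not the test data from the post above.
my $source = <<'END';
use strict;
use warnings;
# use Commented::Out;
require Foo::Bar;  # use Trailing::Comment;
use # a comment in the middle
    Foo::Bar;
END

# Naive comment stripping: assumes '#' never appears inside a string,
# regex, or here-doc in the scanned code.
( my $stripped = $source ) =~ s/#.*//g;

my %count;
$count{$1}++ while $stripped =~ /^\s*(?:use|require)\s+([\w:]+)/gm;

# Most-used first, ties broken alphabetically.
for my $name ( sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count ) {
    printf "%-20s %d\n", $name, $count{$name};
}
```

    Removing the comments up front means the statement split by a comment is still counted, because \s+ is free to cross the now-empty remainder of the line.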

    "We're experiencing some Godzilla-related turbulence..."

Re: Is this the time for a negative lookbehind?
by tradez (Pilgrim) on Dec 20, 2001 at 01:28 UTC
    I may not be up to "Perlmonk" status yet. So I am assuming I am missing the problem, but if I wanted to test to make sure something didn't start with something I would use a little boolean if logic, like so :
    if ($string !~ /^#/) { print "Houston, we have an uncommented line \n"; }
    Please do tear me apart if I completely missed the issue, only way a young buck is going to learn.
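
    A quick sketch of that check used as a line filter, to show what it does and does not catch. The sample lines and array names are hypothetical, and the /^\s*#/ variant is an addition of this sketch rather than something from the post above.

```perl
use strict;
use warnings;

# Hypothetical sample lines; not taken from the thread.
my @lines = (
    'use strict;',
    '# use Commented::Out;',
    '    # indented comment: use Nothing;',
    'require Foo::Bar; # trailing comment',
);

# Keep only lines that do not begin with '#'.
my @uncommented = grep { $_ !~ /^#/ } @lines;

# Variant that also tolerates leading whitespace before the '#'.
my @uncommented_ws = grep { $_ !~ /^\s*#/ } @lines;

printf "%d of %d lines survive /^#/\n",     scalar @uncommented,    scalar @lines;
printf "%d of %d lines survive /^\\s*#/\n", scalar @uncommented_ws, scalar @lines;
```

    Either form handles whole-line comments only; a trailing comment on a code line passes through untouched, so a second match on the surviving lines would still need to stop at the "#".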