in reply to lookahead, lookbehind, ... I'm lost

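A general rule of thumb is that a negative lookaround can never work if there is a .* or any other variable-length "general" pattern next to it. In your case you have two such patterns around (?!\sLib\s): [^\n]+ to its left and .* to its right. Because [^\n]+ is greedy, it first grabs the rest of the line, " Lib" included. The lookahead is then tested at the end of the line, where \sLib\s cannot match, so the negative lookahead succeeds at once. You can check how Perl matched your strings against the regular expression by printing out the match variables: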

    while (<DATA>) {
        / (                                  #Start capture
          (                                  #Start Sub descript header
            '\*{20,}                         #start '------ delimiter
            (\n'[^\n]*?)+\n                  #middle of header
            '\*{20,}\n                       #end '------ delimiter
          )?                                 #End Sub header (optional)
          (Private\s|Public\s|Friend\s)?     #Scope (optional)
          (Static\s)?                        #Static (optional)
          (Sub\s|Function\s)                 #Sub or Function (mandatory)
          ([^\n]+)                           #more stuff on same line (captured for printing)
          (?!\sLib\s)                        #but not if Lib on line
          (.*)                               #and the rest of the file (captured for printing)
        )                                    #End capture
        /sx and print "[$6/$7/$8]\n";
    };
    __DATA__
    Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA" _

One approach to make your parser more robust is to make it more specific, like explicitly parsing out the function/sub name and expecting (or rather, denying) the Lib keyword immediately after the function name:

    while (<DATA>) {
        print;
        / (                                  #Start capture
          (                                  #Start Sub descript header
            '\*{20,}                         #start '------ delimiter
            (\n'[^\n]*?)+\n                  #middle of header
            '\*{20,}\n                       #end '------ delimiter
          )?                                 #End Sub header (optional)
          (Private\s|Public\s|Friend\s)?     #Scope (optional)
          (Static\s)?                        #Static (optional)
          (Sub\s|Function\s)                 #Sub or Function (mandatory)
          (\w+)\s+                           #sub name
          ((?!Lib\s))                        #no Lib right after the name
          ([^\n]+)                           #more stuff on same line unless "Lib"
          (.*)                               #and the rest of the file
        )                                    #End capture
        /sx and print "[$6/$7/$9]\n";
    };
    __DATA__
    Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA"
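A quick way to check the more specific pattern is to run it against a few lines. The sketch below condenses the pattern above (the optional comment header and the trailing captures are left out). The sample lines are made up for illustration, except for the Declare line, which is the one from the __DATA__ section.

```perl
use strict;
use warnings;

# Condensed version of the pattern above: header and trailing parts left out.
my $decl = qr/
    (Private\s|Public\s|Friend\s)?   # Scope (optional)
    (Static\s)?                      # Static (optional)
    (Sub\s|Function\s)               # Sub or Function (mandatory)
    (\w+)\s+                         # sub name, then whitespace
    (?!Lib\s)                        # no Lib right after the name
/x;

# Made-up sample lines; the Declare line is the one from __DATA__.
my @lines = (
    'Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA"',
    'Public Static Sub Refresh ()',
    'Private Function foo(byVal x as Long) as Integer',
);

for my $line (@lines) {
    if ($line =~ $decl) {
        print "accepted: $4\n";      # $4 is the sub name
    }
    else {
        print "rejected: $line\n";
    }
}
```

The third line shows a side effect of requiring \s+ right after the name: when the parameter list is attached directly to the name, as in foo(...), there is no whitespace at that spot, so the line is rejected as well. Allowing an optional parenthesized parameter list between the name and the \s+ is one way around that.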

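To be a bit more specific about the word "general" above: a negative lookahead will never work if there is a variable-length quantifier next to it whose pattern can also match (parts of) the phrase you want to avoid. The quantifier can either swallow the phrase or give characters back until the lookahead is tested at a position where it succeeds.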
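The rule is easy to see on the Declare line alone. This sketch tries three variants of "Function, then a name, not followed by Lib" (the labels A, B and C are only for illustration) and prints what each one matched:

```perl
use strict;
use warnings;

# The Declare line from the question; all three checks should reject it.
my $line = 'Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA"';

# Three variants of "Function <name> not followed by Lib".
my @patterns = (
    [ 'A ([^\n]+ before the lookahead)' => qr/Function\s[^\n]+(?!\sLib\s)/ ],
    [ 'B (\w+ before the lookahead)'    => qr/Function\s\w+(?!\s+Lib\s)/ ],
    [ 'C (\w+\s+ before the lookahead)' => qr/Function\s\w+\s+(?!Lib\s)/ ],
);

for my $p (@patterns) {
    my ($label, $re) = @$p;
    if ($line =~ $re) {
        print "$label: matched '$&'\n";
    }
    else {
        print "$label: no match\n";
    }
}
# A ([^\n]+ before the lookahead): matched 'Function DeleteFile Lib "kernel32" Alias "DeleteFileA"'
# B (\w+ before the lookahead): matched 'Function DeleteFil'
# C (\w+\s+ before the lookahead): no match
```

In A, [^\n]+ swallows " Lib ..." and the lookahead is only tested at the end of the line. In B, \w+ cannot match the space, but it can give back the final "e" of DeleteFile, and at that position \s+Lib\s no longer fits, so the lookahead succeeds. Only in C, where the \s+ after the name pins down the position at which the lookahead is tested, is the line rejected.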

Re^2: lookahead, lookbehind, ... I'm lost
by ExReg (Priest) on May 07, 2009 at 16:39 UTC

    Thanks so much for your quick reply. Sorry it took me so long to get back. I had made a few mistakes, and it doesn't take much to screw up a RegEx. I put in a bit more detail, like capturing the function name and parameters. After a bit more screaming at it, a slash turned into a backslash, and it finally worked.

        use strict;
        undef $/;
        while (<DATA>) {
            print;
            / (                              #Start capture
              (                              #Start Sub descript header
                '-{20,}                      #start '------ delimiter
                (\n'[^\n]*?)+\n              #middle of header
                '-{20,}\n                    #end '------ delimiter
              )?                             #End Sub header (optional)
              (Private\s|Public\s|Friend\s)? #Scope (optional)
              (Static\s)?                    #Static (optional)
              (Sub\s|Function\s)             #Sub or Function (mandatory)
              (\w+)                          #Sub name
              (                              #Start Params
                \(                           #Left paren
                ([^\)])*                     #Optional params inside
                \)                           #Right paren
              )?                             #End Params (optional)
              \s+                            #Space
              ((?!Lib\s))                    #no Lib on line
              ([^\n]+)                       #more stuff on same line unless "Lib"
              (.*)                           #and the rest of the file
            )                                #End capture
            /sx;
        };
        __DATA__
        Private thingy as String
        Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA"
        '-----------------------------------------------------------
        ' This is a header
        '-----------------------------------------------------------
        Private Function foo(byVal x as Long, byVal y as String) as Integer

    It correctly ignores the first three lines and starts capturing at the '-----------. Thanks!
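The final pattern above captures the Sub name and the parameter list but never prints them. A minimal sketch for looking at them, assuming Perl 5.10 or later: it swaps the numbered groups for named captures ((?<name>...) read back through %+) so the groups do not have to be counted, drops the trailing [^\n]+ and .* parts, and uses a here-doc copy of the sample data instead of __DATA__.

```perl
use strict;
use warnings;

# Sample input copied from the __DATA__ section above.
my $text = <<'VB';
Private thingy as String
Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA"
'-----------------------------------------------------------
' This is a header
'-----------------------------------------------------------
Private Function foo(byVal x as Long, byVal y as String) as Integer
VB

# Same structure as the pattern above, but with named captures
# instead of numbered ones (requires Perl 5.10 or later).
my $sub_re = qr/
    (?<header>                       # optional comment header
        '-{20,}                      #   start delimiter
        (?:\n'[^\n]*?)+\n            #   middle of header
        '-{20,}\n                    #   end delimiter
    )?
    (?:Private\s|Public\s|Friend\s)? # scope (optional)
    (?:Static\s)?                    # Static (optional)
    (?:Sub\s|Function\s)             # Sub or Function (mandatory)
    (?<name>\w+)                     # sub name
    (?<params>\([^)]*\))?            # parameter list (optional)
    \s+(?!Lib\s)                     # whitespace, but no Lib after it
/x;

my %got;
if ($text =~ $sub_re) {
    %got = %+;    # copy the named captures before another match resets them
    print "header: ", (defined $got{header} ? "yes" : "no"), "\n";
    print "name:   $got{name}\n";
    print "params: ", ($got{params} // "(none)"), "\n";
}
```

Against this data it should report a header, the name foo and the parameter list (byVal x as Long, byVal y as String), skipping the Declare line for the same reason as the numbered version.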