I am working on a large script to analyze code. I am trying to find keywords in code, but not when they appear in commented lines. For example, if I am looking for foo, I want to match
maybe some stuff foo maybe some more stuff;
but not
// maybe some stuff foo maybe some more stuff;
This looks like an appropriate place to use a negative lookbehind assertion, but I have not found the right form yet. Experimenting, I have tried:

    $a = "abcdefghijklm";
    # (1) match because a does not match cde, then bcdefghi follows, then jkl matches?
    print "match" if ( $a =~ /(?<!cde).*?jkl/ );
    # (2) match because cde matches, then fghi follows, then jkl matches
    print "match" if ( $a =~ /(?<=cde).*?jkl/ );

    $a = "xyzfghijklm";
    # (1) match because cde is not before jkl, then jkl matches
    print "match" if ( $a =~ /(?<!cde).*?jkl/ );
    # (2) no match because cde is not before jkl
    print "match" if ( $a =~ /(?<=cde).*?jkl/ );

I am looking for a regex that will not match abcdefghijklm, because cde precedes jkl, but will match xyzfghijklm. I would have thought that something like the first one above would be what I wanted.
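To see why the first pattern still matches abcdefghijklm, it may help to print the text that was actually matched. The following is only a minimal sketch (the helper name matched_text is made up for illustration). A lookbehind is tested only at the position where the match attempt starts. At position 0 nothing precedes, so (?<!cde) succeeds there, and .*? is then free to run straight across cde on its way to jkl.

```perl
use strict;

# Hypothetical helper: return the text the pattern matched, or undef.
sub matched_text {
    my ($str, $re) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

my $neg = qr/(?<!cde).*?jkl/;   # negative lookbehind, as in (1)
my $pos = qr/(?<=cde).*?jkl/;   # positive lookbehind, as in (2)

for my $s ("abcdefghijklm", "xyzfghijklm") {
    for my $re ($neg, $pos) {
        my $m = matched_text($s, $re);
        print "$s  $re  => ", (defined $m ? "'$m'" : "no match"), "\n";
    }
}
```

With the negative lookbehind, the matched text for abcdefghijklm is abcdefghijkl: the match begins at the very start of the string, not at jkl.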

If I only needed to reject matches where jkl is preceded by the single character c, the answer would be easy: ^[^c]*jkl. But I have to exclude multi-character sequences such as cde.

I am guessing that what I am looking for will have a (?<!cde) and then a jkl in it.
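One possible alternative that avoids lookbehind altogether, offered as a sketch rather than a definitive answer: generalize ^[^c]*jkl by walking forward from the start of the line one character at a time, using a negative lookahead to check that the excluded sequence does not begin at each position. Negative lookahead has no fixed-width restriction and should be available in perl 5.6. The sub name found_before_marker is hypothetical. The sketch assumes one line per string and ignores cases such as // inside a string literal or /* */ comments.

```perl
use strict;

# Hypothetical helper: true if $word occurs on $line with no $bad
# sequence anywhere before it (scanning from the start of the line).
sub found_before_marker {
    my ($line, $word, $bad) = @_;
    # (?:(?!\Q$bad\E).)*? consumes a character only where $bad does not start
    return $line =~ /^(?:(?!\Q$bad\E).)*?\Q$word\E/ ? 1 : 0;
}

my @cases = (
    [ "abcdefghijklm", "jkl", "cde" ],
    [ "xyzfghijklm",   "jkl", "cde" ],
    [ "maybe some stuff foo maybe some more stuff;",    "foo", "//" ],
    [ "// maybe some stuff foo maybe some more stuff;", "foo", "//" ],
);

for my $c (@cases) {
    my ($line, $word, $bad) = @$c;
    print found_before_marker($line, $word, $bad) ? "match:    " : "no match: ",
          $line, "\n";
}
```

Because the pattern is anchored at ^, the scan cannot restart after the excluded sequence, which is what let the lookbehind version slip through.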

One trick I used several years ago seems to work here. Since variable-length lookbehinds are not allowed, I can reverse the strings and use a variable-length lookahead instead:

    $a = "mlkjihgfedcba";
    # no match because edc DOES follow lkj. This is what is desired.
    # (It does not match in the unreversed string when cde precedes jkl.)
    print "match" if ( $a =~ /lkj(?!.*?edc)/ );

    $a = "mlkjihgfzyx";
    # match because edc does not follow lkj. Again, this is what I want.
    # (It does match jkl in the unreversed string when not preceded by cde.)
    print "match" if ( $a =~ /lkj(?!.*?edc)/ );
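The reversal trick can be wrapped up so that the strings do not have to be reversed by hand. This is a minimal sketch under the same assumptions as the example above. The sub name match_not_preceded is invented for illustration, and it relies on reverse in scalar context, which reverses the characters of a string.

```perl
use strict;

# Hypothetical helper: true if $word occurs in $str with no $bad anywhere
# before it, implemented by reversing everything and using a negative lookahead.
sub match_not_preceded {
    my ($str, $word, $bad) = @_;
    my $rstr  = reverse $str;    # scalar assignment: reverses the characters
    my $rword = reverse $word;
    my $rbad  = reverse $bad;
    return $rstr =~ /\Q$rword\E(?!.*?\Q$rbad\E)/ ? 1 : 0;
}

for my $s ("abcdefghijklm", "xyzfghijklm") {
    print match_not_preceded($s, "jkl", "cde") ? "match:    " : "no match: ",
          $s, "\n";
}
```

The \Q...\E quoting is there so that a marker containing regex metacharacters is treated literally.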

Is there another way?

I am guessing this is one of those areas where regexes come up short. I just learned about (?{ }) and (??{ }), but don't fully understand them yet. Where is a good place to learn advanced regex aside from perlre and Friedl?

I am limited to perl 5.6, if that makes a difference. I cannot use CPAN or any other libraries.

If this question seems disjointed, it is because my brain stores in hashes instead of sorted arrays.


In reply to Capture uncommented keywords by ExReg
