in reply to RegEx - comments should not matches

Weeding out comments is not as easy as you might think:

println( "This starts a comment? // Or does it\n" ); <== match ?
// /* print( "*/ What happens now ?\n" ); <== match ??

You will need a parser for your language, one that understands the language at least well enough to know when the comment start markers sit inside a string and when they actually apply. Text::Balanced and Regexp::Common have prefabricated regexes that attempt this task. If they are not suitable for the language you are trying to parse, you can write your own parser with Parse::RecDescent or Parse::Yapp.
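To make the "know when you are inside a string" point concrete, here is a minimal sketch of a string-aware stripper for C-like source. It is not taken from any of the modules named above. The function name strip_comments is made up for illustration, and the sketch assumes only // and /* */ comments plus simple quoted literals with backslash escapes. The trick is that string literals are matched first and put back unchanged, so comment markers inside them never get a chance to match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): strip // and /* */
# comments from C-like source while leaving quoted literals untouched.
sub strip_comments {
    my ($src) = @_;
    $src =~ s{
        ( " (?: \\. | [^"\\] )* " )    # group 1: double-quoted string, kept
      | ( ' (?: \\. | [^'\\] )* ' )    # group 2: single-quoted literal, kept
      | // [^\n]*                      # line comment, dropped
      | /\* .*? \*/                    # block comment, dropped
    }{ defined($1) ? $1 : defined($2) ? $2 : '' }gsex;
    return $src;
}

# Sample input modelled on the lines in this thread.
my $code = <<'END';
println( "This starts a comment? // Or does it\n" ); // a real comment
/* a block comment */ print( "*/ What happens now ?\n" );
END

print strip_comments($code);
```

Both string literals come through intact and only the two real comments are removed. This is still a sketch: it knows nothing about preprocessor directives, trigraphs, or languages with other quoting rules, which is where a real parser earns its keep.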

Re^2: RegEx - comments should not matches
by Chief of Chaos (Friar) on Jul 13, 2004 at 11:55 UTC
    Thanks,
    I will try Text::Balanced and Regexp::Common.

    Sounds good:
    Regexp::Common::comment Provides regexes for comments of various languages (43 languages currently).

    Greetings,
    CoC
Re^2: RegEx - comments should not matches
by hardburn (Abbot) on Jul 13, 2004 at 16:13 UTC

    Regexp::Common::comment doesn't do proper tokenizing. IIRC, it will not catch the cases you presented correctly. I remember seeing a regex for catching C-style comments correctly (don't remember where, sorry), and it's quite ugly (but not as bad as the e-mail address regex).
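As a rough illustration of what goes wrong without tokenizing, the sketch below applies a deliberately naive pattern to the first example line from this thread. The pattern is hypothetical and is not the regex that Regexp::Common::comment uses; it only shows the general failure mode of treating every // as a comment start.

```perl
use strict;
use warnings;

# Hypothetical naive pattern: every // starts a comment,
# with no notion of string literals.
my $line = 'println( "This starts a comment? // Or does it\n" );';

(my $stripped = $line) =~ s{//[^\n]*}{};

print "$stripped\n";
```

The string literal is cut off right after "comment? ", and the closing quote and parenthesis are lost with it, so the remaining code no longer compiles.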

    ----
    send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.