http://qs1969.pair.com?node_id=45438

Carl-Joseph has asked for the wisdom of the Perl Monks concerning the following question:

On page 72, the book Effective Perl Programming shows some code to remove C-style comments from a string. I've reproduced the code below.

local $_;
$_ = "
This_Is_Code = 0; /* This is a comment */
/*
 * Can it handle a mult-line comment, i.e.
 * one that goes on for more that one
 * line? Yes, it can.
 */
This_Is_Code++;
";
for (split m!("(:?\\\W|.)*?"|/\*|\*/)!) {
    if ($in_comment) {
        $in_comment = 0 if $_ eq "*/"
    } else {
        if ( $_ eq "/*" ) {
            $in_comment = 1;
            print " ";
        } else {
            print;
        }
    }
}

I have a question about the regular expression that's used in the call to split(). The expression (:?\\\W|.) seems to say match either a non-word character, or any character. Why have both patterns in the alternation when the dot includes non-word characters?

Thanks,

Carl-Joseph

Replies are listed 'Best First'.
Re: Alternation in Effective Perl Programming Example
by chromatic (Archbishop) on Dec 07, 2000 at 09:03 UTC
    It's a little more than that. Should it be (?:\\\W|.) instead?

    If so, it would be interpreted as non-capturing parentheses, matching either a backslash followed by a non-word character, or any single character.

    Otherwise, as written, it's (an optional colon followed by a backslash and a non-word character) or (a single character).
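The two readings can be compared directly. The following is a minimal sketch, not code from the book: the variable names and sample strings are made up for illustration, and the anchors `^` and `$` are added only so each pattern has to account for the whole sample. It also shows one side effect of the printed spelling, assuming the usual behaviour of split with capturing groups: the extra group contributes an extra field per separator.

```perl
use strict;
use warnings;

# As printed: optional colon, then backslash + non-word char; or any one char.
my $as_printed = qr/^(:?\\\W|.)$/;
# Suggested fix: non-capturing group; backslash + non-word char; or any one char.
my $intended   = qr/^(?:\\\W|.)$/;

# Samples: a plain char, backslash-quote, colon-backslash-quote.
my @samples = ('x', '\\"', ':\\"');
my %result;
for my $s (@samples) {
    $result{$s} = [ ($s =~ $as_printed ? 1 : 0), ($s =~ $intended ? 1 : 0) ];
    printf "%-4s as printed: %d  intended: %d\n", $s, @{ $result{$s} };
}

# In split, every capturing group adds a field for each separator found,
# so the printed spelling yields one more field than the fixed one here.
my @typo_fields  = split /("(:?\\\W|.)*?")/, 'a "bc" d';
my @fixed_fields = split /("(?:\\\W|.)*?")/, 'a "bc" d';
print scalar(@typo_fields), " fields vs ", scalar(@fixed_fields), " fields\n";
```

Only the third sample separates the two patterns: the leading colon is accepted by `:?` in the printed version, while the non-capturing version has nothing that can consume three characters.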

      It appears to be a typo (looking at Effective Perl), as the chapter leans heavily on (?: just before the example. You should mention it to merlyn, as it's not in the errata (neither is the misspelled 'Eart' on page 64 ;-)

      a

        And, to the original question: I think the idea is to 'inch along' the string.
        for (split m!("(?:\\\W|.)*?"|/\*|\*/)!) {
        will return the pieces delimited by a quoted string or by the comment begin/end markers, and because of the capturing parentheses, the delimiters themselves too. Hmm, an escaped non-word char or a single char in quotes? No, as many \\\W|. as are found between the quotes. A quoted string comes back as a single piece, so I guess a line like:
        biff = "not a comment /*"; /* marker for not a comment */
        won't start $in_comment too early.

        a
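To check that guess, here is a hedged sketch that wraps the loop from the question in a subroutine and runs it on the biff line. The name strip_c_comments is hypothetical, the pattern uses the (?: spelling suggested above rather than the printed one, and the output is collected in a string instead of being printed piece by piece.

```perl
use strict;
use warnings;

# Hypothetical helper: same logic as the loop in the question,
# using the (?: ... ) spelling proposed in the replies.
sub strip_c_comments {
    my ($src) = @_;
    my $out        = '';
    my $in_comment = 0;
    # The capturing parentheses make split return the separators too:
    # whole double-quoted strings, "/*" and "*/".
    for my $piece (split m!("(?:\\\W|.)*?"|/\*|\*/)!, $src) {
        if ($in_comment) {
            $in_comment = 0 if $piece eq '*/';
        }
        elsif ($piece eq '/*') {
            $in_comment = 1;
            $out .= ' ';      # a comment is replaced by one space
        }
        else {
            $out .= $piece;   # code and quoted strings pass through
        }
    }
    return $out;
}

my $line = 'biff = "not a comment /*"; /* marker for not a comment */';
my $stripped = strip_c_comments($line);
print "[$stripped]\n";
```

The quoted string is matched as one separator before the /* alternative gets a chance at the marker inside it, so only the real comment at the end of the line is dropped.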