While trying to emulate the C preprocessor I had major trouble handling C-style /* ... */ comments. Two issues cause particular grief: comments can span lines and, at least for some compilers, comments can be nested (and they are nested in the code I need to handle).

An additional gotcha is that text which merely looks like a comment inside a string literal needs to be retained.
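To make both problems concrete, here is a minimal sketch (not from the original code; the sample input is made up for illustration) of what a naive non-greedy regex does with a nested comment and with a string that only looks like it contains a comment:

```perl
use strict;
use warnings;

# Hypothetical sample input: one nested comment, and one string literal
# whose contents merely look like a comment.
my $source = <<'SRC';
/* outer /* inner */ tail */
const char *s = "/* not a comment */";
SRC

# Naive approach: delete everything from each '/*' to the nearest '*/'.
(my $naive = $source) =~ s{/\*.*?\*/}{}gs;

print $naive;
# Prints:
#  tail */
# const char *s = "";
```

The nested comment is closed too early, leaving " tail */" behind, and the string contents are wrongly removed. Both failures are what a grammar-based approach is meant to avoid.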

The code below parses an input string and generates an output string comprising the original text without the C-style comments. Note that it leaves C++ single-line (//) comments in place, but those are easily dealt with in a second pass.

use strict;
use warnings;
use Parse::RecDescent;

# Accumulates the input text with the C-style comments removed
my $decommentedText = '';

# Called from the grammar actions to append each retained piece of text
sub concat ($) {$decommentedText .= $_[0]; 1;}

# Grammar notes:
#   block        - a string, a run of ordinary text, or a comment
#   string       - a quoted literal; backslash escapes are consumed as a unit
#   comment      - returns "\n" when a newline follows it (or when it
#                  contains a nested comment), otherwise ''
#   commentBlock - comment body, optionally containing one nested comment
my $decomment = <<'GRAMMAR';

file    : block(s)

block   : string    {::concat ($item{string}); 1}
        | m{((?!/\*|"|').)+}s {::concat ($item[-1]); 1}
        | comment   {::concat ($item{comment}); 1;}

string  : /"(\\.|[^"\\])*"/ {$return = $item[-1] . ($text =~ /^\n/ ? "\n" : ''); 1;}
        | /'(\\.|[^'\\])*'/ {$return = $item[-1] . ($text =~ /^\n/ ? "\n" : ''); 1;}

comment : '/*' commentBlock '*/' {$return = $text =~ /^\n/ ? "\n" : ''; 1;}

commentBlock    : m{((?! \*/ | /\* ).)*}sx comment m{((?! \*/ | /\* ).)*}sx {$return = "\n"; 1;}
                | m{((?! \*/ | /\* ).)+}sx {$return = ''; 1;}

GRAMMAR

my $parse = Parse::RecDescent->new ($decomment);

my $input = <<'DATA';
#include "StdAfx.h" // Tail comment
#include "Utility\perftime.h"
#pragma hdrstop

/* Comment before MACRO */
/* Comment /* and nested comment */
lines */
#define MACRO 10\
+ 3 // Multi line macro with comment
#define __DEBUG /* comment */ 1
#define STRING 'This is a string' /* comment */
#define COMMENT "/* comment in \"a\" string */"
// c++ comment line
/* Comment at start
for a number of lines
*/
/* multi-line comment
/* nested */
block */
// cpp block
char PerfTimer::Buf[64];
DATA

$parse->file ($input) or die "Parse failed\n";
print $decommentedText;

Prints:

#include "StdAfx.h"// Tail comment
#include "Utility\perftime.h"
#pragma hdrstop
#define MACRO 10\
+ 3 // Multi line macro with comment
#define __DEBUG 1
#define STRING 'This is a string'
#define COMMENT "/* comment in \"a\" string */"
// c++ comment line
// cpp block
char PerfTimer::Buf[64];
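The output above still contains the C++ single-line comments. One way the second pass might look is sketched below. The helper name strip_cpp_comments is hypothetical and the sample text is made up. The sketch assumes the /* ... */ comments have already been removed, and it ignores corner cases such as a // comment continued onto the next line with a trailing backslash.

```perl
use strict;
use warnings;

# Hypothetical helper (an illustration only): remove C++ '//' comments
# from text whose /* ... */ comments are already gone. Quoted literals
# are matched first so that a '//' inside a string is left alone.
sub strip_cpp_comments {
    my ($text) = @_;

    $text =~ s{
        (                               # capture 1: a quoted literal, kept verbatim
            " (?: \\. | [^"\\] )* "     # double-quoted string
          | ' (?: \\. | [^'\\] )* '     # single-quoted literal
        )
      |
        [ \t]* // [^\n]*                # a // comment plus the blanks before it
    }{ defined $1 ? $1 : '' }gex;

    return $text;
}

my $sample = <<'SRC';
#include "StdAfx.h"// Tail comment
#define URL "http://example.com" // trailing
char PerfTimer::Buf[64];
SRC

print strip_cpp_comments ($sample);
# Prints:
# #include "StdAfx.h"
# #define URL "http://example.com"
# char PerfTimer::Buf[64];
```

Matching the quoted literals in the same substitution is what keeps the // inside "http://example.com" from being treated as the start of a comment.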

DWIM is Perl's answer to Gödel

In reply to C comment stripping preprocessor by GrandFather
