biocc has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!

I am more into C than Perl (sorry, but I have to be) and feel like I need the power of Perl now.

I have a large set of C source files, and need to empty out all the function bodies - but just the bodies. Everything else should be left untouched.

Does anybody know a quick way to do that?

I looked at the C::Scan library, and it can extract quite a lot of structures from a C file. The point, though, is not to delete anything except the function bodies. OK, it offers entries such as defines_args, fdecls, etc., and one would be able to print those. But the order is important, and the printed output should include everything.

I need to change this:

#define IN_LIBXML
#include "libxml.h"

struct _xmlLink {
    struct _xmlLink *next;
    struct _xmlLink *prev;
    void *data;
};

static int xmlLinkCompare(const void *data0, const void *data1);

static int
xmlLinkCompare(const void *data0, const void *data1)
{
    if (data0 < data1)
        return (-1);
    else if (data0 == data1)
        return (0);
    return (1);
}

-----------

into:

#define IN_LIBXML
#include "libxml.h"

struct _xmlLink {
    struct _xmlLink *next;
    struct _xmlLink *prev;
    void *data;
};

static int xmlLinkCompare(const void *data0, const void *data1);

static int
xmlLinkCompare(const void *data0, const void *data1)
{
}

Any suggestions??

Frank

Replies are listed 'Best First'.
Re: empty out C function bodies
by jakobi (Pilgrim) on Oct 24, 2009 at 12:51 UTC

    If we have permission to mangle the layout of the source with, say, GNU indent (so that we have a canonical format and don't really need to understand C syntax), then try a more elaborate version of the following quick hack:

    perl -e 'undef $/; $ENV{f}=$ARGV[0]; $_=`cat -- "\$f"`; s/^(\w+[^\n;]*?\([^\n;]*\n\{\n)[\s\S]+?\n(\}\n)/$1$2/mg; print' FILE.c > FILE_MODIFIED.c

    This one-liner slurps the whole file into $_ using a mostly useless cat (or maybe type), then matches a line starting with a word and also containing an opening parenthesis, followed by a line with a sole { in column 1, and eats non-greedily until a line with a sole } in column 1, in both cases without blanks. Using ^ and /mg rather than a leading \n is required in case multiple function definitions occur without empty lines in between.

    You can also push the selection of files into Perl (-> glob), as well as reading and writing the modified files (explicitly, or implicitly with -> perl -i.bak).
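A minimal sketch of the "push it into Perl" variant, under a few assumptions: the script name empty_bodies.pl and the helper names empty_bodies and process_file are made up for illustration, and the pattern is a slight variation of the one-liner's ([\s\S]*? followed by ^\} instead of [\s\S]+?\n\}) so that a body which is already empty does not make the match run on into the next function. It still assumes { and } of each body sit alone in column 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob ':bsd_glob';   # glob() that does not split patterns on spaces

# Variation of the one-liner's pattern; assumes "{" and "}" of every
# function body sit alone in column 1 (e.g. after running GNU indent).
my $body_re = qr/^(\w+[^\n;]*?\([^\n;]*\n\{\n)[\s\S]*?^(\}\n)/m;

# Return the source text with every matched function body removed.
sub empty_bodies {
    my ($src) = @_;
    $src =~ s/$body_re/$1$2/g;
    return $src;
}

# Rewrite one file in place, keeping the original as FILE.bak.
sub process_file {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $src = do { local $/; <$in> } // '';   # slurp the whole file
    close $in;

    rename $file, "$file.bak" or die "Cannot back up $file: $!";
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} empty_bodies($src);
    close $out or die "Cannot close $file: $!";
}

# Hypothetical invocation: perl empty_bodies.pl 'src/*.c'
process_file($_) for map { glob($_) } @ARGV;
```

Quoting the pattern on the command line lets Perl do the globbing; an unquoted *.c expanded by the shell works as well, since glob returns a plain filename unchanged.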

    And first try it on a copy of your files at least until both you and the compiler are happy with the output again :)

    cu & HTH, Peter -- hints may be untested unless stated otherwise; use with caution & understanding.

    Updates: biocc's missing cases:

    That was the reason for asking about indent (not that indent wouldn't normally line-break long signatures, but we might hold out and hope for a parse-friendly option combination...) and for my insistence on column 1 for { and }.

    It might be easier to smash the source to make it conform to my assumption than to 'harden' the regex. And you probably should stop short of reimplementing the C parser anyway.

    s/[\t ]*$//mg; to ensure no (ASCII) whitespace at EOL

    s/^(\w+[^\n;]*?\([^\n;]*(?:,\n[\t ]+[^\n;]+)*\)[\t\n ]*\{\n)[\s\S]+?\n(\}\n)/$1$2/mg; should do the trick for the two cases you mentioned, requiring the comma at EOL and whitespace at SOL for continuations in multiline signatures.

    But this regex begins to be overly cute, so you should probably rewrite it using the /x modifier (-> add comments and whitespace), and maybe split the patterns into multiple separate variables (see perlre).
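As a rough illustration of that advice, here is one way the extended pattern might look when split into commented pieces with /x. The variable and sub names are invented for this sketch. Two details differ from the one-liner: the literal { is escaped, and the body is matched as [\s\S]*? up to ^\} so that an already empty body is handled too.

```perl
use strict;
use warnings;

# First line of a definition: a word, then anything up to "(" and the
# rest of that line (no newline or semicolon allowed).
my $sig_start = qr/ \w+ [^\n;]*? \( [^\n;]* /x;

# Optional continuation lines: the previous line ends in a comma and
# the next one starts with blanks or tabs.
my $sig_cont = qr/ (?: , \n [\t ]+ [^\n;]+ )* /x;

# Closing ")" followed by "{" on the same line or alone on the next one.
my $open_brace = qr/ \) [\t\n ]* \{ \n /x;

my $func_re = qr/
    ^ ( $sig_start $sig_cont $open_brace )  # group 1: signature up to the opening brace
    [\s\S]*?                                # body, as little as possible
    ^ ( \} \n )                             # group 2: closing brace in column 1
/mx;

sub empty_bodies {
    my ($src) = @_;
    $src =~ s/[\t ]+$//mg;        # ensure no blanks or tabs at end of line
    $src =~ s/$func_re/$1$2/g;
    return $src;
}
```

Note that variables are interpolated even inside /x comments, which is why the comments above avoid $ and @.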

    Extend the regex repeatedly like this, and you've found an indicator that you should have chosen some CPAN module or a proper C parsing grammar :). Let me rephrase that in better words than mine:
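For comparison, a different technique from the layout-based regexes above, and still far from a C parser: Perl (5.10 or later) can balance the braces itself with a recursive pattern. This sketch treats any balanced {...} block that directly follows a ")" as a function body. It assumes there are no braces inside strings, character literals, comments or macros, and it would also hit a compound literal such as (struct s){1,2} at file scope, so treat it as an experiment. The names are invented here.

```perl
use strict;
use warnings;

# Group 2 matches a balanced brace block by recursing into itself via (?2).
my $balanced_re = qr/
    ( \) \s* )                  # group 1: end of the parameter list
    ( \{                        # group 2: opening brace
        (?: [^{}]++ | (?2) )*   #   non-brace text or a nested block
      \}                        #   matching closing brace
    )
/x;

# Replace every block found after ")" with an empty pair of braces.
sub empty_bodies_balanced {
    my ($src) = @_;
    $src =~ s/$balanced_re/$1 . "{\n}"/ge;
    return $src;
}
```

Because the leftmost ")" followed by a block wins, inner blocks such as if (...) { ... } are consumed as part of the enclosing function body and are never matched on their own.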

    Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. (monkquips)
      perl -e 'undef $/; $ENV{f}=$ARGV[0]; $_=`cat -- "\$f"`;

      Backquotes in scalar context already do what you want regardless of the contents of $/ and you can use $ARGV[0] directly instead of copying it to the environment, so that then becomes:

      perl -e ' $_=`cat -- "$ARGV[0]"`;

      You could also simplify that whole thing with the use of the -0 switch and the -p switch:

      perl -0777pe's/^(\w+[^\n;]*?\([^\n;]*\n\{\n)[\s\S]+?\n(\}\n)/$1$2/mg' FILE.c > FILE_MODIFIED.c
        1. Wrong. There's a very important reason for this idiom: you've forgotten the shell's interpolation (might be Unix specific, but it's nonetheless a DEADLY & EASILY EXPLOITABLE TRAP).
          !!!!Please do not do insecure shell invocations like $_=`cat "$ARGV[0]"` ever!!!!
          (unless you control each and every tenth of each bit of each filename character and shell word individually; in case you missed it, it's indeed a major pet peeve of mine. Why, you ask? Consider rm -rf /* ./* and reinstalls & restores all over the place in huge LANs. I don't intend to try "overnight" bare-metal recovery on that order of magnitude, and neither should you)
        2. switches: indeed. But even if we start playing golf, I'm still rather partial to my personal one and only space between -e and the Perl scrap: I greatly fear that you'll win by default :).
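One way to sidestep the interpolation problem from point 1 altogether is not to involve a shell at all. A small sketch, with slurp being a name chosen here rather than anything from the posts above: three-argument open takes the filename literally, so quotes, $(...) or semicolons in it are never interpreted.

```perl
use strict;
use warnings;

# Read a whole file without ever handing its name to a shell.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    local $/;                 # undef: read to end of file in one go
    my $content = <$fh>;
    close $fh;
    return $content // '';    # an empty file yields undef, so normalise it
}
```

In the one-liner this would amount to replacing the backtick expression with something like $_ = slurp($ARGV[0]), or simply using the -0777 and -p switches mentioned above, which also read the file from within Perl.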
      Hi Peter!

      Thanks for the answer!

      Layout is not important. Your regex works in most cases. But it misses:

      static void xmlLinkDeallocator(xmlListPtr l, xmlLinkPtr lk) { (lk->prev)->next = lk->next; (lk->next)->prev = lk->prev; if(l->linkDeallocator) l->linkDeallocator(lk); xmlFree(lk); }

      and

      static void xmlLinkDeallocator(xmlListPtr l, xmlLinkPtr lk) { (lk->prev)->next = lk->next; (lk->next)->prev = lk->prev; if(l->linkDeallocator) l->linkDeallocator(lk); xmlFree(lk); }

      Would it be possible to produce the hack for these two cases?

Re: empty out C function bodies
by GrandFather (Saint) on Oct 24, 2009 at 20:23 UTC

    Is it important that the processing correctly handle preprocessor directives? Handling #define and #ifxxx can be interesting!

    I've several times started work on a Pure Perl C/C++ preprocessor parser, but it's a tricky problem even just handling fairly normal code. Handling edge cases in the same way as a range of different C/C++ compilers is just plain nasty. In some cases there is no well-defined behaviour, and each compiler's preprocessor may generate different results.


    True laziness is hard work