extract C function body

huaihai has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I wonder if there is already a solution to my task out there. To help instrument my C code, I want to run a Perl script that automatically scans a source file and separates out each function definition, perhaps storing them in an array. I am baffled as to what kind of regex would give me such results. In particular, the source file can be badly formatted, so the only way to extract a whole function body is by counting matching { and }. Could anyone point me to a resource for doing that?

Replies are listed 'Best First'.
Re: extract C function body by Fletch (Bishop) on Sep 07, 2006 at 13:00 UTC
If you can guarantee fairly rigid formatting, you might get away with using regexen to handle the job. There's also the C::Scan module, which I think is what Inline uses (and if it's not, look at Inline::C and see what it does).
Re: extract C function body by planetscape (Chancellor) on Sep 07, 2006 at 15:36 UTC
Granted, this may not be at all what you are looking for, but depending on your needs, it might be just the trick. Doxygen generates JavaDoc-like documentation for projects in many languages, including C and C++. DoxygenFilter is a new take on an old project (DoxyFilt) that also handles ("filters") Perl code, and now promises support for multi-programming-language projects. See Examples of output generated by doxygen. HTH, planetscape
Re: extract C function body by ikegami (Patriarch) on Sep 07, 2006 at 16:11 UTC
The functions in Text::Balanced should do a decent job.
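As a rough illustration of the Text::Balanced suggestion, here is a minimal sketch that uses `extract_bracketed` to pull out brace-balanced bodies. The sample input, the header regex, and the `@bodies` array are all hypothetical and only for demonstration. The header regex is deliberately crude and will not cope with every C declaration. The quote characters in the delimiter string ask the module to ignore braces inside string and character literals; comments and preprocessor directives are not handled at all.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical sample input; a real script would slurp the C file instead.
my $src = <<'END_C';
int add(int a, int b) { return a + b; }

void
greet(const char *name)
{
    if (name) { printf("{ %s\n", name); }
}
END_C

my @bodies;    # each entry is [function name, body text including braces]

# Crude, hypothetical header match: an identifier, a parameter list
# without nested parentheses, optional whitespace, then an opening brace.
while ($src =~ /\b(\w+)\s*\([^()]*\)\s*(?=\{)/g) {
    my $name = $1;
    my $rest = substr($src, pos($src));

    # Balance only braces; the ' and " in the delimiter string make
    # extract_bracketed skip over quoted sections while nesting.
    my ($body) = extract_bracketed($rest, q/{}"'/);
    next unless defined $body && length $body;

    push @bodies, [$name, $body];
    pos($src) += length $body;    # resume scanning after this body
}

printf "%s: %d characters\n", $_->[0], length $_->[1] for @bodies;
```

Because the scan position is moved past each extracted body, calls such as `printf(...)` inside a function are never mistaken for new function headers. An apostrophe inside a C comment would confuse the quote handling, so stripping comments first is probably wise.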
Re: extract C function body by wojtyk (Friar) on Sep 07, 2006 at 15:19 UTC
I wrote a C/C++ parser that does what you describe: it increments a depth count when a { is reached and decrements it on }: `$depth++ while /\{/g; $depth-- while /\}/g;` If depth == 0, the line is tested using a regex that matches function declarations. I used the following, although I'm unsure how accurately it matches up to the actual grammar (I rolled it in my head): `my $funcrgx = '((\w+(?:\:\:\w+))\s\([^)]*\))';` The one thing you have to be careful of with this method is preprocessor crap. It can throw the depth count off. I wrote in some fuzzy handling code to take care of that (basically always following the first branch of the #ifdef and not counting parens in the other branch). Using established parsing modules is probably preferable to rolling it yourself, but this is how I did it :)
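For anyone who wants to try the depth-counting idea, here is a minimal sketch of it. This is not wojtyk's actual parser: the function name `split_top_level_blocks` and the sample input are hypothetical, and the sketch assumes there are no braces inside strings, comments, or untaken `#if` branches. Each returned chunk also includes whatever text preceded the opening brace since the previous chunk, so headers, and any stray declarations before them, come along with the body.

```perl
use strict;
use warnings;

# Split C source into chunks that each end where the brace depth
# returns to zero. Hypothetical helper, for illustration only.
sub split_top_level_blocks {
    my ($src) = @_;
    my @blocks;
    my $current = '';
    my $depth   = 0;

    for my $line (split /^/m, $src) {
        # Skip blank lines between chunks.
        next if $current eq '' && $line !~ /\S/;

        # Count braces on this line (list-assignment counting idiom).
        my $opens  = () = $line =~ /\{/g;
        my $closes = () = $line =~ /\}/g;

        $current .= $line;
        $depth   += $opens - $closes;

        # A closing brace that brings us back to depth 0 ends a chunk.
        if ($depth == 0 && $closes) {
            push @blocks, $current;
            $current = '';
        }
    }
    return @blocks;
}

# Hypothetical sample input.
my $sample = <<'END_C';
int add(int a, int b) { return a + b; }

void
greet(const char *name)
{
    if (name) {
        puts(name);
    }
}
END_C

my @blocks = split_top_level_blocks($sample);
print scalar(@blocks), " top-level blocks found\n";
```

Structs, enums, and initializer lists also end with a brace at depth zero, so a real script would still need a declaration regex like the one above to decide which chunks are functions.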