http://qs1969.pair.com?node_id=292588

jai has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have to write a Perl program that lists all the functions in a given C source file and prints the functions called by each of them. To match the called functions, I first used the regex
(/([a-zA-Z][a-zA-Z0-9_-]*)\s*\([^)]*\)/)
But this failed to match the following:
if((err=func1(a,b))!=0)
So I modified it to
(/([a-zA-Z][a-zA-Z0-9_-]*)\s*\([^)(]*\)/)
Still, it fails for functions called like this:
if((err=func1(a,b,(*c)())
Is there any way to reliably parse them? Any help would be greatly appreciated.
jai

Replies are listed 'Best First'.
Re: regex 2 find C function dependencies
by kvale (Monsignor) on Sep 19, 2003 at 05:44 UTC
    I think it would be difficult to distinguish functions from macros from conditional expressions on the basis of that or similar regexps. What you need to do is preprocess and parse the C language. Here is a location for the grammar.

    To parse C in Perl, one could use regexps to lex C code into tokens and Parse::RecDescent to parse C tokens based on the above grammar.
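The lexing half of that suggestion can be sketched in a few lines of core Perl. This is only an illustration under stated assumptions: `lex_c` and `calls_in` are hypothetical helper names, the keyword list is incomplete, the preprocessor is ignored, and a plain scan of the token stream stands in for the Parse::RecDescent grammar.

```perl
use strict;
use warnings;

# Hypothetical helper: split C source into [type, text] tokens.
# Not a full C lexer: no preprocessor, no trigraphs, no wide literals.
sub lex_c {
    my ($src) = @_;
    my @tokens;
    for ($src) {
        while (1) {
            /\G\s+/gc                  and next;    # skip whitespace
            m{\G/\*.*?\*/}gcs          and next;    # skip /* ... */ comments
            m{\G//[^\n]*}gc            and next;    # skip // comments
            /\G("(?:[^"\\]|\\.)*")/gcs and do { push @tokens, [string => $1]; next };
            /\G('(?:[^'\\]|\\.)*')/gcs and do { push @tokens, [char   => $1]; next };
            /\G([A-Za-z_]\w*)/gc       and do { push @tokens, [ident  => $1]; next };
            /\G(\d[\w.]*)/gc           and do { push @tokens, [number => $1]; next };
            /\G(.)/gcs                 and do { push @tokens, [punct  => $1]; next };
            last;                                   # end of input
        }
    }
    return @tokens;
}

# Assumed, incomplete list of keywords that may be followed by "(".
my %keyword = map { $_ => 1 } qw(if while for switch return sizeof);

# Hypothetical helper: report every identifier directly followed by "(".
sub calls_in {
    my @t = lex_c(shift);
    my @calls;
    for my $i (0 .. $#t - 1) {
        next unless $t[$i][0] eq 'ident' && !$keyword{ $t[$i][1] };
        push @calls, $t[$i][1]
            if $t[$i + 1][0] eq 'punct' && $t[$i + 1][1] eq '(';
    }
    return @calls;
}

print join(", ", calls_in('if((err=func1(a,b,(*c)()))!=0) foo("(", bar());')), "\n";
```

Because strings and comments become their own tokens (or are dropped), parentheses inside them no longer matter. The scan still cannot tell a call from a declaration or a function-like macro, and it misses calls through pointers such as (*c)(), which is where a real grammar comes in.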

    -Mark

Re: regex 2 find C function dependencies
by Abigail-II (Bishop) on Sep 19, 2003 at 09:55 UTC
    It's not easy to do this with regexes. You could do something like:
    my $bal;
    $bal = qr /[(]((?:(?>[^()]+)|(??{$bal}))*)[)]/;
    my $func = qr /([a-zA-Z]\w*)\s*$bal/;
    while (<>) {
        while (/$func/g) {
            print "Name: $1; arguments: $2\n"
        }
    }

    but that doesn't find all of them either. Besides the obvious problem of iterating over the lines (thus failing to find calls spanning a newline), it will report keywords that are followed by parens, like if and while. It won't find nested calls either. For instance, if you have

    foo (bar ())
    it will only report foo, and not bar. But the program will truly go haywire on calls like:
    foo ("("); bar (")))");
    Comments are great fun too:
    /* This works (really!) */
    With great effort, this might be fixable, but why should you? With great effort, you would be able to replace your tire with toothpicks as well, but sometimes you have to acknowledge that other tools are better for the job at hand.

    Abigail
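Three of the problems listed above (line-by-line reading, keywords, nested calls) can be worked around with a lookahead, as in the sketch below. It is not Abigail's code: it uses the (?-1) recursion syntax from perlre (Perl 5.10 and later) instead of (??{ }), and `find_calls` and the keyword list are made up for illustration. Parentheses inside strings and comments still confuse it.

```perl
use strict;
use warnings;

# One balanced (...) group; (?-1) recurses into the enclosing capture group.
my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;

# Assumed, incomplete list of keywords that can precede "(".
my %keyword = map { $_ => 1 } qw(if else while for do switch return sizeof);

# Hypothetical helper: names that are followed by a balanced argument list.
sub find_calls {
    my ($code) = @_;
    my @found;
    # The lookahead tests for balanced arguments without consuming them,
    # so the next match can start inside the argument list.
    while ($code =~ /\b([A-Za-z_]\w*)\s*(?=$parens)/g) {
        push @found, $1 unless $keyword{$1};
    }
    return @found;
}

# The whole source is held in one string, so calls may span newlines.
my $code = <<'END_C';
if ((err = func1(a,
                 bar(b))) != 0)
    foo(baz());
END_C

print "$_\n" for find_calls($code);
```

A call such as foo ("(") is silently skipped here, because the lookahead never finds a balanced list. That is quieter than going haywire, but just as wrong.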

Re: regex 2 find C function dependencies
by Zaxo (Archbishop) on Sep 19, 2003 at 07:37 UTC

    Take a look at C::Scan. It's not perfect, but it mostly works.

    After Compline,
    Zaxo

Re: regex 2 find C function dependencies
by jmcnamara (Monsignor) on Sep 19, 2003 at 08:38 UTC

    For an alternative approach have a look at the cscope utility. It finds the functions called by a function as well as the functions calling a function.

    It is generally used as a standalone program or from within an editor but it also produces a file containing the dependencies which should be easier to parse than the C code.

    --
    John.

Re: regex 2 find C function dependencies
by Cody Pendant (Prior) on Sep 19, 2003 at 06:32 UTC
    I have to write a perl program that lists out all the functions in a given C source file
    You have to? Someone's specifically set you this task and told you to use Perl regular expressions to do it?

    Finding balanced text is the number one item on the list of "Things You Can't Do With Regular Expressions". Maybe someone's trying to make a point?



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
      Finding balanced text is the number one item on the list of "Things You Can't Do With Regular Expressions".

      You can with Perl regular expressions. man perlre even has an example.

      Abigail

        OK, I stand corrected, but not 100%. The Perl 5.6 FAQ says
        Can I use Perl regular expressions to match balanced text? 

        Although Perl regular expressions are more powerful than "mathematical" regular expressions, because they feature conveniences like backreferences (\1 and its ilk), they still aren't powerful enough -- with the possible exception of bizarre and experimental features in the development-track releases of Perl. You still need to use non-regex techniques to parse balanced text, such as the text enclosed between matching parentheses or braces, for example.

        An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and possibly nested single chars, like ` and ', { and }, or ( and ) can be found in http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .

        The C::Scan module from CPAN contains such subs for internal usage, but they are undocumented.

        Whereas the 5.8 FAQ says

        Historically, Perl regular expressions were not capable of matching balanced text. As of more recent versions of perl including 5.6.1 experimental features have been added that make it possible to do this. Look at the documentation for the (??{ }) construct in recent perlre manual pages to see an example of matching balanced parentheses. Be sure to take special notice of the warnings present in the manual before making use of this feature.

        CPAN contains many modules that can be useful for matching text depending on the context. Damian Conway provides some useful patterns in Regexp::Common. The module Text::Balanced provides a general solution to this problem.

        One of the common applications of balanced text matching is working with XML and HTML. There are many modules available that support these needs. Two examples are HTML::Parser and XML::Parser. There are many others.

        An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and possibly nested single chars, like ` and ', { and }, or ( and ) can be found in http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz.

        The C::Scan module from CPAN also contains such subs for internal use, but they are undocumented.
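For the Text::Balanced route mentioned in the FAQ text, a minimal sketch might look like the following. It assumes the function name has already been found by other means and only shows extract_bracketed pulling out the argument list; the sample string is made up.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Made-up sample: the text that follows a function name in some C source.
my $after_name = q{(a, ")))", g(x)); next();};

# '()' selects round brackets; adding '"' asks Text::Balanced to ignore
# brackets that appear inside double-quoted strings.
my ($args, $rest) = extract_bracketed($after_name, '()"');

print "arguments: $args\n";
print "remainder: $rest\n";
```

In list context the function returns the bracketed text and the remainder rather than modifying the input, which makes it easy to keep scanning the remainder for the next call.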

        So it's not like it says "sure, here's the trick (one line of code)" is it?

        I still think someone was trying to make a point by handing our friend this assignment.



        ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
Re: regex 2 find C function dependencies
by Anonymous Monk on Sep 19, 2003 at 06:55 UTC
    Is there any way to reliably parse them ? any help would be greatly appreciated.
    Yes. This problem has already been solved, so you should look for the solution before attempting to write one.
Re: regex 2 find C function dependencies
by perlmonkey (Hermit) on Sep 20, 2003 at 06:16 UTC
    I just went through a similar exercise myself. I did not find C::Scan to be useful. Parse::RecDescent can do it, but it is probably too much work for your needs.

    You really want to use Regexp::Common here.

    Something like this should get you started:
    use Regexp::Common;
    # $code is assumed to already hold the C source text
    my $func_rx = qr{
        ([a-zA-Z]\w*)                     # match function name
        \s*                               # optional space
        ($RE{balanced}{-parens=>'()'})    # match parameter list
    }sx;
    while( $code =~ /$func_rx/g ) {
        print "func = $1\n";
        print "param list = $2\n";
    }

    This will match things like if( !foo ) though, so you will probably have to explicitly skip built-in keywords like that.

    Note that this is certainly not a complete solution for you, but just to point out Regexp::Common is a handy tool that should help.

    Also, you probably want to strip out all the comments first, so they don't throw off your matching regex. But this can also be done with Regexp::Common:
    $code =~ s/$RE{comment}{'C++'}//g;