ssriganesh has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to parse C/C++ function declarations. All is well as long as the function declaration is on a single line.

But when a declaration spans more than one line, the regex fails to detect the function.

The regex I am using is below.

if(!$test && (/([:\w]+)\((.*\))/) )

Can I get some help with this parsing?

Detected function : static gboolean g::ber_read(wtap *wth, int *err)

Not Detected function: static void ber_set_pkthdr(struct wtap_pkthdr *phdr,

int packet_size)

Re: C/C++ function parsing
by AnomalousMonk (Archbishop) on Jan 12, 2014 at 21:24 UTC

    The regex /([:\w]+)\((.*\))/ matches against $_, the default scalar. So, what's in $_? If it's just a single line from the source file, it will be tricky to parse a multi-line declaration. If it's the entire file (the 'slurped' file), remember that the . (dot) regex operator matches everything except a newline unless the regex is used with the /s "dot matches all" regex modifier. See Modifiers in perlre.
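To illustrate the point about /s, here is a minimal sketch. It assumes the multi-line declaration from the question has already been read into one string; the names $decl, name_without_s and name_with_s are made up for this example. The regex is the one from the question, with only the modifier changed.

```perl
use strict;
use warnings;

# The multi-line declaration from the question, held in one string.
my $decl = "static void ber_set_pkthdr(struct wtap_pkthdr *phdr,\nint packet_size)";

# Each helper returns the captured function name, or undef on failure.
sub name_without_s {
    my ($text) = @_;
    return $text =~ /([:\w]+)\((.*\))/ ? $1 : undef;
}

sub name_with_s {
    my ($text) = @_;
    return $text =~ /([:\w]+)\((.*\))/s ? $1 : undef;   # dot matches newline too
}

printf "without /s: %s\n", name_without_s($decl) // 'no match';
printf "with /s:    %s\n", name_with_s($decl)    // 'no match';
```

Without /s the dot stops at the newline before it reaches the closing parenthesis, so the match fails. Note that /s only helps when $_ really holds more than one line.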

Re: C/C++ function parsing
by kcott (Archbishop) on Jan 13, 2014 at 04:39 UTC

    G'day ssriganesh,

    Welcome to the monastery.

    To match the characters between the parentheses, I'd use '[^)]*' which matches any character except the closing parenthesis because '.' matches any character including a closing parenthesis.

    Working with the limited example input you've provided, here's how I might have coded this. I've included additional code to join a captured multi-line: not stated, but that may be what you want.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $f1 = 'static gboolean g::ber_read(wtap *wth, int *err)';
    my $f2 = 'static void ber_set_pkthdr(struct wtap_pkthdr *phdr,
    int packet_size)';

    for ($f1, $f2) {
        print "Function: $_";
        /([:\w]+\([^)]*\))/m;
        print "Captured: $1";
        # if you want multi-line reduced to single-line:
        (my $no_newlines = $1) =~ y/\n/ /;
        print "Lines joined: $no_newlines";
    }

    Output:

    Function: static gboolean g::ber_read(wtap *wth, int *err)
    Captured: g::ber_read(wtap *wth, int *err)
    Lines joined: g::ber_read(wtap *wth, int *err)
    Function: static void ber_set_pkthdr(struct wtap_pkthdr *phdr,
    int packet_size)
    Captured: ber_set_pkthdr(struct wtap_pkthdr *phdr,
    int packet_size)
    Lines joined: ber_set_pkthdr(struct wtap_pkthdr *phdr, int packet_size)

    -- Ken

      Hi Ken,

      Thanks for the quick response.

      The solution helped me only a little, but I have learnt how to exclude some characters :)

      The code which I have written for parsing is as below.

      May be with this you could help me out in a better way..

      my @matches2;
      my $sw_test;
      my $func_name;
      my $func_start;
      my $func_full;
      my $print;
      my $count=0;

      Functionparser();

      sub FunctionParser
      {
          my $linenum =0;
          open FILE2, $ARGV[0] or die $!;
          while(<FILE2>)
          {
              $linenum++;
              if(!$sw_test && (/([:\w]+)\((.*\))/) )
              {
                  print "first\n";
                  $func_name=$1;
                  $func_start=$.;
                  $func_full= $_;
                  $sw_test=1;
                  print "$linenum) $_\n";
                  $linenum1 = $linenum;
                  $print=1;
                  next;
              }
              $count++ if $sw_test && /\{/;
              $count-- if $sw_test && /\}/;
              if($sw_test && $count==0)
              {
                  if($print)
                  {
                      push(@matches2, { 'Start' => $linenum1, 'End' => $linenum, 'FuncName' => $func_full});
                      $print=0;
                  }
                  $sw_test=0;
              }
          }
          close FILE2;
      }

        I would make 2 suggestions here. First, add the "/s" modifier at the end of your regex so that "." also matches newlines, which lets the pattern match across a multi-line declaration. Like so:

        if(!$sw_test && (/([:\w]+)\((.*\))/s) )

        Second, google "The Regex Coach", download, install, and use it. I am no expert at regex at all, so this software has saved my sanity time and again! It's not Perl specific, but it always works in my code. Basically, you enter the data (one or more lines) you want to match against in the 2nd window of the GUI, then use the top window to test your regex in real time (you enter everything you would ordinarily put between the slashes).

        When I run your examples through TRC, your regex code works fine with the addition of "/s" as noted above.

        ImJustAFriend

        "May be with this you could help me out in a better way."

        You'll need to specify what you want help with; an explanation of what you mean by "a better way" would also be useful. You should provide: some sample, representative input; actual and expected output (as well as an indication of where you're encountering difficulties getting the actual output to match the expected output); any error and warning messages; along with anything else that helps to describe the problems you are encountering. All of this is explained in the guidelines "How do I post a question effectively?".

        Here are some points on the code you've currently posted:

        • You haven't used the strict or warnings pragmata. Unless you have a very good reason not to use these, you should include them in all your scripts. See the code I posted in my first reply to you.
        • You've declared a number of lexical variables whose scope is the entire script but they're only used in Functionparser() — you should limit their scope to this subroutine.
        • You should get into the habit of using the 3-argument form of open with a lexical filehandle. Using a package filehandle (such as FILE2 used here) can cause issues if the same name is used anywhere else in your script; a lexical filehandle (e.g. $fh) would have its scope limited to Functionparser() — another $fh used elsewhere in your script would be a completely different filehandle.
        • Using $ARGV[0] in Functionparser() limits that subroutine. At some later time you may want to parse multiple files. Use something like "my ($source_code) = @_;" as the first line in that subroutine: it will then be more generalised and you can call it as many times as you want allowing multiple files to be parsed.
        • Consider using the autodie pragma or using more descriptive error output than "die $!". The former only requires a single "use autodie;" near the start of your code; the latter is a lot more work and error-prone. I usually choose autodie unless I have a very good reason not to.
        • You seem to understand what '$.' does (you've used "$func_start = $.") but you've also introduced $linenum to provide identical functionality (which requires additional processing, i.e. "$linenum++"). Given that '$.' is already available, it would be easier to just use that. Also note that you don't use $func_start after assigning a value to it.
        • You have retained the original regex which was the basis of your OP question and that you said didn't work. Why? That regex contains two captures: you ignore the second (i.e. $2).
        • Your use of next looks wrong. If the if block (with the next) is entered, the remaining code in the while loop will not be executed; if that if block is not entered, the remaining code in the while loop will still not be executed as all conditions checking $sw_test will be false. Those parts of the while loop may be executed on subsequent iterations; however, that looks rather flaky to me. Perhaps when you provide sample input, your intent may become clearer.

        -- Ken
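Several of the points above (strict and warnings, a file name passed as an argument, 3-argument open with a lexical filehandle, a descriptive die message, and '$.' instead of a hand-rolled counter) can be sketched together as follows. This is only an illustration under assumptions, not the code from the thread: find_declarations and the Start/End/Decl keys are hypothetical names, the brace counting is left out, and the pattern cannot tell a declaration from a function call.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of { Start, End, Decl } hashes,
# one per name(...) chunk found in the file, joining continuation lines.
sub find_declarations {
    my ($source_code) = @_;              # file name passed in, not $ARGV[0]
    my @found;
    my ($start, $buffer);

    open my $fh, '<', $source_code
        or die "Cannot open '$source_code': $!";

    while (my $line = <$fh>) {
        chomp $line;
        if (!defined $buffer) {
            # Start collecting when a name is directly followed by "(".
            next unless $line =~ /[:\w]+\(/;
            ($start, $buffer) = ($., $line);
        }
        else {
            $buffer .= " $line";         # join a continuation line
        }
        # Finish once the closing parenthesis has been seen.
        if ($buffer =~ /([:\w]+\([^)]*\))/) {
            push @found, { Start => $start, End => $., Decl => $1 };
            undef $buffer;
        }
    }
    close $fh;
    return @found;
}

# Usage: perl script.pl file1.c file2.c ...
for my $file (@ARGV) {
    printf "%s:%d-%d %s\n", $file, @{$_}{qw(Start End Decl)}
        for find_declarations($file);
}
```

Because the file name is a parameter, the subroutine can be called once per file, and all state lives inside it rather than in file-scoped variables.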

Re: C/C++ function parsing
by educated_foo (Vicar) on Jan 13, 2014 at 16:56 UTC
    What you want will depend on what you mean by "parsing". If you want to actually pull out the separate parameters, and especially if you need to handle templates, the problem gets complicated. I'd look into just pulling code out of Inline::CPP or SWIG, since they can both handle fairly hairy declarations.

    If you just need to recognize the declarations in a file, and don't care about templates, you can get away with something much simpler along the lines of what you already have, e.g. /^([\w:]+)\(([^)]*)\)/s
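A rough sketch of that simpler route, assuming the whole file is slurped into one string and scanned with a global match. The helper name scan_declarations is hypothetical, and the pattern here deliberately drops the leading ^ anchor, because the sample declarations in this thread begin with "static ..." rather than with the function name. Like the pattern above, it does not handle templates or nested parentheses and will also pick up function calls.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole source text as one string and
# returns [name, params] pairs for every name(...) chunk it finds.
sub scan_declarations {
    my ($text) = @_;
    my @hits;
    while ($text =~ /([\w:]+)\(([^)]*)\)/g) {
        my ($name, $params) = ($1, $2);
        $params =~ s/\s+/ /g;            # collapse newlines and runs of spaces
        push @hits, [$name, $params];
    }
    return @hits;
}

if (@ARGV) {
    local $/;                            # slurp mode: read the file in one go
    open my $fh, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";
    my $text = <$fh>;
    close $fh;
    printf "%s(%s)\n", @$_ for scan_declarations($text);
}
```

Since [^)]* matches newlines without any modifier, a declaration split across lines is captured the same way as a single-line one.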