in reply to C/C++ function parsing

G'day ssriganesh,

Welcome to the monastery.

To match the characters between the parentheses, I'd use '[^)]*', which matches any run of characters other than a closing parenthesis. A '.' matches any character, including a closing parenthesis, so a greedy '.*' can run past the parenthesis you want to stop at.
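
A minimal sketch of that difference. The input string and variable names are made up for illustration and are not taken from the original question:

```perl
use strict;
use warnings;

# Hypothetical input: a prototype followed by a second parenthesised group.
my $line = 'int foo(int a) /* see bar(b) */';

# Greedy '.*' backtracks only as far as the LAST ')' on the line.
my ($greedy) = $line =~ /(\w+\(.*\))/;

# '[^)]*' cannot cross a ')', so the match ends at the FIRST one.
my ($negated) = $line =~ /(\w+\([^)]*\))/;

print "greedy:  $greedy\n";
print "negated: $negated\n";
```

One caveat with this approach: '[^)]*' also stops early when the parameter list itself contains a ')', for example a function-pointer parameter.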

Working with the limited example input you've provided, here's how I might have coded this. I've also included code to join a multi-line capture into a single line: you didn't ask for this, but it may be what you want.

#!/usr/bin/env perl -l

use strict;
use warnings;

my $f1 = 'static gboolean g::ber_read(wtap *wth, int *err)';
my $f2 = 'static void ber_set_pkthdr(struct wtap_pkthdr *phdr, int packet_size)';

for ($f1, $f2) {
    print "Function: $_";
    /([:\w]+\([^)]*\))/m;
    print "Captured: $1";

    # if you want multi-line reduced to single-line:
    (my $no_newlines = $1) =~ y/\n/ /;
    print "Lines joined: $no_newlines";
}

Output:

Function: static gboolean g::ber_read(wtap *wth, int *err)
Captured: g::ber_read(wtap *wth, int *err)
Lines joined: g::ber_read(wtap *wth, int *err)
Function: static void ber_set_pkthdr(struct wtap_pkthdr *phdr, int packet_size)
Captured: ber_set_pkthdr(struct wtap_pkthdr *phdr, int packet_size)
Lines joined: ber_set_pkthdr(struct wtap_pkthdr *phdr, int packet_size)

-- Ken

Re^2: C/C++ function parsing
by ssriganesh (Initiate) on Jan 13, 2014 at 08:14 UTC
    Hi Ken,

    Thanks for the quick response.

    The solution helped only a little, but I have learnt how to exclude some characters :)

    The code which I have written for parsing is as below.

    May be with this you could help me out in a better way.

    my @matches2;
    my $sw_test;
    my $func_name;
    my $func_start;
    my $func_full;
    my $print;
    my $count=0;

    Functionparser();

    sub FunctionParser {
        my $linenum =0;
        open FILE2, $ARGV[0] or die $!;
        while(<FILE2>) {
            $linenum++;
            if(!$sw_test && (/([:\w]+)\((.*\))/) ) {
                print "first\n";
                $func_name=$1;
                $func_start=$.;
                $func_full= $_;
                $sw_test=1;
                print "$linenum) $_\n";
                $linenum1 = $linenum;
                $print=1;
                next;
            }
            $count++ if $sw_test && /\{/;
            $count-- if $sw_test && /\}/;
            if($sw_test && $count==0) {
                if($print) {
                    push(@matches2, { 'Start' => $linenum1, 'End' => $linenum, 'FuncName' => $func_full});
                    $print=0;
                }
                $sw_test=0;
            }
        }
        close FILE2;
    }

      I would make 2 suggestions here. First, add the "/s" modifier at the end of your regex; it lets '.' match newlines, so the regex treats multi-line input as a single line. Like so:

      if(!$sw_test && (/([:\w]+)\((.*\))/s) )
      Second, google "The Regex Coach", download, install, and use it. I am no regex expert at all, so this software has saved my sanity time and again! It's not Perl-specific, but what works there has always worked in my code. Basically, you enter the data (one or more lines) you want to match against in the 2nd window of the GUI, then use the top window to test your regex in real time (you enter everything you would ordinarily put between the slashes).

      When I run your examples through TRC, your regex code works fine with the addition of "/s" as noted above.
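
As a small illustration of what "/s" changes: it lets '.' match a newline, so it only makes a difference when the string being matched contains more than one line. The two-line string below is a made-up stand-in for a declaration that has already been read into a single scalar. In a loop that reads one line at a time, each $_ holds a single line, so the lines would first have to be joined.

```perl
use strict;
use warnings;

# Hypothetical two-line declaration held in ONE string.
my $text = "HandleEvent(nsPresContext* aPresContext,\n    nsGUIEvent* aEvent);\n";

# Without /s, '.' stops at "\n", so no ')' is reachable from the '('.
my $without_s = $text =~ /([:\w]+)\((.*\))/  ? 1 : 0;

# With /s, '.' may cross "\n" and the closing ')' on line 2 is found.
my $with_s    = $text =~ /([:\w]+)\((.*\))/s ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```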

      ImJustAFriend

      "May be with this you could help me out in a better way."

      You'll need to specify what you want help with; an explanation of what you mean by "a better way" would also be useful. You should provide: some sample, representative input; actual and expected output (as well as an indication of where you're encountering difficulties getting the actual output to match the expected output); any error and warning messages; along with anything else that helps to describe the problems you are encountering. All of this is explained in the guidelines "How do I post a question effectively?".

      Here are some points on the code you've currently posted:

      • You haven't used the strict or warnings pragmata. Unless you have a very good reason not to use these, you should include them in all your scripts. See the code I posted in my first reply to you.
      • You've declared a number of lexical variables whose scope is the entire script but they're only used in Functionparser() — you should limit their scope to this subroutine.
      • You should get into the habit of using the 3-argument form of open with a lexical filehandle. Using a package filehandle (such as FILE2 used here) can cause issues if the same name is used anywhere else in your script; a lexical filehandle (e.g. $fh) would have its scope limited to Functionparser() — another $fh used elsewhere in your script would be a completely different filehandle.
      • Using $ARGV[0] in Functionparser() limits that subroutine. At some later time you may want to parse multiple files. Use something like "my ($source_code) = @_;" as the first line in that subroutine: it will then be more generalised and you can call it as many times as you want allowing multiple files to be parsed.
      • Consider using the autodie pragma or using more descriptive error output than "die $!". The former only requires a single "use autodie;" near the start of your code; the latter is a lot more work and error-prone. I usually choose autodie unless I have a very good reason not to.
      • You seem to understand what '$.' does (you've used "$func_start = $.") but you've also introduced $linenum to provide identical functionality (which requires additional processing, i.e. "$linenum++"). Given that '$.' is already available, it would be easier to just use that. Also note that you don't use $func_start after assigning a value to it.
      • You have retained the original regex which was the basis of your OP question and that you said didn't work. Why? That regex contains two captures: you ignore the second (i.e. $2).
      • Your use of next looks wrong. If the if block (with the next) is entered, the remaining code in the while loop will not be executed; if that if block is not entered, the remaining code in the while loop will still not be executed as all conditions checking $sw_test will be false. Those parts of the while loop may be executed on subsequent iterations; however, that looks rather flakey to me. Perhaps when you provide sample input, your intent may become clearer.
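
A minimal sketch of how several of the points above might be applied together. The subroutine name parse_functions and its variable names are hypothetical, the '[^)]*' regex is swapped in from the earlier reply, and the brace-counting logic is carried over largely as posted, so this still only handles declarations that fit on one line:

```perl
use strict;
use warnings;
use autodie;    # open/close die with a descriptive message on failure

# Hypothetical rewrite; names are illustrative, not from the original script.
sub parse_functions {
    my ($source_code) = @_;              # file name supplied by the caller
    my (@matches, $func_full, $start);
    my ($in_func, $depth) = (0, 0);

    open my $fh, '<', $source_code;      # 3-argument open, lexical filehandle
    while (my $line = <$fh>) {
        if (!$in_func && $line =~ /[:\w]+\([^)]*\)/) {
            chomp($func_full = $line);
            ($in_func, $start) = (1, $.);    # $. is the current line number
            next;
        }
        next unless $in_func;
        $depth++ if $line =~ /\{/;
        $depth-- if $line =~ /\}/;
        if ($depth == 0) {
            push @matches, { Start => $start, End => $., FuncName => $func_full };
            $in_func = 0;
        }
    }
    close $fh;
    return @matches;
}

# Usage: one call per file named on the command line.
for my $file (@ARGV) {
    printf "%s: lines %d-%d\n", @{$_}{qw(FuncName Start End)}
        for parse_functions($file);
}
```

All state is lexical to the subroutine and the file name is a parameter, so the same routine can be called for any number of files.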

      -- Ken

        Hi Ken,

        Thanks for the feedback; I will incorporate it when posting questions.

        The code I posted is only part of the script (a single subroutine), which is why strict and warnings are missing.

        Some things, like autodie and replacing FILE2, I will incorporate into my present code.

        In a single line the problem statement is:

        "how do I parse the data which spans multiple lines."

        More about the question: I am writing a parser for C/C++ functions in Perl, which should detect C/C++ functions that span multiple lines, as in this example:

        Function declared in multiple lines

        NS_IMETHOD HandleEvent(nsPresContext* aPresContext,
                               nsGUIEvent* aEvent,
                               nsEventStatus* aEventStatus);

        Function declared in Single Line

        nsHTMLFramesetFrame::nsHTMLFramesetFrame(nsStyleContext* aContext)

        The code I posted in my earlier post is able to parse the functions declared in a single line, but fails to detect the functions declared in multiple lines.

        I need help modifying the Perl code to parse multi-line functions in a .cpp/.c file.
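
One possible approach, sketched under assumptions: read the whole file into a single string and let '[^)]*' (which also matches newlines) span the parameter list, then derive line numbers from the match offsets. The helper name find_signatures and the hash keys are hypothetical. The pattern also matches function calls and statements such as "if (...)", and it ignores comments and string literals, so a real parser would need further filtering.

```perl
use strict;
use warnings;

# Hypothetical helper: find "name(args)" groups in a string of C/C++ source,
# even when the argument list spans several lines.
sub find_signatures {
    my ($source) = @_;
    my @found;
    while ($source =~ /([:\w]+)\s*\(([^)]*)\)/g) {
        my ($name, $args) = ($1, $2);
        my ($from, $to)   = ($-[0], $+[0]);   # match offsets; read before any other regex
        my $before = substr($source, 0, $from);
        my $upto   = substr($source, 0, $to);
        my $start_line = 1 + ($before =~ tr/\n//);   # newlines before the match
        my $end_line   = 1 + ($upto   =~ tr/\n//);   # newlines up to its end
        $args =~ s/\s+/ /g;                   # join a multi-line argument list
        push @found, { Name => $name, Args => $args,
                       Start => $start_line, End => $end_line };
    }
    return @found;
}

# Usage: slurp the file named on the command line, if any.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $source = do { local $/; <$fh> };      # undef $/ reads the whole file
    close $fh;
    printf "%s(%s) lines %d-%d\n", @{$_}{qw(Name Args Start End)}
        for find_signatures($source);
}
```

Matching against the whole file rather than line by line is what allows the declaration to cross line boundaries; the Start and End values can then feed the same brace-counting step used for single-line declarations.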