C/C++ function parsing

ssriganesh has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: C/C++ function parsing by AnomalousMonk (Archbishop) on Jan 12, 2014 at 21:24 UTC
The regex `/([:\w]+)\((.*\))/` matches against `$_`, the default scalar. So, what's in `$_`? If it's just a single line from the source file, it will be tricky to parse a multi-line declaration. If it's the entire file (the 'slurped' file), remember that the `.` (dot) regex operator matches everything except a newline unless the regex is used with the `/s` "dot matches all" regex modifier. See Modifiers in perlre.
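To make the point about `.` and newlines concrete, here is a minimal sketch. The two-line declaration in `$decl` is a made-up sample, not taken from the original question; the regex is the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical declaration split across two lines, as it might look
# inside a slurped source file.
my $decl = "static int g::ber_read(wtap wth,\n    int err)";

# Without /s the dot stops at the newline, so the closing ')' is never reached.
my $without = ($decl =~ /([:\w]+)\((.*\))/)  ? "match" : "no match";

# With /s the dot also matches "\n", so the capture can span both lines.
my $with    = ($decl =~ /([:\w]+)\((.*\))/s) ? "match" : "no match";

print "without /s: $without\n";   # prints "without /s: no match"
print "with /s:    $with\n";      # prints "with /s:    match"
```

Note that this only helps when the whole declaration is in one string. If `$_` holds a single line, there is nothing after the newline for the dot to match.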
Re: C/C++ function parsing by kcott (Archbishop) on Jan 13, 2014 at 04:39 UTC
G'day ssriganesh,

Welcome to the monastery.

To match the characters between the parentheses, I'd use '`[^)]`' which matches any character except the closing parenthesis because '`.`' matches any character including a closing parenthesis.

Working with the limited example input you've provided, here's how I might have coded this. I've included additional code to join a multi-line capture into a single line: you didn't ask for this, but it may be what you want.

    #!/usr/bin/env perl -l
    use strict;
    use warnings;

    my $f1 = 'static gboolean g::ber_read(wtap wth, int err)';
    my $f2 = 'static void ber_set_pkthdr(struct wtap_pkthdr phdr, int packet_size)';

    for ($f1, $f2) {
        print "Function: $_";
        /([:\w]+\([^)]*\))/m;
        print "Captured: $1";
        # if you want multi-line reduced to single-line:
        (my $no_newlines = $1) =~ y/\n/ /;
        print "Lines joined: $no_newlines";
    }

Output:

    Function: static gboolean g::ber_read(wtap wth, int err)
    Captured: g::ber_read(wtap wth, int err)
    Lines joined: g::ber_read(wtap wth, int err)
    Function: static void ber_set_pkthdr(struct wtap_pkthdr phdr, int packet_size)
    Captured: ber_set_pkthdr(struct wtap_pkthdr phdr, int packet_size)
    Lines joined: ber_set_pkthdr(struct wtap_pkthdr phdr, int packet_size)

-- Ken
Re^2: C/C++ function parsing by ssriganesh (Initiate) on Jan 13, 2014 at 08:14 UTC
Hi Ken,

Thanks for the quick response. The solution helped me only a little, but I have learnt how to exclude some characters :) The code which I have written for parsing is below. May be with this you could help me out in a better way.

    my @matches2;
    my $sw_test;
    my $func_name;
    my $func_start;
    my $func_full;
    my $print;
    my $count=0;

    Functionparser();

    sub FunctionParser {
        my $linenum =0;
        open FILE2, $ARGV[0] or die $!;
        while(<FILE2>) {
            $linenum++;
            if(!$sw_test && (/([:\w]+)\((.*\))/) ) {
                print "first\n";
                $func_name=$1;
                $func_start=$.;
                $func_full= $_;
                $sw_test=1;
                print "$linenum) $_\n";
                $linenum1 = $linenum;
                $print=1;
                next;
            }
            $count++ if $sw_test && /\{/;
            $count-- if $sw_test && /\}/;
            if($sw_test && $count==0) {
                if($print) {
                    push(@matches2, { 'Start' => $linenum1, 'End' => $linenum, 'FuncName' => $func_full});
                    $print=0;
                }
                $sw_test=0;
            }
        }
        close FILE2;
    }
Re^3: C/C++ function parsing by ImJustAFriend (Scribe) on Jan 13, 2014 at 12:56 UTC
I would make 2 suggestions here. First, add the `/s` modifier at the end of your regex so that `.` also matches newlines and multi-line input is treated as a single line. Like so: `if(!$sw_test && (/([:\w]+)\((.*\))/s) )` Second, google "The Regex Coach", download, install, and use it. I am no expert at regex at all, so this software has saved my sanity time and again! It's not Perl-specific, but it has always worked for my code. Basically, you enter the data (one or more lines) you want to match against in the 2nd window in the GUI, then use the top window to test your regex in real time (you enter everything you would ordinarily put between the slashes). When I run your examples through TRC, your regex code works fine with the addition of `/s` as noted above. ImJustAFriend
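One caveat worth sketching: `/s` only changes anything when the string being matched contains the whole declaration. In a `while (<FILE2>)` loop each `$_` is a single line, so the file has to be read in one piece first. The example below is a minimal sketch under that assumption. `slurp()` is a hypothetical helper and the C snippet is invented sample input. It also uses the non-greedy `.*?`, because with `/s` a greedy `.*` would run on to the last `)` in the whole file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entire contents of a file as one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;               # no record separator: one read returns everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demonstrated on an in-memory file (a reference to a string);
# pass a real file name in practice.
my $c_source = "static int\ncount_items(list l,\n            int max)\n{\n}\n";
my $text = slurp(\$c_source);

if ($text =~ /([:\w]+)\((.*?)\)/s) {
    # Copy the captures before running another substitution.
    my ($name, $params) = ($1, $2);
    $params =~ s/\s+/ /g;   # fold the multi-line parameter list
    print "name: $name, params: $params\n";
}
```

With the sample input this prints `name: count_items, params: list l, int max`.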
Re^3: C/C++ function parsing by kcott (Archbishop) on Jan 13, 2014 at 13:41 UTC
"May be with this you could help me out in a better way."

You'll need to specify what you want help with; an explanation of what you mean by "a better way" would also be useful. You should provide: some sample, representative input; actual and expected output (as well as an indication of where you're encountering difficulties getting the actual output to match the expected output); any error and warning messages; along with anything else that helps to describe the problems you are encountering. All of this is explained in the guidelines "How do I post a question effectively?".

Here are some points on the code you've currently posted:

- You haven't used the strict or warnings pragmata. Unless you have a very good reason not to use these, you should include them in all your scripts. See the code I posted in my first reply to you.
- You've declared a number of lexical variables whose scope is the entire script but they're only used in `Functionparser()`; you should limit their scope to this subroutine.
- You should get into the habit of using the 3-argument form of open with a lexical filehandle. Using a package filehandle (such as `FILE2` used here) can cause issues if the same name is used anywhere else in your script; a lexical filehandle (e.g. `$fh`) would have its scope limited to `Functionparser()`, and another `$fh` used elsewhere in your script would be a completely different filehandle.
- Using `$ARGV[0]` in `Functionparser()` limits that subroutine. At some later time you may want to parse multiple files. Use something like "`my ($source_code) = @_;`" as the first line in that subroutine: it will then be more generalised and you can call it as many times as you want, allowing multiple files to be parsed.
- Consider using the autodie pragma or using more descriptive error output than "`die $!`". The former only requires a single "`use autodie;`" near the start of your code; the latter is a lot more work and error-prone. I usually choose `autodie` unless I have a very good reason not to.
- You seem to understand what '`$.`' does (you've used "`$func_start = $.`") but you've also introduced `$linenum` to provide identical functionality (which requires additional processing, i.e. "`$linenum++`"). Given that '`$.`' is already available, it would be easier to just use that. Also note that you don't use `$func_start` after assigning a value to it.
- You have retained the original regex which was the basis of your OP question and that you said didn't work. Why?
- That regex contains two captures: you ignore the second (i.e. `$2`).
- Your use of `next` looks wrong. If the `if` block (with the `next`) is entered, the remaining code in the `while` loop will not be executed; if that `if` block is not entered, the remaining code in the `while` loop will still not be executed as all conditions checking `$sw_test` will be false. Those parts of the `while` loop may be executed on subsequent iterations; however, that looks rather flaky to me. Perhaps when you provide sample input, your intent may become clearer.

-- Ken
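As an illustration of how those points might combine, here is one possible sketch, not a definitive rewrite. The name `function_ranges()`, its return format and the sample input are assumptions made for this example. It takes the file name as an argument, uses a 3-argument open with a lexical filehandle and a descriptive error message, relies on `$.` for line numbers, and drops the `next` so a brace on the header line is still counted. The brace counting remains a rough heuristic: it knows nothing about comments, strings, macros or multi-line headers.

```perl
use strict;
use warnings;

# Sketch only. Takes a file name (or a reference to a string holding the
# source) and returns a list of { Start, End, FuncName } hashes.
sub function_ranges {
    my ($source_code) = @_;

    open my $fh, '<', $source_code
        or die "Cannot open '$source_code' for reading: $!";

    my @matches;
    my ($in_func, $seen_brace, $depth, $start, $header) = (0, 0, 0);

    while (my $line = <$fh>) {
        if (!$in_func) {
            # Look for name(...) on one line; skip prototypes ending in ';'.
            next unless $line =~ /([:\w]+)\(([^)]*)\)/ && $line !~ /;\s*$/;
            ($in_func, $seen_brace, $depth) = (1, 0, 0);
            ($start, $header) = ($., $line);
            chomp $header;
            # No 'next' here: the opening brace may be on the header line.
        }

        # Count every brace on the line, not just whether one is present.
        my $opened = ($line =~ tr/{//);
        my $closed = ($line =~ tr/}//);
        $seen_brace ||= $opened;
        $depth += $opened - $closed;

        # The body ends once a brace has been seen and the depth is back to 0.
        if ($seen_brace && $depth == 0) {
            push @matches, { Start => $start, End => $., FuncName => $header };
            $in_func = 0;
        }
    }
    close $fh;
    return @matches;
}

# Usage with an in-memory file (a reference to a string):
my $demo = "static void f(int a)\n{\n    if (a) { a++; }\n}\n";
printf "%d-%d: %s\n", @{$_}{qw(Start End FuncName)} for function_ranges(\$demo);
```

For the sample input this prints `1-4: static void f(int a)`. Because the subroutine takes its input as an argument, it can be called once per file when several files need parsing.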
Re^4: C/C++ function parsing by ssriganesh (Initiate) on Jan 15, 2014 at 06:55 UTC
Re: C/C++ function parsing by educated_foo (Vicar) on Jan 13, 2014 at 16:56 UTC
What you want will depend on what you mean by "parsing". If you want to actually pull out the separate parameters, and especially if you need to handle templates, the problem gets complicated. I'd look into just pulling code out of Inline::CPP or SWIG, since they can both handle fairly hairy declarations. If you just need to recognize the declarations in a file, and don't care about templates, you can get away with something much simpler along the lines of what you already have, e.g. `/^([\w:]+)\(([^)]*)\)/s`
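A minimal sketch of that simpler approach, under two assumptions that go beyond the post: the file has been slurped into a single string, and each function name starts its own line (return type on the line before). The `/m` and `/g` modifiers are additions so that `^` matches at every line start and every declaration is visited. `list_declarations()` and the sample source are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: collect "name(params)" for each declaration found.
sub list_declarations {
    my ($source) = @_;
    my @found;
    # [^)]* crosses newlines on its own, so multi-line parameter lists
    # are captured even without /s.
    while ($source =~ /^([\w:]+)\(([^)]*)\)/mg) {
        my ($name, $params) = ($1, $2);
        $params =~ s/\s+/ /g;    # fold a multi-line parameter list into one line
        push @found, "$name($params)";
    }
    return @found;
}

my $source = <<'END_C';
static gboolean
g::ber_read(wtap wth,
            int err)
{
    return TRUE;
}

static void
ber_set_pkthdr(struct wtap_pkthdr phdr, int packet_size)
{
}
END_C

print "$_\n" for list_declarations($source);
```

With the sample input this prints `g::ber_read(wtap wth, int err)` and `ber_set_pkthdr(struct wtap_pkthdr phdr, int packet_size)`. Declarations written with the return type on the same line as the name would need the `^` anchor relaxed, and parameters that themselves contain parentheses (function pointers, for instance) are not handled.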