Ace128 has asked for the wisdom of the Perl Monks concerning the following question:

Hey,

I'm working on a parser, for .c files at the moment, so I can add ND (http://www.naturaldocs.org/documenting.html) style comments above each function.

That is, say I have:
    static struct super_block* sfs_get_super(struct file_system_type *fst, int flags, const char *devname, void *data)
    {
        struct super_block *sb = 0;
        struct mounter_data_t *mount_data = sfs_get_mounter_data(data);

        HIGHPRINT("Calling get_sb_single\n");
        sb = get_sb_single(fst, flags, mount_data, sfs_fill_super);
        HIGHPRINT("Returned from get_sb_single\n");

        dealloc_mounter_data(mount_data);
        return sb;
    }
I want to turn this into:
    /* Function: sfs_get_super(struct file_system_type *fst, int flags, const char *devname, void *data)
     *
     * Parameters:
     *   fst -
     *   flags -
     *   devname -
     *   data -
     *
     * Returns:
     *   struct super_block*
     */
    static struct super_block* sfs_get_super(struct file_system_type *fst, int flags, const char *devname, void *data)
    {
        struct super_block *sb = 0;
        struct mounter_data_t *mount_data = sfs_get_mounter_data(data);

        HIGHPRINT("Calling get_sb_single\n");
        sb = get_sb_single(fst, flags, mount_data, sfs_fill_super);
        HIGHPRINT("Returned from get_sb_single\n");

        dealloc_mounter_data(mount_data);
        return sb;
    }
That is, it pulls out the function and its arguments and creates that skeleton documentation. I'm using Tie::File to read the file, and am currently stuck on parsing the code. So far I've done this:
    #!/usr/bin/perl
    # Script to add comments for Natural Doc. http://www.naturaldoc.org
    use Tie::File;
    use strict;
    use warnings;
    use Data::Dumper;

    my @FILE_ARRAY;
    tie @FILE_ARRAY, 'Tie::File', "searchfs.c", recsep => "\n" or die $!;

    my $found = 0;
    foreach (@FILE_ARRAY) {
        #if (/^(sub .+)/) {
        # Match .c function
        if (/^(?:([0-9_a-zA-Z*]+) +)?(?:([0-9_a-zA-Z*]+) +)?(?:([0-9_a-zA-Z*]+) +)?([0-9_a-zA-Z*]+)\((.+?)\)/) {
            my ($a, $b, $c, $d, $e) = ("") x 5;
            if (defined($1)) { $a = $1; }
            if (defined($2)) { $b = $2; }
            if (defined($3)) { $c = $3; }
            if (defined($4)) { $d = $4; }
            if (defined($5)) { $e = $5; }
            print $a . " " . $b . " " . $c . " " . $d . " ARGS: " . $e . "\n";
        }
    }
This only works with parsing one row (i.e. a declaration that fits on one line, such as static int sfs_fill_super(struct super_block *sb, void *data, int silent) ), but it's a start. The { is always on a new line after the function (good old C coding style, apparently :) ). I got tips to use C::Scan, but that lacks documentation, and it seems to need some external application named "cppstdin" (I'm using Windows here). So I was wondering if anyone here had some smart way to solve this problem. I know there are some bright people here! ;)

Later I'm going to add parsing of Perl scripts as well, but that is a lot easier.

Thanks,
Ace

Replies are listed 'Best First'.
Re: Parsing C Source File Functions.
by Khen1950fx (Canon) on Oct 21, 2006 at 14:57 UTC
    I believe "cppstdin" refers to cpp, the C preprocessor, which is part of gcc. If you want to give C::Scan a try, you can get a Windows build of gcc at http://www.mingw.org/
Re: Parsing C Source File Functions.
by davidrw (Prior) on Oct 21, 2006 at 16:07 UTC
    I think the inner block would be much clearer and shorter as something like this (note also that $a and $b should be avoided because of their use by sort):
        foreach (@FILE_ARRAY) {
            next unless /^(?:([0-9_a-zA-Z*]+) +)?(?:([0-9_a-zA-Z*]+) +)?(?:([0-9_a-zA-Z*]+) +)?([0-9_a-zA-Z*]+)\((.+?)\)/;
            my ($args, @definition) = map { defined($_) ? $_ : '' } $5, $1, $2, $3, $4;
            print join(" ", @definition) . " ARGS: $args\n";
        }

    This only works with parsing one row
    To work around this, you need to slurp in either the whole file or just one function at a time (maybe delimited by /^}$/ ? That's not going to be 100% reliable). Then your regex can use the /s modifier and span multiple lines, and you'll want to change all the spaces in the regex to \s+

    Later I'm going to add parsing of Perl scripts as well, but that is a lot easier.
    Easier? Really? There is PPI, but there's no set way to pass parameters into a function, e.g.
        sub foo {
            my $self = shift;
            my $p = {@_};
            my $name = $p->{name} or die;
            my $value = $p->{value} || 123;
        }
    But there are many other ways to do it (Super Search for threads on passing parameters), including mucking directly with @_
      Forgot to mention that for the Perl part I may not care about the arguments the same way... unless using this PPI is smart and I can get some interesting data from it which I can use with the ND. :)
Re: Parsing C Source File Functions.
by graff (Chancellor) on Oct 21, 2006 at 16:33 UTC
    Um, I suppose your plan could work, if the C code you're handling has been formatted in strict accordance with a specific coding style, and doesn't contain any traps like multi-line quoted strings containing lines that resemble function prototypes.

    But if there are portions of code that have been commented out by bracketing a region between "/*" and "*/", and the region happens to contain (strings resembling) function declarations, then adding your extra comment lines is likely to bollix things up. (I haven't checked on this in a while, but I recall not being able to rely on whether embedded "/* ... */" comments would be handled correctly by every C compiler.)

    To take this sort of task seriously, regex matching won't really do it -- you have to parse the text character by character, so that at any given point, you know what sort of content you're dealing with (quoted string, comment string, function body, "#define" directive, etc), and you know how to interpret each character as you get to it (e.g. whether it was preceded by "\").
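
    The character-by-character idea can be sketched roughly as below. This is only a minimal illustration, not a C parser: mask_c_noise is a made-up helper name, and it assumes it is enough to blank out comments and string/character literals (keeping newlines and offsets intact) before running a declaration regex. Preprocessor directives and backslash-continued // comments are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not from this thread): replace the contents of
# comments and string/char literals with spaces, so that a later regex
# cannot match "function declarations" inside them. Length and newlines
# are preserved, so offsets in the masked copy map back to the original.
sub mask_c_noise {
    my ($src) = @_;
    my $out   = '';
    my $state = 'code';    # one of: code, block, line, str, chr
    my $len   = length $src;
    my $i     = 0;

    while ( $i < $len ) {
        my $c   = substr $src, $i, 1;
        my $two = substr $src, $i, 2;

        if ( $state eq 'code' ) {
            if ( $two eq '/*' || $two eq '//' ) {
                $state = $two eq '/*' ? 'block' : 'line';
                $out .= '  ';
                $i += 2;
                next;
            }
            $state = 'str' if $c eq '"';
            $state = 'chr' if $c eq "'";
            $out .= $c;
        }
        elsif ( $state eq 'block' ) {
            if ( $two eq '*/' ) {
                $state = 'code';
                $out .= '  ';
                $i += 2;
                next;
            }
            $out .= $c eq "\n" ? "\n" : ' ';
        }
        elsif ( $state eq 'line' ) {
            $state = 'code' if $c eq "\n";
            $out .= $c eq "\n" ? "\n" : ' ';
        }
        else {    # inside a string or character literal
            my $quote = $state eq 'str' ? '"' : "'";
            if ( $c eq '\\' && $i + 1 < $len ) {
                # Mask the backslash and the character it escapes.
                $out .= ' ' . ( substr( $src, $i + 1, 1 ) eq "\n" ? "\n" : ' ' );
                $i += 2;
                next;
            }
            if ( $c eq $quote ) { $state = 'code'; $out .= $c }
            else                { $out .= $c eq "\n" ? "\n" : ' ' }
        }
        $i++;
    }
    return $out;
}

# Usage: only "int real(void)" survives as something a regex could match.
my $c_src = <<'END_C';
/* old version: int fake(void) { */
int real(void)
{
    puts("int also_fake(void) {");
}
END_C
print mask_c_noise($c_src);
```

    One way to use this would be to search the masked copy for declarations and then edit the original text at the same offsets.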

    Still, if I could assume that some C source code really has a well-behaved format, and "false-alarm" matches of function declarations won't happen, then a regex like this might do:

        my $csrc;
        {
            local $/;       # slurp the source code
            $csrc = <>;     # from stdin or $ARGV[0]
        }
        while ( $csrc =~ /\n((?:\w+\*?\s+)+) (\w+\s*) \( (.*?) \) \s* \{ /gsx ) {
            my ( $functype, $funcname, $funcarg ) = ( $1, $2, $3 );
            my @funcargs = split /,\s*/, $funcarg;
            print "found function def:\n type=$functype\n name=$funcname\n args=\n ";
            print join( "\n ", @funcargs ), "\n";
            # do other stuff with these strings...
        }
    (update: changed the $funcarg split to allow 0-or-more whitespace)

    Slurping the file like that makes it easier to handle the multi-line function declarations, but then makes it just a little harder to insert the extra comment strings correctly (not impossible, certainly).
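
    For what it's worth, one way to get the comment inserted in the right place on a slurped file is a single s///e substitution that re-emits each matched declaration with the comment in front of it. The following is a sketch under assumptions, not a tested tool: with_nd_comment and add_nd_comments are made-up names, the regex is a simplified relative of the one above (declaration starts in column 0, no parentheses inside the argument list), and the comment layout just follows the skeleton from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return a Natural Docs skeleton followed by the
# unchanged declaration text.
sub with_nd_comment {
    my ( $decl, $type, $name, $arglist ) = @_;
    $type =~ s/\b(?:static|inline|extern)\s+//g;    # storage class is not the return type
    $type =~ s/\s+$//;
    $arglist =~ s/^\s+|\s+$//g;

    my @args = split /\s*,\s*/, $arglist;
    my @names;
    for my $arg (@args) {
        next if $arg eq 'void';
        # Take the last identifier of each argument as its name.
        push @names, $1 if $arg =~ /(\w+)\s*(?:\[[^\]]*\]\s*)*$/;
    }

    my $text = "/* Function: $name(" . join( ', ', @args ) . ")\n *\n * Parameters:\n";
    $text .= " *   $_ -\n" for @names;
    $text .= " *\n * Returns:\n *   $type\n */\n";
    return $text . $decl;
}

# Insert a skeleton comment above every declaration the regex recognises.
sub add_nd_comments {
    my ($src) = @_;
    $src =~ s/
        ^ (                                       # group 1: whole declaration, re-emitted as is
            ( (?: \w+ (?: \s*\*+\s* | \s+ ) )+ )  # group 2: return type and qualifiers
            ( \w+ ) \s*                           # group 3: function name
            \( ( [^()]* ) \) \s* \{               # group 4: arguments, then the opening brace
        )
    / with_nd_comment( $1, $2, $3, $4 ) /gmex;
    return $src;
}

# Usage: a declaration that spans two lines.
my $sample = <<'END_C';
static struct super_block* sfs_get_super(struct file_system_type *fst,
        int flags, const char *devname, void *data)
{
    return 0;
}
END_C
print add_nd_comments($sample);
```

    Because the substitution only adds text in front of each match, everything between and after the functions passes through untouched.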

    Maybe what you really want is to enhance whatever editor you normally use for writing C code, by adding a macro or function of some sort that will take a highlighted region, copy/paste it, and reformat the upper copy as a comment block. (I'm sure folks have done this numerous times with emacs/elisp.)

    What you are doing is going to require manual editing anyway -- someone is supposed to type in explanations for the parameters, etc, or else the whole exercise is pointless, right? -- so the right tool for this job is a macro in a text editor, not Perl (unless your editor lets you declare macros with embedded perl scripting).

    Later I'm going to add parsing of Perl scripts as well, but that is a lot easier.

    Heheh, yeah right... NOT (unless your perl source code holds to even more stringent style constraints than your C code). But good luck with that anyway.

      For vim we made:
      map <F8> 0f)%b"2yef)b"3yeO/* Function: <esc>"2po/<esc>O<esc>a Parameters: <esc>"3p<esc>
      Not entirely perfect I think... :)

      / Ace
Re: Parsing C Source File Functions.
by Ace128 (Hermit) on Oct 22, 2006 at 23:52 UTC
    Giving up isn't my deal :)
        #!/usr/bin/perl
        # Script to add comments for Natural Doc. http://www.naturaldoc.org
        use strict;
        use warnings;
        use Tie::File;
        use Data::Dumper;

        if (@ARGV < 1) {
            print "Usage: app file";
            exit(0);
        }

        my $file;
        open $file, $ARGV[0] or die $!;
        my $data = join("", <$file>);
        close $file;

        my $data_with_comment = "";
        my @matches;
        while ( $data =~ /\n((?:\w+\*?\*?\s+)+) (\*?\w+\s*) \s* \((.*?) \) (\s*\{)/gsx ) {
            push(@matches, { 'fullmatch' => $&, 'functype' => $1, 'funcname' => $2, 'funcargs' => $3 });
        }

        foreach (@matches) {
            my $fullmatch = $_->{'fullmatch'};
            my $functype = $_->{'functype'};
            my $funcname = $_->{'funcname'};
            my $funcargs = $_->{'funcargs'};
            my $comment = "";
            my @funcargs = split /,\s*/, $funcargs;
            my $args = join( ", ", @funcargs );
            my $args2 = join( "\n * ", @funcargs );
            $comment .= qq {
        /* Function: $funcname($args)
         *
         *
         *
         * Arguments:
         * $args2
         *
         * Returns:
         * $functype
         *
         */
        };
            $data =~ s/\Q$fullmatch\E/$comment$fullmatch/;
        }
        print $data;
    This works for me at least.
      How can this be extended to parse C++ functions such as: static gboolean g::ber_read(wtap *wth, int *err, gchar **err_info, gint64 *data_offset) ?

        Change the regex to accept the namespace prefix of the function name.
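
        As a rough sketch of what that change might look like (parse_signature is a made-up name, and the pattern is an assumption that covers plain "Name::" prefixes only, not templates, operators or destructors):

```perl
use strict;
use warnings;

# Hypothetical helper: split a one-line C or C++ function signature into
# return type, (possibly qualified) name and argument list.
sub parse_signature {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s*
        ( (?: \w+ (?: \s*[*&]+\s* | \s+ ) )+ )   # return type and qualifiers
        ( (?: \w+ :: )* \w+ )                     # optional Namespace:: or Class:: prefixes, then the name
        \s* \( ( [^()]* ) \)                      # argument list without nested parentheses
    }x;
    my ( $type, $name, $args ) = ( $1, $2, $3 );
    $type =~ s/\s+$//;
    return { type => $type, name => $name, args => [ split /\s*,\s*/, $args ] };
}

# Usage with the signature from the question above.
my $sig = parse_signature(
    'static gboolean g::ber_read(wtap *wth, int *err, gchar **err_info, gint64 *data_offset)'
);
print "type=$sig->{type}\nname=$sig->{name}\n";
print "arg: $_\n" for @{ $sig->{args} };
```

        The only difference from the plain C case is the (?: \w+ :: )* group in front of the name; the same group could be spliced into the multi-line regexes earlier in this thread.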

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Parsing C Source File Functions.
by Ace128 (Hermit) on Oct 22, 2006 at 21:14 UTC
    I've managed to do the following:
        #!/usr/bin/perl
        # Script to add comments for Natural Doc. http://www.naturaldoc.org
        use strict;
        use warnings;
        use Tie::File;
        use Data::Dumper;

        my $file;
        open $file, "searchfs.c" or die $!;
        my $data = join("", <$file>);
        close $file;

        my $data_with_comment = "";
        my $rest = "";
        while ( $data =~ /(.*?)\n((?:\w+\*?\*?\s+)+)(\*?\w+\s*)\s*\((.*?)\)(\s*\{)/gsx ) {
            $rest = $data;
            #print $rest;
            #sleep 4;
            my $code_before = $1;
            my $code_after = $5;
            $data_with_comment .= $code_before;
            my ( $functype, $funcname, $funcarg ) = ( $2, $3, $4 );
            my @funcargs = split /,\s*/, $funcarg;
            #print "found function def:\n type=$functype\n name=$funcname\n args=";
            #print join( "\n ", @funcargs ), "\n";
            my $args = join( ", ", @funcargs );
            my $args2 = join( "\n * ", @funcargs );
            #$data_with_comment .= "\n";
            $data_with_comment .= qq{
        /* Function: $funcname($args)
         *
         *
         *
         * Arguments:
         * $args2
         *
         * Returns:
         * $functype
         *
         */
        };
            #print join( "\n ", @funcargs ), "\n";
            $data_with_comment .= $2 . $3 . "(" . $4 . ")";
            $data_with_comment .= $code_after;
        }
        print $data_with_comment;
    which works nicely, except that after the last match I don't get the rest of the code! Is there a way to save the rest and add it last somehow? Or anything else that solves this?

    Thanks,
    Ace
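
    One possible way to keep the tail, sketched under assumptions: remember the offset where each match ends (Perl exposes match offsets in @- and @+), copy the text between matches yourself, and after the loop append everything from the last remembered offset to the end of the string. The annotate name, the simplified regex and the one-line placeholder comment below are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $data to the output, putting a placeholder
# comment above each function declaration, without losing the tail.
sub annotate {
    my ($data)   = @_;
    my $out      = '';
    my $last_end = 0;    # offset just past the previous match

    while ( $data =~ /^((?:\w+\*?\*?\s+)+)(\*?\w+)\s*\(([^()]*)\)\s*\{/gm ) {
        my ( $functype, $funcname, $funcargs ) = ( $1, $2, $3 );
        my ( $start, $end ) = ( $-[0], $+[0] );    # offsets of the whole match

        $out .= substr( $data, $last_end, $start - $last_end );    # code before this function
        $out .= "/* Function: $funcname */\n";                      # placeholder for the real comment
        $out .= substr( $data, $start, $end - $start );            # the declaration itself
        $last_end = $end;
    }

    # The loop stops at the last match; whatever follows it is still unread.
    $out .= substr( $data, $last_end );
    return $out;
}

# Usage: the trailing comment after the last function is preserved.
my $code = <<'END_C';
#include <stdio.h>

int one(void)
{
    return 1;
}
/* end of file */
END_C
print annotate($code);
```

    The offset has to be saved inside the loop because pos($data) is reset once the final match attempt fails.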