As an alternative to a full-blown parser, the
C::Scan
module does an extremely effective job of
extracting information from C source code
without relying on simple pattern matching.
It's based on the
Data::Flow
module, and does some really fast, accurate
scanning by, for example, replacing C strings
and comments with whitespace
(to avoid false matches within those constructs)
and
matching braces and parentheses
to zero in on what's a function definition or declaration
by syntactic position, rather than by pattern matching alone.
It may not be as flexible as a full LALR parser,
but it already exists,
so you wouldn't have to
create your own C grammar
or retrofit an existing one to a parser.
I don't think it does everything you're talking about,
but it does enough that it would probably be
an effective starting point.
(There's some really slick, mind-expanding Perl
in both C::Scan and Data::Flow,
which isn't surprising seeing as how they
were originally written by Ilya Zakharevich.
They're both worth looking at for the
learning experience alone...)
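
To make the blanking-and-brace-matching idea above concrete, here is a minimal sketch in Python. It is not C::Scan's code and does not use its API. The function names (`blank_strings_and_comments`, `top_level_function_names`) and the sample input are made up for illustration, and it ignores preprocessor directives, K&R-style definitions, and macros that expand to braces.

```python
import re

# Comments, string literals, and character literals in C source.
_NOISE = re.compile(
    r'/\*.*?\*/'             # block comment
    r'|//[^\n]*'             # line comment
    r'|"(?:\\.|[^"\\])*"'    # string literal
    r"|'(?:\\.|[^'\\])*'",   # character literal
    re.DOTALL,
)


def blank_strings_and_comments(src):
    """Overwrite comments and literals with spaces, keeping newlines and offsets."""
    return _NOISE.sub(lambda m: re.sub(r'[^\n]', ' ', m.group(0)), src)


def _name_before_body(clean, brace_pos):
    """Return the identifier in front of the '(...)' that precedes an opening brace."""
    j = brace_pos - 1
    while j >= 0 and clean[j].isspace():
        j -= 1
    if j < 0 or clean[j] != ')':
        return None  # e.g. 'struct point {' or '= {': not a function body
    nest = 0
    while j >= 0:  # walk back to the '(' matching that ')'
        if clean[j] == ')':
            nest += 1
        elif clean[j] == '(':
            nest -= 1
            if nest == 0:
                break
        j -= 1
    if j < 0:
        return None  # unbalanced parentheses
    m = re.search(r'([A-Za-z_]\w*)\s*$', clean[:j])
    return m.group(1) if m else None


def top_level_function_names(src):
    """Names of functions whose bodies open at brace depth 0."""
    clean = blank_strings_and_comments(src)
    names, depth = [], 0
    for i, ch in enumerate(clean):
        if ch == '{':
            if depth == 0:
                name = _name_before_body(clean, i)
                if name:
                    names.append(name)
            depth += 1
        elif ch == '}':
            depth -= 1
    return names


# Hypothetical sample input, for illustration only.
SAMPLE = r'''
/* helper() { not real } */
static int add(int a, int b) {
    const char *s = "fake() { }";
    if (a) { return a + b; }
    return b;
}
struct point { int x; int y; };
void log_msg(const char *fmt, ...)
{
}
'''

print(top_level_function_names(SAMPLE))  # ['add', 'log_msg']
```

Because the comment and the string are blanked first, the `helper()` and `fake()` look-alikes never get a chance to match, and because only braces at depth 0 are examined, the `if (a) {` inside `add` is skipped as well.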