in reply to Bash Parser

undef my %functions;
What's the point of using an explicit undef here? What's wrong with my %functions;?
my $regex = "\$file =~ s/\\\$$variable/$path/g;"; eval "$regex";
I don't get this. Why not just $file =~ s/\$$variable/$path/g;, or, if you're worried the substitution has a fatal syntax error: eval {$file =~ s/\$$variable/$path/g;}? But you aren't checking the result of the eval, so you must be confident that it won't fail.
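
A minimal sketch of the direct substitution, with no string eval. The sub name and the sample values are made up for illustration. The \Q...\E and the trailing \b are additions that are not in the original code: they keep regex metacharacters in the variable name literal and stop $LIBDIR from matching inside $LIBDIRS.

```perl
use strict;
use warnings;

# Replace every "$NAME" in $text with $value; no string eval required.
sub expand_variable {
    my ($text, $variable, $value) = @_;
    $text =~ s/\$\Q$variable\E\b/$value/g;
    return $text;
}

# Hypothetical sample data.
my $file = 'source $LIBDIR/common.sh';
print expand_variable($file, 'LIBDIR', '/opt/lib'), "\n";
```

This form does not expand the ${NAME} spelling; that would need a second alternative in the pattern.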
open FH2, "<$file";
Remember what one of the creators of Unix (Thompson?) said: not being able to open a file is not exceptional. Always check if opening a file succeeded.
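
One way to check the open, sketched under the assumption that an unreadable file should be skipped with a warning rather than abort the whole run. The sub name read_script is hypothetical; it uses the three-argument open with a lexical filehandle instead of the bareword FH2.

```perl
use strict;
use warnings;

# Return the lines of $path, or an empty list (with a warning) if it
# cannot be opened. A missing file is treated as routine, not fatal.
sub read_script {
    my ($path) = @_;
    open my $fh, '<', $path or do {
        warn "Cannot open '$path': $!\n";
        return;
    };
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}
```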
if(/^[ \t]*source +([\w\-\/\.\_\$]+)/)
Hmmm. So, I can have tabs before the source keyword, but not after it? And I cannot surround the filenames with quotes? Or have non-word characters other than '-', '/', '.', '_' or '$' in them?
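
A more permissive pattern might look like the sketch below. It is an illustration, not a full shell tokenizer: it accepts any whitespace around the keyword, the "." synonym for source, and single- or double-quoted names, but it ignores backslash escapes and quotes that cover only part of a name. The sub name sourced_file is hypothetical.

```perl
use strict;
use warnings;

# Return the file named by a "source FILE" or ". FILE" line, or undef.
sub sourced_file {
    my ($line) = @_;
    return $line =~ m{
        ^\s*
        (?:source|\.)          # keyword or its dot synonym
        \s+
        (?: "([^"]+)"          # double-quoted name
          | '([^']+)'          # single-quoted name
          | ([^\s;&|]+)        # bare word, stopping at separators
        )
    }x ? ($1 // $2 // $3) : undef;
}
```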
if($check_if_bash == 1 && $line_number == 1 && ! /bash/) { return }
A bash script doesn't have to start with a line saying "bash". Just like a Perl program doesn't have to start with a line saying "perl". And even if such a script starts with a she-bang line, it may say #!/usr/bin/sh, where /usr/bin/sh and /usr/bin/bash are links to the same file.
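
If the first line is to be inspected at all, parsing it as a she-bang is more precise than matching /bash/ anywhere in it. The sketch below uses a hypothetical helper, shebang_interpreter, and handles the common "#!/usr/bin/env bash" form. A result of "sh" remains inconclusive for the reason given above, and a script with no she-bang line yields undef.

```perl
use strict;
use warnings;

# Return the interpreter's base name from a first line, or undef if the
# line is not a she-bang. "#!/usr/bin/env bash" yields "bash".
sub shebang_interpreter {
    my ($first_line) = @_;
    return unless $first_line =~ m{^#!\s*(\S+)(?:\s+(\S+))?};
    my ($prog, $arg) = ($1, $2);
    $prog =~ s{.*/}{};                      # strip the directory part
    return $prog eq 'env' && defined $arg ? $arg : $prog;
}
```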
# Count comments and empty lines
elsif(/(\%\%.*\%\%)/) {
What makes you think that %% ... %% on a single line is a comment block in bash? My bash manual mentions various meanings for %% depending on context, but none of them have anything to do with comments.

You also seem to assume bash programs use only newlines as statement separators, and all newlines separate statements. Neither statement is true.

You don't seem to deal with any of the many quoting and grouping mechanisms of a bash program. Nor does your program seem to be able to deal with here documents.
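
As a rough illustration of the here-document problem only, the sketch below drops here-document bodies before any further line-based matching. It is a heuristic under stated assumptions: input lines are already chomped, there is at most one here-document per line, and delimiters start with a letter or underscore. It skips <<< here-strings, but it can still be fooled by << inside quotes or arithmetic. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Drop here-document bodies (and their terminator lines) from a list of
# chomped lines, keeping the line that introduces each here-document.
sub strip_heredoc_bodies {
    my @lines = @_;
    my ($end, $strip_tabs, @kept);
    for my $line (@lines) {
        if (defined $end) {
            my $probe = $line;
            $probe =~ s/^\t+// if $strip_tabs;   # "<<-" allows leading tabs
            undef $end if $probe eq $end;        # terminator reached
            next;
        }
        push @kept, $line;
        if ($line =~ /(?<!<)<<(?!<)(-?)\s*(['"]?)([A-Za-z_]\w*)\2/) {
            ($strip_tabs, $end) = ($1 eq '-', $3);
        }
    }
    return @kept;
}
```

With this in place, a "source fake.sh" line that is really text inside a here-document never reaches the source-matching code.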

There's quite some duplication of code between 'match' and 'get_sources', but there are also differences. It's totally unclear to me why there need to be any differences.

It looks to me like there are no guards against infinite recursion. If file 'A' contains source A, then your program will never stop by itself.
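
The usual guard is a hash of files already visited. In the sketch below, the hash reference $sources_of (file name to list of sourced files) is a hypothetical stand-in for actually reading and parsing each file, so the example runs on its own. In a real script the keys should be canonical paths, for example from Cwd::abs_path, so that ./A and A count as the same file.

```perl
use strict;
use warnings;

# Return $file plus everything it sources, directly or indirectly,
# visiting each file at most once.
sub collect_sources {
    my ($file, $sources_of, $seen) = @_;
    $seen //= {};
    return if $seen->{$file}++;              # already visited: stop here
    my @found = ($file);
    push @found, collect_sources($_, $sources_of, $seen)
        for @{ $sources_of->{$file} // [] };
    return @found;
}

# Hypothetical dependency data: A sources itself and B; B sources A.
my %sources_of = (A => ['A', 'B'], B => ['A']);
print join(',', collect_sources('A', \%sources_of)), "\n";   # prints "A,B"
```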

Re^2: Bash Parser
by mickep76 (Beadle) on Sep 30, 2009 at 11:30 UTC

    Using undef my %... is redundant, as you say; it really serves no purpose.

    eval {$file =~ s/\$$variable/$path/g;}

    I will add proper error checking to files, just didn't bother.

    You are correct about the substitution, just couldn't find a better approach to do it. Which you neatly just gave me :D.

    if(/^[ \t]*source +([\w\-\/\.\_\$]+)/)

    The regular expression was adapted, through testing, to what people actually use rather than to what they might use. I will add your suggestions.

    if($check_if_bash == 1 && $line_number == 1 && ! /bash/) { return }

    It was mainly there to exclude scripts that are executable but not bash. If you have a good idea how to verify that a script is indeed bash when it lacks #!/usr/bin/bash, I'd be grateful.

    You are right, %%...%% isn't a comment; it's usually a text block. The main reason to detect it was to avoid parsing it later, but it should not be included in the comment statistics.

    match and get_sources have slightly different structures because I don't want statistics from sourced files: the statistics are per script, and sourced files are included individually at the end.

    There is, as you say, a possibility of a race condition; I will fix this. Thanks.

      %% - do you handle here docs as well and ignore embedded multiline strings containing perl or awk, for all styles of ', ", \-escaped or HERE-document?

      Identifying bash scripts:

      If the script's not being invoked by sourcing or an explicit bash filename: -T file AND -x file AND strings [\s/](ba)?sh or sh<versionstring> in the shebang line AND (to be safe) some specific bourne-shell style idiom like N>&N.

      Also consider asking file(1) for a guess.
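
      A rough sketch of the file-test part of that heuristic, assuming a hypothetical helper named looks_like_shell_script. It checks only that the file is a plain, executable text file whose she-bang line names sh or bash; it does not handle the sh<versionstring> case or look for Bourne-style idioms in the body.

      ```perl
      use strict;
      use warnings;

      # Heuristic only: plain file, executable, text, and a first line
      # that names sh or bash as the interpreter.
      sub looks_like_shell_script {
          my ($path) = @_;
          return 0 unless -f $path && -x $path && -T $path;
          open my $fh, '<', $path or return 0;
          my $first = <$fh> // '';
          close $fh;
          return $first =~ m{^#!.*[\s/](?:ba)?sh\b} ? 1 : 0;
      }
      ```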

      Definitely a lie, but still quite common:

      #!/bin/sh
      #!perl # -w -Sx: line count is one off
      eval 'exec perl -Sx $0 ${1:+"$@"}'
        if 0;

      Trying to distinguish between Bourne-descended shells might be an additional challenge: bash vs ksh93 vs dash vs pdksh vs zsh vs csh-considered-harmful vs ... .

      Homework for the astute reader: which of the above shells would be the easiest to detect and exclude?

      You're correct, of course. But have you also considered some of the zsh syntax? So it's a bit more complicated...
Re^2: Bash Parser
by mickep76 (Beadle) on Sep 30, 2009 at 09:59 UTC
    This script is not perfect and probably has more flaws than the ones you mentioned. It was a quick hack to solve a problem; it works fairly well and is fairly accurate, but it is by no means perfect. Thanks for the input, I will try to address your points.
      add use autodie; and you're checking for open/close failure
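
      A small sketch of what that looks like. The sub name count_lines is hypothetical; autodie is lexically scoped and makes the listed built-ins throw an exception on failure, so no explicit "or die" is needed.

      ```perl
      use strict;
      use warnings;
      use autodie qw(open close);   # open/close now throw on failure

      # Count the lines in $path; a failed open raises an exception.
      sub count_lines {
          my ($path) = @_;
          open my $fh, '<', $path;
          my $count = 0;
          $count++ while <$fh>;
          close $fh;
          return $count;
      }
      ```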