SlugMass has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to extract all variables in a list from some f77 declarations. Fortran can have nasty nested levels of parentheses, which makes this confusing. I tried a naive split on commas, but that did not work. Here is a simple attempt.
#!/usr/bin/perl -w
#
# Scenario: Given lines of f77 code, parse out the comma delimited variables.
#
use Data::Dumper;
use Text::Balanced qw(extract_delimited extract_multiple extract_variable);

# A real application will read in f77 source, this is just a sample.
my $string = "real*8 Eparams(0:maxParam),Emvm(0:3),MyArray(row,col),YourArray(12,(j,k),m)";
#
my @fields = extract_multiple($string,
    [ sub { extract_delimited($_[0],q{\,}) } ],
    undef, 1);
print "Fields:",Dumper(\@fields);
exit(0);
After I get the individual variables, with their array dimensions, I will use the information for generating new code.

Replies are listed 'Best First'.
Re: f77 variable list parsing
by Sandy (Curate) on Jun 28, 2004 at 20:59 UTC
    Since FORTRAN is my mother tongue (so to speak) I felt honour bound to make an attempt.

    NOTE: I am by no means fluent in regular expressions, but I think this will work.

    Provisos:
    (1) It assumes that type keywords such as 'real' or 'integer' are properly ignored, either because no spaces or commas precede them in the string, or because anything that looks like a declaration keyword is explicitly junked first.
    (2) It assumes no continuation lines. (I leave that to you).
    (3) I haven't tested it thoroughly with spaces etc.

    Here it is.

    #!/usr/bin/perl -w
    #
    # Scenario: Given lines of f77 code,
    # parse out the comma delimited variables.
    #
    use Data::Dumper;

    # A real application will read in f77 source,
    # this is just a sample.
    my $string = "real*8 Eparams(0:maxParam),".
                 "Emvm(0:3),".
                 "YourArray(12,(j,k),m),".
                 "MyArray(row,col)";
    my $open = 0;
    my $close = 0;
    my @fields = ();
    my $a;

    # --- get rid of 'real*8'
    $string =~ /(\w+)(\*[0-9]*)?/gc;

    # --- keep looping until entire string is processed
    while (not $string =~ /\G\z/gc) {
        # --- get variable name
        if ((not $open) and ($string =~ /\G(?:\s*|,)(\w+)/gc)) {
            push @fields, $1;
            print "$1\t";
        }
        # --- find opening brackets, and keep track of level
        elsif ($string =~ /\G([^\)]*\()/gc) {
            $a = $1;
            $open++;
        }
        # --- find closing brackets, and print info
        elsif ($open and ($string =~/\G\s*([^\(]*\))/gc)) {
            print "-> $a$1\n";
            $open--;
        }
    }
    #print "Fields:",Dumper(\@fields);
    exit(0);
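Proviso (2) leaves continuation lines to the reader. As a rough sketch only, and assuming classic fixed-form f77 layout (a 'C', 'c' or '*' in column 1 marks a comment; any character other than blank or '0' in column 6 marks a continuation), the lines could be merged into logical statements before parsing. The name `join_continuations` is hypothetical and not from the thread, and the sketch ignores tab-formatted source and the column-72 cutoff.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge fixed-form continuation
# lines into single logical statements.
# Assumption: 'C', 'c' or '*' in column 1 is a comment, and any character
# other than blank or '0' in column 6 marks a continuation line.
sub join_continuations {
    my @lines = @_;
    my @statements;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^[Cc*]/;    # comment line
        next if $line !~ /\S/;        # blank line
        if (@statements and $line =~ /^ {5}[^ 0]/) {
            # Continuation: append the statement field (column 7 onward).
            $statements[-1] .= substr($line, 6);
        }
        else {
            # Initial line: drop columns 1-6 (label field and continuation column).
            push @statements, length($line) > 6 ? substr($line, 6) : '';
        }
    }
    return @statements;
}

my @src = (
    "      real*8 Eparams(0:maxParam),Emvm(0:3),",
    "     &       MyArray(row,col)",
);
print "$_\n" for join_continuations(@src);
```

The joined statement keeps whatever leading blanks the continuation line had after column 6, so a parser fed with it still needs to tolerate embedded whitespace.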
    Sandy
    "WATFOR Forever!"

      Anyway, I found an interesting pertinent link: http://marine.rutgers.edu/po/tools/perl.html

      I also played with your code and incorporated it into my mini project:
      sub getFortranVarNames {
          #
          # Purpose: Find all variables in a list of FORTRAN declarations.
          #          Preserve any array dimension information as a string for
          #          later processing. Subroutine is based on response by
          #          "Sandy" on perlmonks.org
          #
          # Return array is further massaged for array dimensions.
          my ($string) = @_;
          my @fields=();
          my $open = 0;
          my $close = 0;
          my $a;
          my $i = 0;

          # \z TRUE at end of string only
          while (not $string =~ /\G\z/gc) {
              # --- get variable name
              # \G TRUE at end-of-match position of prior m//g
              # ?: cluster only parentheses, no capturing
              # gc - allow continued search after failed /g match
              if ((not $open) and ($string =~ /\G(?:\s*|,)(\w+)/gc)) {
                  $fields[$i]{name}=$1;
                  $fields[$i]{dims}='';
                  $opt_debug and print "Name: $1\t";
                  $i++;
              }
              # --- find opening brackets, and keep track of level
              elsif ($string =~ /\G([^\)]*\()/gc) {
                  $a = $1;
                  $open++;
              }
              # --- find closing brackets, and print info
              elsif (($open) and ($string =~/\G\s*([^\(]*\))/gc)) {
                  $opt_debug and print "-> $a$1\n";
                  $fields[$i-1]{dims}="$a$1";
                  $open--;
              }
          }
          return (@fields);
      }
      When I get the whole thing done I may post it.
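The loop above leans on greedy regex matching instead of counting every bracket, so some declaration shapes (sibling groups such as B((i),(j)), for instance) may come back with incomplete dims. One alternative, offered only as a sketch and not as anything posted in this thread, is to walk the list character by character, keep a depth counter, and split only on commas at depth zero. The names `split_top_level` and `parse_vars` are hypothetical, and the input is assumed to have the type keyword already removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a declaration list on
# commas that sit outside any parentheses, tracking depth per character.
sub split_top_level {
    my ($list) = @_;
    my @items;
    my $depth   = 0;
    my $current = '';
    for my $ch (split //, $list) {
        if    ($ch eq '(') { $depth++ }
        elsif ($ch eq ')') { $depth-- if $depth > 0 }
        if ($ch eq ',' and $depth == 0) {
            push @items, $current;    # top-level comma ends an item
            $current = '';
        }
        else {
            $current .= $ch;
        }
    }
    push @items, $current if length $current;
    return @items;
}

# Separate each item into a name and its dimension string (possibly empty).
# Assumption: the type keyword (e.g. 'real*8') was stripped beforehand.
sub parse_vars {
    my ($list) = @_;
    my @fields;
    for my $item (split_top_level($list)) {
        $item =~ s/^\s+|\s+$//g;
        next unless $item =~ /^(\w+)\s*(\(.*\))?$/s;
        push @fields, { name => $1, dims => defined $2 ? $2 : '' };
    }
    return @fields;
}

my @fields = parse_vars('Eparams(0:maxParam),Emvm(0:3),MyArray(row,col),YourArray(12,(j,k),m)');
print "$_->{name}\t$_->{dims}\n" for @fields;
```

The return shape mirrors getFortranVarNames (a list of hashes with name and dims keys), so the later dimension massaging could stay the same.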
        WatFIV!!

        When I get the whole thing done I may post it.
        Go for it!