Incognito has asked for the wisdom of the Perl Monks concerning the following question:

So far, I have learned a lot from this site and playing around with grammars using Parse::RecDescent, but I haven't really moved forward with the problem at hand.

The Real Problem(s)

I have a JavaScript file, which may or may not have global vars at the top, middle and bottom of the page. This file also has function declarations, which may have local vars defined in them as well.

  1. I wouldn't mind getting each of the global vars stored into an array (both the var name and its assignment), so I can iterate through all global vars in a given file.
  2. I want an array returned that contains each function and its contents, not just the function names like I've already done. The grammar I've written goes through the entire file, ignoring stuff that isn't a function, and is very slow on moderately sized files. This array would let me, say, jump to the 5th element and either print it out or do something to that function's contents.

    When I break this problem down, it's basically: (a) skip any code until you reach a function, (b) return everything in the function (including the function name, braces and so on), and (c) once out of the function, ignore everything until the next function.

    For me to return the entire function string contents, I'll have to make modifications to the productions like stuff_we_ignore, paren_statement and bracket_statement, so that they return stuff... but that means they'll return stuff when not found within a function. I can't seem to figure out how to write a grammar that returns stuff when in a function, and nothing otherwise...
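The (a)/(b)/(c) breakdown above can also be sketched without a grammar, as a plain brace-counting scan. This is only a sketch under stated assumptions: extract_functions is a hypothetical helper name, it does not use Parse::RecDescent, and it assumes braces never appear inside string literals, regex literals or comments (the sample data already masks those as __QUOTE__ and __REGEX__).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the full text of every named function in $src.
# Assumption: no braces inside string literals, regex literals or comments.
sub extract_functions {
    my ($src) = @_;
    my @functions;

    # (a) skip ahead to the next "function name (...) {" header
    while ($src =~ /\bfunction\s+\w+\s*\([^)]*\)\s*\{/g) {
        my $start = $-[0];   # offset where the word "function" begins
        my $depth = 1;       # the header pattern consumed the opening brace

        # (b) walk forward one brace at a time until the body is balanced
        while ($depth > 0 && $src =~ /([{}])/g) {
            $depth += $1 eq '{' ? 1 : -1;
        }
        last if $depth > 0;  # ran out of input with an unclosed brace

        # (c) keep the whole function; the outer loop resumes after its
        #     closing brace, so everything in between is ignored
        push @functions, substr($src, $start, pos($src) - $start);
    }
    return @functions;
}
```

Calling extract_functions on the joined contents of the file returns a list, so the 5th function would be element [4]. Because the scan resumes after each closing brace, a function defined inside another function comes back as part of its parent rather than as a separate element.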

Grammar to Parse a JavaScript file and return array of Function Names

I'm including the code I've written that returns the array of function names. There's got to be something simple that I'm missing that will let me do what I want...

#!/usr/bin/perl

use strict;     # Enforces safer, clearer code.
use warnings;   # Detects common programming errors
use Time::HiRes qw(gettimeofday);
use Parse::RecDescent;
#use Data::Dumper;

#----------------------------------------------------------------------
# Build the grammar.
#----------------------------------------------------------------------
my ($grammar);
my ($startCompile,$startCompile2) = gettimeofday;
$startCompile += ($startCompile2/1000000);

$grammar = q {
    statement:         (   function_method  (';')(?)  { $return = $item[1]; }
                         | brace_statement  (';')(?)  { $return = $item[1]; }
                         | stuff_we_ignore  (';')(?)  { $return = $item[1]; }
                       )(s)

    function_method:   'function' identifier paren_statement brace_statement
                       { $return = $item[2]; }

    brace_statement:   '\{' statement '\}'
                       { $return = $item[2]; }

    paren_statement:   '(' statement ')'

    bracket_statement: '[' statement ']'

    stuff_we_ignore:   (   paren_statement
                         | bracket_statement
                         | identifier
                         | punctuators
                       )(s?)
                       { $return = ""; }

    identifier:        /\w+/

    punctuators:       /[><=!~:&^%,\?\|\+\-\*\/\.]+/
};

#----------------------------------------------------------------------
# Grab the data and parse.
#----------------------------------------------------------------------
my @localDeclaredVars = <DATA>;
my $localDeclaredVar  = join ' ', @localDeclaredVars;

my $parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";
my $i = 1;

my ($endCompile,$endCompile2) = gettimeofday;
$endCompile += ($endCompile2/1000000);

my $refParsedValues = $parser->statement($localDeclaredVar) || print "*** $localDeclaredVar\n";

my ($parseEnd,$parseEnd2) = gettimeofday;
$parseEnd += ($parseEnd2/1000000);

#----------------------------------------------------------------------
# Flatten the array and print contents.
#----------------------------------------------------------------------
#print Dumper(\@$refParsedValues);
my (@flatarray);

sub flatten_recurse {
    # Thanks Anomo
    map ref eq 'ARRAY' ? flatten_recurse(@$_) : $_, grep defined && length, @_;
}

@flatarray = flatten_recurse ($refParsedValues);
print join "\n", @flatarray, "\n";

my ($appEnd,$appEnd2) = gettimeofday;
$appEnd += ($appEnd2/1000000);

print '-'x72 . "\n";
print "Compile Time: " . (sprintf "%2.3f", ($endCompile - $startCompile)) . " seconds\n";
print "Parse Time: " . (sprintf "%2.3f", ($parseEnd - $endCompile)) . " seconds\n";
print "Flatten Time: " . (sprintf "%2.3f", ($appEnd - $parseEnd)) . " seconds\n";
print "__________________________\n";
print "Total Time: " . (sprintf "%2.3f", ($appEnd - $startCompile)) . " seconds\n";

#----------------------------------------------------------------------
# End of program.
#----------------------------------------------------------------------
__END__
function functStart () {};
var g1, g2 = __QUOTE__;
var g3 = 10000000;
if (g1) { var XXXXXXX = __QUOTE__; }
if ( ! defaultCookieCrumbNav ) { cookieCrumbNavBarHTML = __QUOTE__ ; } else { function funct1 () { }; var xxx = __QUOTE__ ; }
if (true == false) { alert(var1); }
if (1) { if (1) { function funct2 (X) { x = funct3 (1,2); }; function funct3 () { alert(1); } } }
function funct4 (a,b) { alert (1,2,3,4); return (a + b); }
function funct5 () { var aaa = 1; }
var g4;
var g7 = __QUOTE__;
function funct6 () { var b = __REGEX__; c = __REGEX__; if (test333()) { return true; } }
function funct7 () { var a = 111; }
alert ( 3 );
funct5 ( funct6 ( funct2 () ) );
function functEnd () {}

Output

functStart
funct1
funct2
funct3
funct4
funct5
funct6
funct7
functEnd
------------------------------------------------------------------------
Compile Time: 0.270 seconds
Parse Time: 0.691 seconds
Flatten Time: 0.000 seconds
__________________________
Total Time: 0.961 seconds

Wishful Output (feels like in my dreams)

function functStart () {};
function funct2 (X) { x = funct3 (1,2); };
function funct3 () { alert(1); }
function funct4 (a,b) { alert (1,2,3,4); return (a + b); }
function funct5 () { var aaa = 1; }
function funct6 () { var b = __REGEX__; c = __REGEX__; if (test333()) { return true; } }
function funct7 () { var a = 111; }
function functEnd () {}
------------------------------------------------------------------------
Compile Time: 0.000 seconds
Parse Time: 0.000 seconds
Flatten Time: 0.000 seconds
__________________________
Total Time: 0.000 seconds

2002-03-13 Edit by Corion : Added READMORE tag

Replies are listed 'Best First'.
Re: Rec::Descent Woes - Parsing JavaScript Files and Other Issues
by hossman (Prior) on Mar 13, 2002 at 05:21 UTC
    I've never actually used Parse::RecDescent, but I've written parsers like this before. The first thing that jumps out at me is your handler for functions...
    function_method: 'function' identifier paren_statement brace_statement { $return = $item[2]; }
    Assuming this does what I think it does, you're throwing away the function signature and body by only returning $item[2]. You might try this instead...
    function_method: 'function' identifier paren_statement brace_statement { $return = \@item; }
    With your existing flatten_recurse method, I think that will do what you want (although you'll probably be missing the parens and braces, since it looks like you throw those out when you parse paren_statement and brace_statement).

    The power of writing parsers like this is that you can build your data structures as you parse the file. By flattening out the results from your parser, you're throwing away a lot of the functionality you could have. Considering your goal, you might want to make all of your handlers just return \@item; that should cause your parser to build "a complete parse tree" of your JavaScript files. Then you can walk it and do whatever you want with it.

    Update: Don't try \@item. On a whim I installed Parse::RecDescent to see if I was right, and ran into a deep recursion problem. Evidently @item is a more complicated array than I thought (I figured it was just the individual tokens ... now I'm curious to go read the docs and understand how this module works).
    Bottom line: dmmiller2k is probably right, your grammar seems very specific in some ways and very general in others -- you should try to make it more specific about the things you care about, and more general about the things you don't.

Re: Rec::Descent Woes - Parsing JavaScript Files and Other Issues
by jryan (Vicar) on Mar 13, 2002 at 03:46 UTC

    I'm not sure how to help you with your Parse::RecDescent problem (I've never used it), but here is a pure-regex solution that I whipped up (half to help, half for fun) for your original problem:

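    use re 'eval';   # needed, per the update below, for the (??{...}) blocks

    my $data = join('',<DATA>);

    my $quoted_string = qr/
        "
        (?:
            (?: [^"]* )
          | (?: (?<= \\ ) " )
        )*
        "
    /x;

    # Declared before it is assigned so the pattern can refer to itself;
    # "my $x = qr/...(??{$x}).../" would not see the new lexical.
    my $balenced_brackets;
    $balenced_brackets = qr/
        \{
        (?:
            (?> [^{}"]* )
          | (??{$quoted_string})
          | (??{$balenced_brackets})
        )*
        \}
    /x;

    my $function_header = qr/
        function \s+ \w+ \s* \( [^)]* \) \s*
    /x;

    my @matches = $data =~ /($function_header $balenced_brackets)/gx;

    print join("\n\n",@matches);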

    It worked with the sample code that you provided, but I have a feeling it will blow up when it encounters something wacky (an unclosed brace, a brace inside quotes, etc). Consider it a preliminary solution :)

    Update: I added a bit of robustness to handle the case of the unclosed brace, but it now needs use re 'eval'. I also added a bit of explanation... It's still going to blow up on a brace inside quotes, however...

    Update 2: With some better structure, it can now handle brackets in quotes with no problem.

Re: Rec::Descent Woes - Parsing JavaScript Files and Other Issues
by Anonymous Monk on Mar 13, 2002 at 03:07 UTC
    What you need to do now is build a data structure from your pattern. Once you have it, it's easier to display it in whatever format you like.
Re: Rec::Descent Woes - Parsing JavaScript Files and Other Issues
by dmmiller2k (Chaplain) on Mar 13, 2002 at 17:03 UTC

    Just looking at your grammar, it seems to me that statement and function_method are infinitely recursive with respect to brace_statement, paren_statement and bracket_statement (which themselves are defined in terms of statement).

    Since you are not trying to actually parse the contents of functions, but simply return the contents wholesale (in order to associate it with the function name in a hash), why not make it simple for yourself?

    For each function, there are two things you need: the contents of the parentheses and those of the curly braces, either of which may be empty. In order to recognize the end of a particular set of parens (or braces), you need to be able to identify any embedded matching pairs (of parens, braces or square brackets), which is why you should identify symbols, quoted strings, etc. as well.
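    The two pieces described above (parenthesis contents and brace contents, keyed by function name) could be collected roughly like this. It is a sketch only, not a Parse::RecDescent grammar: index_functions is a hypothetical helper name, it relies on the named captures and (?&name) recursion available in Perl 5.10 and later, and it assumes no braces appear inside strings, regex literals or comments.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each function name to its parameter list and body.
# Assumptions: Perl 5.10+; no braces inside strings, regex literals or comments.
sub index_functions {
    my ($src) = @_;
    my %functions;

    while ($src =~ /
        \bfunction \s+ (?<name> \w+ ) \s*
        \( (?<params> [^)]* ) \) \s*
        (?<body> \{ (?: [^{}]++ | (?&body) )* \} )
    /gx) {
        # %+ holds the named captures of the match that just succeeded
        $functions{ $+{name} } = {
            params => $+{params},   # text between the parentheses
            body   => $+{body},     # braces included, nested blocks intact
        };
    }
    return \%functions;
}
```

    With the result in $f, the parameter text of funct4 would be $f->{funct4}{params} and its body $f->{funct4}{body}. A hash loses the order in which the functions appeared, so an array of these records may suit better if position matters.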

    Okay, you have these basic building blocks. Now you need to combine them in ways in which they might actually appear within programs without overcomplicating your task.

    For instance, I think in your grammar, the statement rule is over-used (incorrectly). I'm uncertain, but I don't think the parameter list of a function can consist of arbitrarily complex statements; likewise, for the contents of expressions within square brackets. Perhaps you should describe expression in a rule. Then, your definition of statement could be comprised of several different kinds of expressions (e.g., assignment).

    It would be interesting to know where you are taking this; are you building a Javascript debugger in Perl?

    dmm

    If you GIVE a man a fish you feed him for a day
    But,
    TEACH him to fish and you feed him for a lifetime
Re: Rec::Descent Woes - Parsing JavaScript Files and Other Issues
by I0 (Priest) on Jun 26, 2002 at 23:59 UTC
    $_ = join'',<DATA>;
    (my $re=$_) =~ s/(({)|(})|.)/${['(','']}[!$2]\Q$1\E${[')','']}[!$3]/gs;
    $re = join'|',map quotemeta,eval{/$re/};
    warn $@ if $@ =~ /unmatched/;
    print join "\n",/(\bfunction\b[^()]*\([^()]*\)\s*(?:$re))/g,'';
    __DATA__
    function functStart () {};
    var g1, g2 = __QUOTE__;
    var g3 = 10000000;
    if (g1) { var XXXXXXX = __QUOTE__; }
    if ( ! defaultCookieCrumbNav ) { cookieCrumbNavBarHTML = __QUOTE__ ; } else { function funct1 () { }; var xxx = __QUOTE__ ; }
    if (true == false) { alert(var1); }
    if (1) { if (1) { function funct2 (X) { x = funct3 (1,2); }; function funct3 () { alert(1); } } }
    function funct4 (a,b) { alert (1,2,3,4); return (a + b); }
    function funct5 () { var aaa = 1; }
    var g4;
    var g7 = __QUOTE__;
    function funct6 () { var b = __REGEX__; c = __REGEX__; if (test333()) { return true; } }
    function funct7 () { var a = 111; }
    alert ( 3 );
    funct5 ( funct6 ( funct2 () ) );
    function functEnd () {}