in reply to Re: Regex for stripping variable names from a JavaScript file
in thread Regex for stripping variable names from a JavaScript file

Okay, so I have tried using Parse::RecDescent and built a small parser for var statements... I'm having only one problem...

use Parse::RecDescent;

$grammar = q {
    varStatement: 'var' statements endofvar
    statements: <leftop: statement comma statement>
    comma_values: <leftop: assignvalue comma assignvalue>
    statement: var_name (operator assignvalue)(?)
    assignvalue: equality | escapedRegex | escapedQuote | array_declaration
               | numeric_value | array_value | object_value | var_name
    array_declaration: 'new Array(' comma_values ')'
    array_value: array_name '[' integer ']'
    equality: '(' assignvalue equality_operator assignvalue ')'
    var_name: /\w+/ { $return = "$item[1]" }
    array_name: /\w+/
    object_value: /[A-Za-z0-9_.]+/
    numeric_value: real_number | integer
    integer: /\d+/
    real_number: /\d+\.?\d*/
    escapedRegex: '__REGEX__'
    escapedQuote: '__QUOTE__'
    operator: '='
    equality_operator: '===' | '==' | '!='
    endofvar: ';'
    comma: ','
};

print "\n\n";
$parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";

foreach my $localDeclaredVar (@localDeclaredVars) {
    print "$localDeclaredVar\n";
    my $test = $parser->varStatement($localDeclaredVar)
        or print "*** Bad text!!!\n";
    print "==>$test\n";
}

How do we grab the matched var_name? My goal is to capture each variable name that was matched, but I've read as much of the FAQ as I could handle and cannot work out how to do it.

The Input

var myTest1 = 1;
var myTest2 = 2, myTest3 = 3, myTest4;
var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
var myTest7 =__REGEX__;
var myTest8 = myTest5.x;
var myTest9 = myTest[0], myTest10 = myTest[0];
var myTest11 = (myTest1 == myTest2);
var myTest12 = (myTest1 == myTest2), myTest13 = 2;
var myTest14 = (myTest1 == myTest2), myTest15;
var myTest16 = new Array(1, 2);
var myTest17, myTest18;

My Output

var myTest1 = 1;
==>;
var myTest2 = 2, myTest3 = 3, myTest4;
==>;
var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
==>;
var myTest7 =__REGEX__;
==>;
var myTest8 = myTest5.x;
==>;
var myTest9 = myTest[0], myTest10 = myTest[0];
==>;
var myTest11 = (myTest1 == myTest2);
==>;
var myTest12 = (myTest1 == myTest2), myTest13 = 2;
==>;
var myTest14 = (myTest1 == myTest2), myTest15;
==>;
var myTest16 = new Array(1, 2);
==>;
var myTest17, myTest18;
==>;

As you can see, all that I get is the darn semicolon - the string that was left after all matching was successful... this is of course not what I want... Does anyone know how to solve this?

Re(3): Regex for stripping variable names from a JavaScript file
by dmmiller2k (Chaplain) on Feb 26, 2002 at 14:49 UTC

    There are two problems here. First, your grammar is not quite right. Second, you aren't setting the return value in your starting rule.

    I'm not an expert with Parse::RecDescent, or with constructing grammars for YACC, Bison, etc. (far from it, actually), but IMHO you probably don't want the { $return = $item[1] } on the 'var_name:' rule.

    Instead, I think you want it on the 'statement:' AND 'varStatement:' rules (see below). Also, removing the 'comma:' rule and replacing its uses with literal commas keeps the commas out of the output (apologies for not using the lingo correctly). Here's my attempt:

    use Parse::RecDescent;

    my $grammar = q {
        varStatement: 'var' statements endofvar { $return = $item[2] }
        statements: <leftop: statement ',' statement>
        statement: var_name (operator assignvalue)(?) { $return = $item[1] }
        comma_values: <leftop: assignvalue ',' assignvalue>
        assignvalue: equality | escapedRegex | escapedQuote | array_declaration
                   | numeric_value | array_value | object_value | var_name
        array_declaration: 'new Array(' comma_values ')'
        array_value: array_name '[' integer ']'
        equality: '(' assignvalue equality_operator assignvalue ')'
        var_name: /\w+/
        array_name: /\w+/
        object_value: /[A-Za-z0-9_.]+/
        numeric_value: real_number | integer
        integer: /\d+/
        real_number: /\d+\.?\d*/
        escapedRegex: '__REGEX__'
        escapedQuote: '__QUOTE__'
        operator: '='
        equality_operator: '===' | '==' | '!='
        endofvar: ';'
    };

    my @localDeclaredVars = <DATA>;
    chomp @localDeclaredVars;

    print "\n\n";
    $parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";

    foreach my $localDeclaredVar (@localDeclaredVars) {
        print "$localDeclaredVar\n";
        my $test = $parser->varStatement($localDeclaredVar)
            or print "*** Bad text!!!\n";
        if ( ref($test) eq 'ARRAY' ) {
            print "==> ( @$test )\n";
        }
        else {
            print "==> $test\n";
        }
    }

    __END__
    var myTest1 = 1;
    var myTest2 = 2, myTest3 = 3, myTest4;
    var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
    var myTest7 =__REGEX__;
    var myTest8 = myTest5.x;
    var myTest9 = myTest[0], myTest10 = myTest[0];
    var myTest11 = (myTest1 == myTest2);
    var myTest12 = (myTest1 == myTest2), myTest13 = 2;
    var myTest14 = (myTest1 == myTest2), myTest15;
    var myTest16 = new Array(1, 2);
    var myTest17, myTest18;

    and here is the output:

    var myTest1 = 1;
    ==> ( myTest1 )
    var myTest2 = 2, myTest3 = 3, myTest4;
    ==> ( myTest2 myTest3 myTest4 )
    var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
    ==> ( myTest5 myTest6 )
    var myTest7 =__REGEX__;
    ==> ( myTest7 )
    var myTest8 = myTest5.x;
    ==> ( myTest8 )
    var myTest9 = myTest[0], myTest10 = myTest[0];
    ==> ( myTest9 myTest10 )
    var myTest11 = (myTest1 == myTest2);
    ==> ( myTest11 )
    var myTest12 = (myTest1 == myTest2), myTest13 = 2;
    ==> ( myTest12 myTest13 )
    var myTest14 = (myTest1 == myTest2), myTest15;
    ==> ( myTest14 myTest15 )
    var myTest16 = new Array(1, 2);
    ==> ( myTest16 )
    var myTest17, myTest18;
    ==> ( myTest17 myTest18 )

    This still misses array_names and variables within parenthesized expressions, but hey, it's a step in the right direction, I suppose. How would I be helping you if I solved your whole problem for you? :) At least you now have a debuggable chunk of code.
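    To see in isolation why the original grammar printed only the semicolon, here is a minimal sketch. It relies on the documented Parse::RecDescent behaviour that a production with no action returns the value of its last matched item, unless $return is assigned. The rule names implicit, explicit and name are made up for the illustration and appear nowhere in the grammar above.

```perl
use strict;
use warnings;
use Parse::RecDescent;   # CPAN module, not part of core Perl

# Two toy start rules that match exactly the same text.
# 'implicit' has no action, so its value is its last item (the ';').
# 'explicit' assigns $return, so its value is the matched name.
my $grammar = q{
    implicit: 'var' name ';'
    explicit: 'var' name ';' { $return = $item[2] }
    name:     /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $implicit = $parser->implicit('var foo;');
my $explicit = $parser->explicit('var foo;');

print "implicit ==> $implicit\n";   # the trailing semicolon
print "explicit ==> $explicit\n";   # the variable name
```

    The same reasoning applies to 'varStatement:' above: without an action its last item is endofvar, so the caller only ever sees ';'.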

    dmm

    If you GIVE a man a fish you feed him for a day
    But,
    TEACH him to fish and you feed him for a lifetime

      Thank you very much for leading me in the right direction... ++ to you. Just 2 days ago I didn't know anything about Parse::RecDescent... and now I've written my first simple grammar for it... I don't know much about grammars, but I'm learning. That's why it's nice that someone can take a look at what I've done and give me pointers, etc.

      One thing I haven't figured out, but now have some working code to play with, is the $return value 'stuff' (I was just guessing/hacking at it). Also, what you've done here does exactly what I want: it returns just the variable names (which I will use somewhere else in my code)... I'm not sure what you mean by missing "array_names" and "variables within parenthesized expressions", so I think I'll have a look into that as well.

      I appreciate your time. What references do you use for this sort of thing? I've been reading the .pod files...

        My pleasure. Actually, this is my first time using Parse::RecDescent, although I've written small grammars using bison. I didn't mean to actually solve your whole problem for you, just point you in the right direction.

        By 'missing' I was referring to the fact that in the following output, myTest, myTest1 and myTest2 are not picked up:

        var myTest9 = myTest[0], myTest10 = myTest[0];
        ==> ( myTest9 myTest10 )
        var myTest11 = (myTest1 == myTest2);
        ==> ( myTest11 )
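        If you did want those names as well, one possible approach (a rough sketch only, not a drop-in replacement for the grammar above) is to have every rule return an array ref of names and flatten the lists on the way up. The grammar below is deliberately cut down and hypothetical: the rule names name and value are made up, and it only handles array subscripts, one parenthesized comparison, and integers.

```perl
use strict;
use warnings;
use Parse::RecDescent;   # CPAN module, not part of core Perl

# Reduced, hypothetical grammar: each rule returns an array ref of names,
# so names used on the right-hand side are collected with the declared ones.
my $grammar = q{
    varStatement: 'var' statements ';'
        { $return = [ map { @$_ } @{ $item[2] } ] }
    statements: <leftop: statement ',' statement>
    statement: name ( '=' value )(?)
        { $return = [ $item[1], map { @$_ } @{ $item[2] } ] }
    value: name '[' /\d+/ ']'
               { $return = [ $item[1] ] }
         | '(' name equality_operator name ')'
               { $return = [ @item[2, 4] ] }
         | /\d+/
               { $return = [] }
    equality_operator: '===' | '==' | '!='
    name: /[A-Za-z_]\w*/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $line  = 'var myTest9 = myTest[0], myTest11 = (myTest1 == myTest2), n = 3, m;';
my $names = $parser->varStatement($line) or die "Bad text\n";

print "==> ( @$names )\n";
```

        Declared and referenced names come back in one flat list here; keeping them apart would need a richer return value (for example a hash of two lists), which is left out to keep the sketch short.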