in reply to Regex for stripping variable names from a JavaScript file

I'm not sure I'd use a regex for this. Think about Parse::RecDescent or its ilk.

dmm

If you GIVE a man a fish you feed him for a day
But,
TEACH him to fish and you feed him for a lifetime

Re: Re: Regex for stripping variable names from a JavaScript file
by Incognito (Pilgrim) on Feb 25, 2002 at 20:51 UTC

    Okay, so I have tried using Parse::RecDescent and built a small parser for var statements... I'm having only one problem...

    use Parse::RecDescent;

    $grammar = q {
        varStatement: 'var' statements endofvar
        statements: <leftop: statement comma statement>
        comma_values: <leftop: assignvalue comma assignvalue>
        statement: var_name (operator assignvalue)(?)
        assignvalue: equality | escapedRegex | escapedQuote | array_declaration
                   | numeric_value | array_value | object_value | var_name
        array_declaration: 'new Array(' comma_values ')'
        array_value: array_name '[' integer ']'
        equality: '(' assignvalue equality_operator assignvalue ')'
        var_name: /\w+/ { $return = "$item[1]" }
        array_name: /\w+/
        object_value: /[A-Za-z0-9_.]+/
        numeric_value: real_number | integer
        integer: /\d+/
        real_number: /\d+\.?\d*/
        escapedRegex: '__REGEX__'
        escapedQuote: '__QUOTE__'
        operator: '='
        equality_operator: '===' | '==' | '!='
        endofvar: ';'
        comma: ','
    };

    print "\n\n";
    $parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";
    foreach my $localDeclaredVar (@localDeclaredVars) {
        print "$localDeclaredVar\n";
        my $test = $parser->varStatement($localDeclaredVar) or print "*** Bad text!!!\n";
        print "==>$test\n";
    }

    How do we grab the matched var_name? My goal is to capture each variable name that the grammar matched, but I've read as much of the FAQ as I could handle and still cannot work out that small detail.

    The Input

    var myTest1 = 1;
    var myTest2 = 2, myTest3 = 3, myTest4;
    var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
    var myTest7 =__REGEX__;
    var myTest8 = myTest5.x;
    var myTest9 = myTest[0], myTest10 = myTest[0];
    var myTest11 = (myTest1 == myTest2);
    var myTest12 = (myTest1 == myTest2), myTest13 = 2;
    var myTest14 = (myTest1 == myTest2), myTest15;
    var myTest16 = new Array(1, 2);
    var myTest17, myTest18;

    My Output

    var myTest1 = 1;
    ==>;
    var myTest2 = 2, myTest3 = 3, myTest4;
    ==>;
    var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
    ==>;
    var myTest7 =__REGEX__;
    ==>;
    var myTest8 = myTest5.x;
    ==>;
    var myTest9 = myTest[0], myTest10 = myTest[0];
    ==>;
    var myTest11 = (myTest1 == myTest2);
    ==>;
    var myTest12 = (myTest1 == myTest2), myTest13 = 2;
    ==>;
    var myTest14 = (myTest1 == myTest2), myTest15;
    ==>;
    var myTest16 = new Array(1, 2);
    ==>;
    var myTest17, myTest18;
    ==>;

    As you can see, all that I get is the darn semicolon - the string that was left after all matching was successful... this is of course not what I want... Does anyone know how to solve this?

      There are two problems here. First, your grammar is not quite right. Second, you aren't setting $return in your starting rule. Without that, a rule returns the value of its last item, which here is the ';' matched by endofvar.

      I'm not an expert with Parse::RecDescent, or with constructing grammars for YACC, Bison, etc. (far from it, actually), but IMHO you probably don't want the { $return = $item[1] } on the 'var_name:' rule.

      Instead, I think you want it on the 'statement:' AND 'varStatement:' rules (see below). Also, removing the 'comma:' rule and replacing its use with literal commas keeps the commas out of the output (apologies for not using the lingo correctly). Here's my attempt:

      use Parse::RecDescent;

      my $grammar = q {
          varStatement: 'var' statements endofvar
              { $return = $item[2] }
          statements: <leftop: statement ',' statement>
          statement: var_name (operator assignvalue)(?)
              { $return = $item[1] }
          comma_values: <leftop: assignvalue ',' assignvalue>
          assignvalue: equality | escapedRegex | escapedQuote | array_declaration
                     | numeric_value | array_value | object_value | var_name
          array_declaration: 'new Array(' comma_values ')'
          array_value: array_name '[' integer ']'
          equality: '(' assignvalue equality_operator assignvalue ')'
          var_name: /\w+/
          array_name: /\w+/
          object_value: /[A-Za-z0-9_.]+/
          numeric_value: real_number | integer
          integer: /\d+/
          real_number: /\d+\.?\d*/
          escapedRegex: '__REGEX__'
          escapedQuote: '__QUOTE__'
          operator: '='
          equality_operator: '===' | '==' | '!='
          endofvar: ';'
      };

      my @localDeclaredVars = <DATA>;
      chomp @localDeclaredVars;

      print "\n\n";
      $parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";
      foreach my $localDeclaredVar (@localDeclaredVars) {
          print "$localDeclaredVar\n";
          my $test = $parser->varStatement($localDeclaredVar) or print "*** Bad text!!!\n";
          if ( ref($test) eq 'ARRAY' ) {
              print "==> ( @$test )\n";
          }
          else {
              print "==> $test\n";
          }
      }

      __END__
      var myTest1 = 1;
      var myTest2 = 2, myTest3 = 3, myTest4;
      var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
      var myTest7 =__REGEX__;
      var myTest8 = myTest5.x;
      var myTest9 = myTest[0], myTest10 = myTest[0];
      var myTest11 = (myTest1 == myTest2);
      var myTest12 = (myTest1 == myTest2), myTest13 = 2;
      var myTest14 = (myTest1 == myTest2), myTest15;
      var myTest16 = new Array(1, 2);
      var myTest17, myTest18;

      and here is the output:

      var myTest1 = 1;
      ==> ( myTest1 )
      var myTest2 = 2, myTest3 = 3, myTest4;
      ==> ( myTest2 myTest3 myTest4 )
      var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
      ==> ( myTest5 myTest6 )
      var myTest7 =__REGEX__;
      ==> ( myTest7 )
      var myTest8 = myTest5.x;
      ==> ( myTest8 )
      var myTest9 = myTest[0], myTest10 = myTest[0];
      ==> ( myTest9 myTest10 )
      var myTest11 = (myTest1 == myTest2);
      ==> ( myTest11 )
      var myTest12 = (myTest1 == myTest2), myTest13 = 2;
      ==> ( myTest12 myTest13 )
      var myTest14 = (myTest1 == myTest2), myTest15;
      ==> ( myTest14 myTest15 )
      var myTest16 = new Array(1, 2);
      ==> ( myTest16 )
      var myTest17, myTest18;
      ==> ( myTest17 myTest18 )

      This still misses array_names and variables within parenthesized expressions, but hey, it's a step in the right direction, I suppose. How would I be helping you if I solved your whole problem for you? :) At least you now have a debuggable chunk of code.
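
      If it helps with the $return business, here is a minimal sketch of the difference, assuming nothing beyond Parse::RecDescent itself. The rule names (decl, names, name) and the sample input are made up for illustration and are not part of your grammar. As far as I understand it, a rule with no action returns the value of its last item, which would explain why you were getting the ';' back:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical toy grammar: the start rule has NO action,
# so its value is the value of its last item (the ';' literal).
my $without_return = Parse::RecDescent->new(q{
    decl:  'var' names ';'
    names: <leftop: name ',' name>
    name:  /\w+/
}) or die "*** Bad grammar!\n";

# Same toy grammar, but the start rule hands back item 2,
# the array ref that <leftop> builds from the matched names.
my $with_return = Parse::RecDescent->new(q{
    decl:  'var' names ';'
        { $return = $item[2] }
    names: <leftop: name ',' name>
    name:  /\w+/
}) or die "*** Bad grammar!\n";

my $text = 'var a, b, c;';    # made-up sample input

my $last_item = $without_return->decl($text);
my $names     = $with_return->decl($text);

print "without \$return: $last_item\n";
print "with \$return:    ( @$names )\n";
```

      Because the separator inside <leftop> is a literal ',' rather than a subrule, only the names end up in the list, which is the same reason the commas disappeared from the output above.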

      dmm


        Thank you very much for leading me in the right direction... ++ to you. Just 2 days ago I didn't know anything about Parse::RecDescent... and now I have written my first simple grammar for it... I don't know much about grammars, but I'm learning. That's why it's nice that someone can take a look at what I've done and give me pointers, etc.

        One thing I haven't figured out, though I now have some working code to play with, is the $return value 'stuff' (I was just guessing/hacking at it). Also, what you've done here does exactly what I want: it returns just the variable names (which I will use somewhere else in my code)... I'm not sure what you mean by missing "array_names" and "variables within parenthesized expressions", so I think I'll have a look into that as well.

        I appreciate your time. What references do you use for this sort of thing? I've been reading the .pod files...