Incognito has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to come up with a regex that will grab all variable names in the following JavaScript function/file:
    function test (aaaaa) {
        var myTest1 = 1;
        var myTest2 = 2, myTest3 = 3, myTest4;
        var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
        var myTest7 =__REGEX__;
        var myTest8 = myTest5.x;
        var myTest9 = myTest[0], myTest10 = myTest[0];
        var myTest11 = (myTest1 == myTest2);
        var myTest12 = (myTest1 == myTest2), myTest13 = 2;
        var myTest14 = (myTest1 == myTest2), myTest15;
        var myTest16 = new Array(1, 2);
        var myTest17 = new(blah), myTest18 = new(blah,blah2), myTest19;
    }
The __QUOTE__ and __REGEX__ strings are placeholders for quoted strings and regexes that were replaced in an earlier parsing pass, so we never have to worry about quotes or special characters in this file.
Here's the regex (with some debug code) I developed so far (where $strInput contains the entire function and its contents):
    my (@localDeclaredVars) = ($strInput =~ m/\bvar\s+([^;]+)/g);
    foreach my $localDeclaredVar (@localDeclaredVars) {
        if ($localDeclaredVar =~ m/,/) {
            # We have multiple variables declared in one line.
            @localDeclaredSubVars = $localDeclaredVar =~ m{
                (
                    # Grab the variable name
                    \w+
                )
                \s*
                (?:
                    # Suck up any possible values of that variable
                    =
                    \s*
                    (?:
                        # A variable with parentheses and possible comma
                        (?:
                            \(
                            (?:
                                \\.
                                [^\)\\]*
                            )*
                            \)
                        )
                        |
                        (?:
                            # A straight variable or value
                            [^,]+
                        )
                    )*
                )?
                ,?
            }gx;
            print "\n$localDeclaredVar\n";
            foreach my $localDeclaredSubVar (@localDeclaredSubVars) {
                if ($localDeclaredSubVar =~ m/=/) {
                    ($localDeclaredSubVar) = ($localDeclaredSubVar =~ m/\s*([^= ]+)\s*=/);
                }
                print " $localDeclaredSubVar\n";
                push (@localVariables, $localDeclaredSubVar) if ($localDeclaredSubVar);
            }
        }
        else {
            # We have a single variable declaration.
            print "\n$localDeclaredVar\n";
            if ($localDeclaredVar =~ m/=/) {
                ($localDeclaredVar) = ($localDeclaredVar =~ m/\s*([^= ]+)\s*=/);
            }
            print " $localDeclaredVar\n";
            push (@localVariables, $localDeclaredVar) if ($localDeclaredVar);
        }
    }
This is the output I get. As you can see, my regular expression has a hard time consuming parentheses that contain commas. It grabs most variable names fine, but runs into trouble when an Array constructor (or any other parenthesized list) has commas inside the parentheses.
    myTest1 = 1
     myTest1

    myTest2 = 2, myTest3 = 3, myTest4
     myTest2
     myTest3
     myTest4

    myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6
     myTest5
     __QUOTE__
     myTest6

    myTest7 =__REGEX__
     myTest7

    myTest8 = myTest5.x
     myTest8

    myTest9 = myTest[0], myTest10 = myTest[0]
     myTest9
     myTest10

    myTest11 = (myTest1 == myTest2)
     myTest11

    myTest12 = (myTest1 == myTest2), myTest13 = 2
     myTest12
     myTest13

    myTest14 = (myTest1 == myTest2), myTest15
     myTest14
     myTest15

    myTest16 = new Array(1, 2)
     myTest16
     2

    myTest17 = new(blah), myTest18 = new(blah,blah2), myTest19
     myTest17
     myTest18
     blah2
     myTest19
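The stray __QUOTE__, 2 and blah2 entries appear to come from the parenthesized alternative of the regex. It only allows backslash-escaped sequences between the parentheses, so for ordinary contents it fails, and the fallback [^,]+ then stops at the first comma inside the parentheses. The reduced check below copies that alternative on its own; the helper name paren_branch_matches is made up for illustration and is not part of the original code.

```perl
use strict;
use warnings;

# The parenthesized alternative, copied from the regex in the question.
my $paren_branch = qr{ \( (?: \\. [^\)\\]* )* \) }x;

# Hypothetical helper for illustration: does the branch match the whole string?
sub paren_branch_matches {
    my ($text) = @_;
    return $text =~ /\A$paren_branch\z/ ? 1 : 0;
}

# Try the branch against a few parenthesized snippets.
for my $sample ('()', '(blah)', '(blah,blah2)', '(1, 2)') {
    printf "%-14s %s\n", $sample,
        paren_branch_matches($sample) ? 'matches' : 'does not match';
}
```

Only the empty '()' gets through. '(blah)' still comes out right in the real run because [^,]+ happens to swallow it whole, which is why the failure only shows up once a comma sits inside the parentheses.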
Can someone help me fix this regex so that it grabs all the variable names, from myTest1 through myTest19? Also, if there's a way to do this without the preliminary my (@localDeclaredVars) = ($strInput =~ m/\bvar\s+([^;]+)/g); pass, that would be nice as well.
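One possible direction, offered as a sketch rather than a definitive fix: instead of matching the initializer in one big pattern, delete the innermost (...) and [...] groups repeatedly until none are left, so that every remaining comma separates two declarations. The helper name declared_vars is hypothetical. The sketch keeps the first var pass from the question, and it assumes that statements end with a semicolon and that strings and regexes have already been replaced by placeholders, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): list the names declared
# by every `var ...;` statement in a JavaScript source string.
sub declared_vars {
    my ($src) = @_;
    my @names;

    # Same first pass as in the question: capture the text between `var`
    # and the terminating semicolon.
    for my $decl ($src =~ m/\bvar\s+([^;]+)/g) {
        my $flat = $decl;

        # Delete innermost (...) and [...] groups until none remain, so
        # every comma still present separates two declarations.
        1 while $flat =~ s/\([^()]*\)|\[[^\[\]]*\]//g;

        # Each comma-separated piece now starts with the declared name.
        for my $piece (split /,/, $flat) {
            push @names, $1 if $piece =~ m/^\s*(\w+)/;
        }
    }
    return @names;
}

my $js = 'var a = new Array(1, 2), b = f(g(x, y), z)[0], c;';
print join(', ', declared_vars($js)), "\n";
```

The function parameter aaaaa is not reported, since only var statements are scanned. Initializers that contain a semicolon of their own, such as a function expression, would need a real tokenizer and are not handled here.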
Re: Regex for stripping variable names from a JavaScript file
by dmmiller2k (Chaplain) on Feb 23, 2002 at 03:57 UTC
by Incognito (Pilgrim) on Feb 25, 2002 at 20:51 UTC
by dmmiller2k (Chaplain) on Feb 26, 2002 at 14:49 UTC
by Incognito (Pilgrim) on Feb 26, 2002 at 19:08 UTC
by dmmiller2k (Chaplain) on Feb 27, 2002 at 16:35 UTC