Extract variables from file (split? regex? backflip?)

Lori713 has asked for the wisdom of the Perl Monks concerning the following question:

I want a list of all the scalar variables contained in my YIKES.pl file. Some notes/steps/ideas I've had so far:

--Point to/Open the file YIKES.pl that contains 8000 lines of Perl code (see pseudo sample below). There may be zero, one, or more scalar variables per line of code.
--no "my" was used to initialize the variables since use strict wasn't used.
--Take every unique occurrence of a scalar variable and put it into an array(?). Not sure if I can be greedy and get it to only give me a list of unique items...
--Every scalar variable begins with a "$" symbol with an unknown number of alphanumeric characters or underscores, and can be terminated by anything other than a letter, number or underscore.
--There are special Perl scalar variables like "$_", which would be nice to capture but not critical.
--Print the array to a file (text file would be okay), either separated with commas, or three "x"s, whatever/however.

I've looked at split and various regular expressions, but nothing's clicking. I also searched for regex, split, etc. on Perl monks for ideas. I'm especially having trouble getting a regex to recognize that there's more than one variable on a line, and terminating it before finding the next occurrence in that line. Or, getting a split to split on more than one character at a time.

I'd like to have a little .pl file that I can use to point at a file and get a list of the variables in that file. Any ideas on how best to accomplish this?

P.S. I'm really very new to Perl (see my initial post Sub-initiate needs help getting started), so a "dumbing down" re ideas/suggestions/explanations would not be considered bad on my part!

THANKS!

Lori

Pseudo sample YIKES.pl file (code doesn't work; modified to include certain characters/variations/possibilities):

print "<INPUT TYPE='HIDDEN' NAME='g_emplid' VALUE='$g_emplid'>" VALUE='$g_emplid7xyz'>";
print "<INPUT TYPE='HIDDEN' NAME='g_oprid' VALUE='$g_oprid)'>";
print "<INPUT TYPE='HIDDEN' NAME='g_LogonName' VALUE='$g_Logon3Name"'>";
print "<INPUT TYPE='HIDDEN' NAME='g_projects' VALUE='$g_projects2['>";
print "<INPUT TYPE='HIDDEN' NAME='g_projects_valid' VALUE='$g_projects_valid,'>";
print "<INPUT TYPE='HIDDEN' NAME='g_projects_valid' VALUE='%g_projects_valid,'>";

update (broquaint): fixed formatting


Replies are listed 'Best First'.
•Re: Extract variables from file (split? regex? backflip?) by merlyn (Sage) on Sep 03, 2003 at 18:18 UTC
If you're cross-referencing a Perl program, consider B::Xref. For example, `perl -MO=Xref YIKES.pl` should dump all your variables, including where they are defined and used.
Your Perl code is messed up, by the way. You've got a double-quote midway through that first line, and near the end of the third line. And those percents aren't going to work either. {grin}
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re: Re: Extract variables from file (split? regex? backflip?) by Lori713 (Pilgrim) on Sep 04, 2003 at 00:55 UTC
Thanks for the suggestion! I especially like the idea that I can also find out where they were defined and used. That's just like getting a cherry on top of the icing on the cake!!
Just so folks won't think I'm a complete idiot: ;-D I agree whole-heartedly that the Perl pseudo code won't work (I noted that it wouldn't work in the original post). It's similar to what was given to me to own, minus some weird characters I inserted in my sample pseudo code to show how the variables might appear in the rest of the 8000 lines. (See my original post Sub-initiate needs help getting started for the history of that 8000-line Perl file if you're interested.)
Re: Re: Extract variables from file (split? regex? backflip?) by Lori713 (Pilgrim) on Sep 04, 2003 at 15:55 UTC
`perl -MO=Xref YIKES.pl` works like a charm! Thanks!
Re: Extract variables from file (split? regex? backflip?) by diotalevi (Canon) on Sep 03, 2003 at 18:12 UTC
The easy (and wrong) solution is to use a regex on that.

    use strict;
    use warnings;

    my %scalars;

    # This is the long and wordy version of the line that follows this while loop.
    while ( my $line = <> ) {
        my @scalars_on_this_line = ( $line =~ /(\$\w+)/g );
        foreach my $scalar ( @scalars_on_this_line ) {
            $scalars{ $scalar } = 1;
        }
    }

    # This is the faster (because of the hash slice) and smaller (undef values
    # instead of lots of '1' values) method. I'd write it this way.
    # Use it instead of the loop above, not after it: the loop has already
    # consumed the input.
    @scalars{ /(\$\w+)/g } = () while <>;

    # Print one variable per line.
    $\ = $, = "\n";
    print sort keys %scalars;

The harder method would be to use something like B::Xref on your data. Try doing `perl -MO=Xref somescript.pl` and seeing how that works for you.
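To see why the regex approach is called "wrong" here, a minimal sketch that runs the same `/(\$\w+)/g` pattern over a few tricky lines may help. The sub name `naive_scalars` and the sample lines are made up for illustration; they are not from YIKES.pl.

```perl
use strict;
use warnings;

# Return the unique, sorted matches of the naive pattern /\$\w+/
# found in a list of source lines.
sub naive_scalars {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        $seen{$_} = 1 for $line =~ /(\$\w+)/g;
    }
    return sort keys %seen;
}

# Hypothetical sample input; q{} keeps the '$' characters literal.
my @sample = (
    q{print "VALUE='$g_emplid' and '$g_oprid)'";},
    q{print 'Costs $five, no interpolation here';},
    q{print "You owe me \$$money dollars!";},
    q{$count = $count + $hello[10];},
);

print "$_\n" for naive_scalars(@sample);
```

The output includes `$five`, which sits inside single quotes and is never interpolated, and `$hello`, which is really an element of the array `@hello`. Whether those count as scalar variables is exactly the kind of question a plain regex cannot answer.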
Re: Extract variables from file (split? regex? backflip?) by BUU (Prior) on Sep 03, 2003 at 18:38 UTC
I have a much better idea. Instead of trying to manually print out every line of HTML, use HTML::Template and separate the HTML from the Perl. Then you could replace the above with:

__output.tmpl__

    <INPUT TYPE='HIDDEN' NAME='g_emplid' VALUE='<TMPL_VAR NAME="g_emplid">'>
    <INPUT TYPE='HIDDEN' NAME='g_oprid' VALUE='<TMPL_VAR NAME="g_oprid">'>
    <INPUT TYPE='HIDDEN' NAME='g_LogonName' VALUE='<TMPL_VAR NAME="g_Logon3Name">'>
    <INPUT TYPE='HIDDEN' NAME='g_projects' VALUE='<TMPL_VAR NAME="g_projects2">'>
    <INPUT TYPE='HIDDEN' NAME='g_projects_valid' VALUE='<TMPL_VAR NAME="g_projects_valid">'>
    <INPUT TYPE='HIDDEN' NAME='g_projects_valid' VALUE='<TMPL_VAR NAME="g_projects_valid">'>

__output.pl__

    use HTML::Template;

    my $h = HTML::Template->new( filename => 'output.tmpl' );

    # Each key must match a TMPL_VAR NAME in output.tmpl; by default
    # HTML::Template dies on a parameter the template does not contain.
    $h->param(
        g_emplid         => $g_emplid,
        g_oprid          => $g_oprid,
        g_Logon3Name     => $g_Logon3Name,
        g_projects2      => $g_projects2,
        g_projects_valid => $g_projects_valid,
    );

    print $h->output;
Re: Re: Extract variables from file (split? regex? backflip?) by Lori713 (Pilgrim) on Sep 04, 2003 at 01:00 UTC
Excellent suggestion! Also, thank you for giving me an example of how to use it. I'm so new at this, and trying to dig through a program handed off to me. The original program has a lot of print statements in it in order to generate the HTML pages, and I'd like to slowly weed those out in future versions. (See Sub-initiate needs help getting started for history on this program I'm tackling).
Re: Extract variables from file (split? regex? backflip?) by davido (Cardinal) on Sep 03, 2003 at 19:16 UTC
Besides the fact that someone has already done all the hard work for you with the B::Xref module, the other reason that home-brewed regular expressions aren't going to be a good approach is that there are so many special cases. You cannot simply assume that anything following a `$` is a scalar variable. And you cannot assume that scalar variables always consist of `$` followed immediately by an alphanumeric name. Take the following, for example:

    'This might $look like a scalar, but because of the single quotes, it's not to be interpolated'
    /\w+$/                           # Looks like the scalar $/, but isn't.
    "You owe me \$$money dollars!"   # The first $ is escaped.
    ${$hello[10]}                    # Which scalar do you want?
    $hello[10]                       # Does an array element count as a scalar?
    $hello{Fred}                     # Does a hash element count as a scalar?
    @array = split /\$/, "Hello$world$here$I$come!";
    $text =~ s/(.)(.+)/$2$1/;        # $1 and $2 are scalars, do you want them?

Ok, enough. There are infinitely more examples of cruel and unusual situations where your regexp searching for scalars is going to have to be grotesque for it to do what you want. And even then it probably won't always work.

merlyn suggested using the B::Xref module. I can't think of better advice. My point in this post was to try to convey why that is good advice.

And though it's a moot point, since you're going to use B::Xref (right?), I did want to comment on your question of capturing only unique instances of a scalar, because the discussion applies to so many other situations...

Pushing matches onto a stack (into an array) will do nothing to guarantee uniqueness. But using a hash will. Hash keys are always unique. Therefore, hashes provide a perfect way of checking whether a given key already exists, and a foolproof method of ensuring that duplicates can't possibly be made to exist. Any time you're considering uniqueness to be an essential attribute of something you're storing, think of using a hash.
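The hash-for-uniqueness point can be sketched in a few lines. This is only an illustration: the sub name `unique_in_order` and the sample list are assumptions, and the comma separator is one of the options mentioned in the original question.

```perl
use strict;
use warnings;

# Return the items of a list with duplicates removed, keeping the order
# in which each item was first seen. %seen counts occurrences, so the
# grep test is true only the first time a given key turns up.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical matches, as a regex or any other extractor might return them.
my @matches = ('$g_emplid', '$g_oprid', '$g_emplid', '$_', '$g_oprid');
my @unique  = unique_in_order(@matches);

print join(',', @unique), "\n";   # prints: $g_emplid,$g_oprid,$_
```

If the order does not matter, `sort keys %seen` after filling the hash gives the same set in sorted order.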
Update: Thanks Not_a_Number for pointing out my misspelling of merlyn. It's been corrected.
Dave
"If I had my life to do over again, I'd be a plumber." -- Albert Einstein
Re: Re: Extract variables from file (split? regex? backflip?) by Not_a_Number (Prior) on Sep 03, 2003 at 19:29 UTC
What a difference a 'y' makes. "merlin suggested..." Er, merlin or merlyn? :-)
dave
Re: Extract variables from file (split? regex? backflip?) by tcf22 (Priest) on Sep 03, 2003 at 18:07 UTC
You could use Devel::Symdump to dump out all of the scalars; however, you would need to put this in your original source file. If everything is global, then you could probably just stick this at the end.

    use strict;
    use warnings;

    # The explicit ./ makes require load the file from the current
    # directory instead of searching @INC (which no longer includes '.'
    # in Perl 5.26 and later).
    require('./YIKES.pl');

    use Devel::Symdump;
    use Data::Dumper;

    # Collect the scalars in package main and dump the list.
    my @array = Devel::Symdump->scalars('main');
    print Dumper \@array;

Update: You could also have the script above `require` the original script, and since the vars are global it should work. (Code updated.)
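For readers without Devel::Symdump installed, the general idea of walking a package's symbol table can be sketched with core Perl only. This is not how Devel::Symdump is implemented; the sub name `defined_scalars` and the `$g_...` globals are hypothetical, and the sketch deliberately reports only scalars that currently hold a defined value.

```perl
use strict;
use warnings;

# Hypothetical package globals standing in for the ones in YIKES.pl.
our $g_emplid = '12345';
our $g_oprid  = 'lori';
our $g_never_assigned;    # declared but undefined, so it is filtered out

# List the names in a package's symbol table whose scalar slot holds a
# defined value. Names are returned sorted, without the leading '$'.
sub defined_scalars {
    my ($package) = @_;
    no strict 'refs';     # symbolic references are needed to walk the stash
    return sort grep { defined ${"${package}::$_"} } keys %{"${package}::"};
}

# Show only the g_ globals; the full list also contains Perl's own
# special variables such as $0.
print "\$$_\n" for grep { /^g_/ } defined_scalars('main');
```

Like the Devel::Symdump approach, this only sees package globals that exist at run time, so lexical (`my`) variables never show up.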
Re: Re: Extract variables from file (split? regex? backflip?) by bunnyman (Hermit) on Sep 04, 2003 at 14:33 UTC
Maybe this is obvious, but using `require` on input files can be dangerous because it will run the code in that file.
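A small sketch of that warning, assuming a throwaway temporary file in place of YIKES.pl (the variable `$side_effect` is made up for the demonstration):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny stand-in for YIKES.pl whose top-level code has a side effect.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} q{our $side_effect = 'top-level code ran'; 1;}, "\n";
close $fh or die "Cannot close $path: $!";

our $side_effect;

# require compiles the file AND executes its top-level statements.
require $path;

print "After require: $side_effect\n";
```

Here the side effect is just an assignment, but in a real script it could be a database update or a file deletion. To check only that a file compiles, `perl -c YIKES.pl` skips the main body, although `BEGIN` blocks and `use` statements still run.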