perlmeditation
broquaint
The first thing this tutorial will do is explain what lexical scoping means, so
as to keep things simple from the start.
<p/>
Firstly, don't go to a [http://www.dictionary.com|dictionary] as that won't
help you in this particular case. In Perl, when we say that something is
lexically scoped, we are talking about the area of code in which that thing
is visible e.g
<code>
{ # beginning of lexical scope
my $foo;
} # end of lexical scope
</code>
In the above code <tt>$foo</tt> can only be seen between the opening and closing
braces. This is because they delimit the length of the lexical scope, and after
the ending brace that particular instance of <tt>$foo</tt> no longer exists.
<p/>
<readmore>
So a lexical scope is a section of code where things can live temporarily.
I say they live temporarily because anything created within a lexical scope
will be deleted once the scope has been exited e.g
<code>
{ # begin lexical scope
my $foo = "a string";
print " \$foo is: ", (defined $foo ? $foo : "undefined"), $/;
} # end lexical scope
print "\$foo is: ", (defined $foo ? $foo : "undefined"), $/;
__output__
$foo is: a string
$foo is: undefined
</code>
There is an exception to this
rule however - if something is still referring to something created within a
lexical scope upon exit of the scope, that thing will not be deleted since it
is still being referred to by something. This does not mean you can still refer
to it directly, it just means that perl has yet to clean it up.
<code>
my $ref;
{ # begin lexical scope
my $foo = "something in a lexical scope";
$ref = \$foo;
} # end lexical scope
print "\$ref refers to: $$ref", $/;
print "\$foo is: ", (defined $foo ? $foo : "undefined"), $/;
__output__
$ref refers to: something in a lexical scope
$foo is: undefined
</code>
So we can see that <tt>$foo</tt> is still being referred to by <tt>$ref</tt>
but the user can't refer directly to it.
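<p/>
As a further illustration of the same rule, here is a minimal sketch of a closure, i.e an anonymous subroutine that keeps a lexical alive after its scope has exited. The names <tt>make_counter()</tt> and <tt>$count</tt> are made up for this example and are not used elsewhere in the tutorial.
<code>
use strict;
use warnings;

## make_counter() is a hypothetical helper, for illustration only
sub make_counter {
    my $count = 0;                    # lexical, private to this call
    return sub { return ++$count };   # the anonymous sub still refers to $count
}

my $counter = make_counter();         # $count's scope has exited by now ...
my @seen = ($counter->(), $counter->(), $counter->());
print "counter returned: @seen\n";    # prints: counter returned: 1 2 3

my $other = make_counter();           # each call creates a fresh $count
print "other returned: ", $other->(), "\n";   # prints: other returned: 1
</code>
Nothing outside the anonymous sub can name <tt>$count</tt>, yet it survives for as long as the sub that refers to it does.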
<p/>
<b><tt>my</tt> variables</b>
<p/>
Notice how all the variables are being declared with <tt>[perlfunc:my|my()]</tt>?
This wasn't done to comply with <tt>[strict]</tt> (although <tt>[strict]</tt>
does encourage the use of lexical variables, and with good reason too),
but because <tt>[perlfunc:my|my()]</tt> creates lexically scoped variables, or
simply, lexical variables.
<p/>
So every variable created with <tt>[perlfunc:my|my()]</tt> lives within the
current lexical scope. What about other variables, you may ask? Well, anything
that is not declared with <tt>[perlfunc:my|my()]</tt> lives in the current
package (for more info on package global variables see [id://211441]).
<p/>
Here's a brief example to illustrate the difference between lexical variables
and package global variables
<code>
{
my $foo = "a lexical variable";
$bar = "a package variable";
print " \$foo is: ", (defined $foo ? $foo : "undefined"), $/;
print " \$bar is: ", (defined $bar ? $bar : "undefined"), $/;
}
print "\$foo is: ", (defined $foo ? $foo : "undefined"), $/;
print "\$bar is: ", (defined $bar ? $bar : "undefined"), $/;
__output__
$foo is: a lexical variable
$bar is: a package variable
$foo is: undefined
$bar is: a package variable
</code>
There, <tt>$foo</tt> lives within its lexical scope, whereas <tt>$bar</tt> lives
within the current package, so it doesn't disappear until it is explicitly deleted from the symbol table.
<p/>
Another thing to be noted about <tt>[perlfunc:my|my()]</tt> is that it is
a <i>compile-time</i> directive (this is because all things lexical are
calculated at compile-time).
This is the phase when the perl interpreter is putting the code together.
So once our scopes and variables
have been set they cannot be changed at runtime, like package globals can.
<p/>
What this means is that lexical variables are
<i>declared</i> at compile-time, <i>not</i> initialised e.g
<code>
use strict;
my $foo = "defined";
BEGIN {
print "foo is ", defined($foo) ? $foo : 'undef',
" during BEGIN phase\n";
};
print "foo is ", defined($foo) ? $foo : 'undef', " at runtime\n";
__output__
foo is undef during BEGIN phase
foo is defined at runtime
</code>
This demonstrates that <tt>$foo</tt> is declared, since <tt>[strict]</tt>
does not have a problem, but is still undefined since it hasn't had anything
assigned to it.
<p/>
<b>More than naked</b>
<p/>
So far we've been using naked blocks to delimit the length of our lexical
scopes. How else, you might wonder, are lexical scopes defined?
<p/>
Well firstly there's the lexical file scope, which is the length of a given
perl source file e.g
<code>
## lextut1.pl
my $foo = "in lextut1.pl's lexical file scope";
print "\$foo is: ", (defined $foo ? $foo : "undefined"), $/;
</code>
Now on the command-line
<code>
## the ./ is needed on perl 5.26 and later, where "." is no longer in @INC
perl -e 'require "./lextut1.pl";
print "\$foo is: ", (defined $foo ? $foo : "undefined"), $/;'
$foo is: in lextut1.pl's lexical file scope
$foo is: undefined
</code>
As we can see there, <tt>$foo</tt> only lives for the length of the file
<tt>lextut1.pl</tt>, and has fallen out of scope by the time
<tt>[perlfunc:require|require]</tt> has finished doing its thing.
<p/>
Secondly, the braces around subroutine code delimit a lexical scope, so
anything declared within a subroutine cannot be seen from outside it e.g
<code>
sub foo
{ # begin lexical scope
my $x = "a string";
print "\$x in foo() is: ", (defined $x ? $x : "undefined"), $/;
bar();
} # end lexical scope
sub bar
{ # begin lexical scope
print "\$x in bar() is: ", (defined $x ? $x : "undefined"), $/;
} # end lexical scope
foo();
__output__
$x in foo() is: a string
$x in bar() is: undefined
</code>
So subroutine scope follows along the same lines as the scope in naked
blocks.
<p/>
For conditional statements and loop statements the case is somewhat
different as lexicals can be declared in the condition block/loop assignment,
which occurs before the braces e.g
<code>
open(SRC, $0) or die("ack: $!");
my @lines = <SRC>;
## $line is declared *before* the braces
foreach my $line (@lines) {
## $w is declared within the condition, which is
## also before the braces
if(my($w) = $line =~ /\b(\w+)\b/) {
print "bareword found: $w\n";
}
print "\$w is: ", (defined $w ? $w : "undefined"), $/
if $line eq $lines[$#lines];
}
print "\$line is: ", (defined $line ? $line : "undefined"), $/;
__output__
bareword found: open
bareword found: my
bareword found: line
bareword found: foreach
bareword found: w
bareword found: also
bareword found: if
bareword found: print
bareword found: print
bareword found: if
bareword found: print
$w is: undefined
$line is: undefined
</code>
Although somewhat convoluted, the above example demonstrates the fact that
the condition of the <tt>if</tt> and the loop assignment in the <tt>foreach</tt>
are lexically scoped to the braces which delimit the respective statements.
<p/>
Note, however, that statement modifiers <i>do not</i> create a new lexical
scope (this should be obvious through their lack of braces) e.g
<code>
## otherwise $r would be auto-vivified as a package global
use strict;
print $r,$/ if my $r = 10 % 5;
__output__
Global symbol "$r" requires explicit package name at - line 1.
Execution of - aborted due to compilation errors.
</code>
The remaining ways of creating a lexical scope are as follows
<ul>
<li> builtin functions which take code blocks e.g
<tt>[perlfunc:map|map]</tt>, <tt>[perlfunc:grep|grep]</tt>,
<tt>[perlfunc:exec|exec]</tt>, <tt>[perlfunc:sort|sort]</tt> etc
<li> anonymous subroutines (since they behave the same as named
subroutines in this respect)
<li> naked blocks, anonymous or labelled
<li> and the nasty but occasionally necessary string <tt>[perlfunc:eval|eval]</tt>.
</ul>
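<p/>
A minimal sketch of three of those cases follows (a <tt>map</tt> block, an anonymous subroutine and a string <tt>eval</tt>). The variable names are made up for this example; each inner <tt>$x</tt> is a separate lexical and the file-scoped <tt>$x</tt> is left untouched.
<code>
use strict;
use warnings;

my $x = "file scope";

## a map block is its own lexical scope
my @doubled = map { my $x = $_ * 2; $x } 1 .. 3;

## so is the body of an anonymous subroutine
my $anon = sub { my $x = "anonymous sub scope"; return $x };
my $from_sub = $anon->();

## and so is the code compiled by a string eval
my $from_eval = eval q{ my $x = "string eval scope"; $x };

print "map gave: @doubled\n";       # prints: map gave: 2 4 6
print "sub gave: $from_sub\n";      # prints: sub gave: anonymous sub scope
print "eval gave: $from_eval\n";    # prints: eval gave: string eval scope
print "\$x is still: $x\n";         # prints: $x is still: file scope
</code>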
<p/>
<b>In private</b>
<p/>
A lot of literature when talking about lexical variables refers to them as
<i>private</i> variables. This is because they cannot be seen outside their
given lexical scope. As has already been illustrated, lexical variables are
deleted once the end of their given scope is reached (exceptions notwithstanding),
so they really are private to their respective scope.
<p/>
A feature which is an essential part of lexical scoping is that scopes
can be nested and inner scopes will not affect outer scopes e.g
<code>
my $foo = "file scope";
{
my $foo = "outer scope";
{
my $foo = "inner scope";
print " \$foo is: $foo\n";
}
print " \$foo is: $foo\n";
}
print "\$foo is: $foo\n";
__output__
$foo is: inner scope
$foo is: outer scope
$foo is: file scope
</code>
There, the inner scope is a <i>new</i> scope (much like the outer scope is a new
sub scope of the file scope), so a new instance of <tt>$foo</tt>
is created leaving the outer <tt>$foo</tt> untouched when the inner scope
exits. And because the inner <tt>$foo</tt> only lives within that scope,
it is private to that scope, and nothing else can see it.
<p/>
This is not to say that nested scopes do not affect the rest of the program
(as <i>any</i> new scopes are just sub scopes of the file level lexical scope),
it just means that anything created within them is private to that given scope e.g
<code>
my @list = qw(a list of words);
for my $w (@list) {
if($w =~ /^[aeiou]/) {
$w = "$w: begins with a vowel";
} else {
$w = "$w: begins with a consonant";
}
print $w, $/;
}
__output__
a: begins with a vowel
list: begins with a consonant
of: begins with a vowel
words: begins with a consonant
</code>
So even though we create a new scope with the <tt>if/else</tt> statement,
we're still changing <tt>$w</tt> in the scope above (which in turn is modifying
the elements of <tt>@list</tt> since <tt>$w</tt> is just an alias to each element) as
we haven't created a new <tt>$w</tt> for that particular scope (and of course,
it wouldn't do us a lot of good as it would've fallen out of scope by the
time we came to <tt>[perlfunc:print|print]</tt> it).
<p/>
<b><tt>local</tt> debunked</b>
<p/>
Well, we've been putting it off long enough and now it is time to face that most
confounding of functions - <tt>[perlfunc:local|local]</tt>.
<p/>
The first thing that we absolutely must declare is that <tt>[perlfunc:local|local]</tt>
<i>does not</i> create variables! Not only does it not create variables, it has
nothing to do with lexical variables.
<p/>
With that said, what <tt>[perlfunc:local|local]</tt> does do is change the
value of an existing <i>package global</i> for the length of a given <i>dynamic</i> scope.
A dynamic scope begins and ends with its enclosing lexical scope, but it is defined by the
length of time that scope is being executed, not by the visibility of the scope, so it also
takes in any subroutines called before the scope exits. So <tt>[perlfunc:local|local]</tt> is
localising a package global's value for the <i>length</i> of a given lexical scope e.g
<code>
sub foo {
print " \$x is: $x\n";
}
$x = "original state";
{ # beginning of lexical scope
local $x = "altered state";
foo();
} # end of lexical scope
print "\$x is: $x\n";
__output__
$x is: altered state
$x is: original state
</code>
As we can see, the value of <tt>$x</tt> is still set to <tt>'altered state'</tt> in
<tt>foo()</tt> even though <tt>foo()</tt> is defined outside of the initial lexical scope.
Because <tt>$x</tt> has been dynamically scoped with
<tt>[perlfunc:local|local]</tt> and <tt>foo()</tt> was called within the
surrounding lexical scope, <tt>$x</tt> will stay set to <tt>'altered state'</tt>
until the lexical scope exits.
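<p/>
To contrast the two kinds of scope side by side, here is a minimal sketch in which the same subroutine is called once under <tt>local</tt> and once under <tt>my</tt>. The subroutine names <tt>show()</tt>, <tt>with_local()</tt> and <tt>with_my()</tt> are made up for this example.
<code>
use warnings;
## no "use strict" here, so that $x can be a plain package global as above

$x = "original state";
sub show { return $x }    # refers to the package global $main::x

sub with_local {
    local $x = "altered by local";   # changes the global until with_local() returns
    return show();
}

sub with_my {
    my $x = "altered by my";         # a new lexical, invisible to show()
    return show();
}

print "with_local() sees: ", with_local(), "\n";   # prints: with_local() sees: altered by local
print "with_my() sees: ", with_my(), "\n";         # prints: with_my() sees: original state
print "afterwards: ", show(), "\n";                # prints: afterwards: original state
</code>
The <tt>my</tt> version has no effect on <tt>show()</tt> because <tt>show()</tt> was compiled where only the package global was visible.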
<p/>
You might also see examples of it being used to create private variables - this
is rather misguided as it is auto-vivifying (creating it upon request of its
existence) the variable e.g
<code>
{ # begin lexical scope
local $x = "auto-vivified";
print " \$x is: ", (defined $x ? $x : "undefined"), $/;
} # end lexical scope
print "\$x is: ", (defined $x ? $x : "undefined"), $/;
print "\$main::{x} is: ", (exists $main::{x} ? $main::{x} : "undefined"), $/;
__output__
$x is: auto-vivified
$x is: undefined
$main::{x} is: *main::x
</code>
So <tt>[perlfunc:local|local]</tt> has forced <tt>$x</tt>'s temporary creation
and then it dutifully falls out of scope, leaving it undefined but still
with an existing entry in the symbol table.
<p/>
So generally you'll want to use <tt>[perlfunc:my|my]</tt> instead of
<tt>[perlfunc:local|local]</tt>. However <tt>[perlfunc:local|local]</tt> does
have its uses, such as localising punctuation globals e.g
<code>
use IO::File;
my $file;
{ ## this trick is known as file slurping
local $/;
my $fh = IO::File->new("lextut1.pl") or die("ack: $!");
$file = <$fh>;
}
print $file;
my @foo = qw( a comma separated list of words );
{
local $" = ', ';
print "@foo\n"
}
__output__
my $foo = "in lextut1.pl's lexical file scope";
print "\$foo is: ", (defined $foo ? $foo : "undefined"), $/;
a, comma, separated, list, of, words
</code>
In the first case we've set the input record separator to undefined, so when
<tt>$fh</tt> is read, it reads right to the end of the file. And in the second
case we localise the list separator for stringified lists to a comma followed
by a space, and the original list describes its final output.
<p/>
<a name="our"></a><b><tt>our</tt> variables</b>
<p/>
<tt>[perlfunc:our|our]</tt> is somewhat of an oddball in the world of variables in that it
creates a package level variable which is visible for the remaining lexical scope e.g
<code>
{
package foo;
our $x = "in foo";
package bar;
## $x can still be seen as it is still in scope
print " \$x is: $x\n";
}
print "\$foo::x is: $foo::x\n";
__output__
$x is: in foo
$foo::x is: in foo
</code>
So <tt>our $x</tt> has created the package global <tt>$foo::x</tt>, and the unqualified name <tt>$x</tt> stays visible for the rest of the lexical scope, which is why it can still be seen in the package <tt>bar</tt>.
This illustrates why <tt>[perlfunc:our|our]</tt> is somewhat of a two-faced
function and best left alone unless the behaviour is specifically desired (at
least in this humble tutorial author's opinion).
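<p/>
One case where that behaviour tends to be specifically desired is naming a package global under <tt>[strict]</tt> without qualifying it every time. A minimal sketch, assuming a made-up package called <tt>Counter</tt>:
<code>
use strict;
use warnings;

package Counter;    # hypothetical package, for illustration only

our $VERSION = '0.01';   # really $Counter::VERSION
our $total   = 0;        # really $Counter::total

## the short name $total is accepted by strict thanks to our
sub bump { return ++$total }

package main;

Counter::bump() for 1 .. 3;
print "total is: $Counter::total\n";        # prints: total is: 3
print "version is: $Counter::VERSION\n";    # prints: version is: 0.01
</code>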
<p/>
<b>Scoping schmoping</b>
<p/>
Ok, you say, I can see what lexical scoping is about and have an understanding of
how it works, but what use is it to me?
<p/>
Firstly, you can neatly encapsulate separate groups of operations into individual lexical
scopes to avoid namespace collision and the like (this is widely demonstrated
through the use of subroutines and modules). This in turn leads to nicely
encapsulated sections of code which can be isolated from the main body of code,
which in turn means that the variables will tie very closely to the
surrounding code.
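<p/>
As a minimal sketch of that kind of encapsulation, a naked block can hold a lexical that only the subroutines declared alongside it can reach. The names <tt>%seen</tt>, <tt>mark_seen()</tt> and <tt>times_seen()</tt> are made up for this example.
<code>
use strict;
use warnings;

{
    ## %seen is private to this block; only the two subs below can reach it
    my %seen;
    sub mark_seen  { my ($name) = @_; return $seen{$name}++ }
    sub times_seen { my ($name) = @_; return $seen{$name} || 0 }
}

mark_seen($_) for qw(foo bar foo);
print "foo seen ", times_seen("foo"), " times\n";   # prints: foo seen 2 times
print "baz seen ", times_seen("baz"), " times\n";   # prints: baz seen 0 times
</code>
Any attempt to use <tt>%seen</tt> outside the block would be rejected by <tt>[strict]</tt>, so the rest of the program can only go through the two subroutines.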
<p/>
Secondly, because lexical
scoping is determined at compile-time, if there are any errors they will be
picked up before the program can even run (this is doubly true if you're
running with [strict|strictures] on, you are <tt>[strict|use()ing strict]</tt>
right?).
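<p/>
Here is a minimal sketch of such a compile-time catch. The faulty code is kept in a string and handed to string <tt>[perlfunc:eval|eval]</tt> purely so that the surrounding script can compile and report the error; the variable names <tt>$total</tt> and <tt>$totl</tt> are made up, and the exact wording of the message varies between versions of perl.
<code>
use strict;
use warnings;

## the code under test lives in a string so that this script itself compiles
my $code = q{
    use strict;
    my $total = 0;
    $totl++;          # typo: no such lexical is in scope
    print "this line is never reached\n";
    1;
};

my $ok = eval $code;
print "compiled and ran\n" if $ok;
## the message begins: Global symbol "$totl" requires explicit package name
print "refused to compile: $@" unless $ok;
</code>
Note that the <tt>print</tt> inside the string never runs, as the whole chunk is rejected before any of it is executed.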
<p/>
Thirdly, at the exit of a lexical scope all the variables are destroyed
(except of course, for those that are still in use),
which means your memory won't keep growing and growing as more variables
are created. Also quite handily, any objects will have the <tt>DESTROY</tt>
method called upon exit, so you can handle how your objects are cleaned up.
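<p/>
A minimal sketch of that clean-up, assuming a made-up class called <tt>Noisy</tt> which records when it is destroyed:
<code>
use strict;
use warnings;

package Noisy;   # hypothetical class, for illustration only

my @log;         # file-scoped lexical, so it is visible in both packages
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @log, "destroyed $self->{name}" }

package main;

{
    my $obj = Noisy->new("first");
    push @log, "inside the block";
}   # the last reference to the object goes away here, so DESTROY is called
push @log, "after the block";

print "$_\n" for @log;
## prints:
##   inside the block
##   destroyed first
##   after the block
</code>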
<p/>
<b>Something useful</b>
<p/>
Now we're done with our learning, let's have some doing!
<p/>
The example below will recurse through a given directory and will list
each .pl and .pm file along with the number of lines in the file.
<code>
## set stricture checking for the rest of the file scope
use strict;
## see man perllexwarn for why this is double-plus good
use warnings;
## ah, heaven-sent
use File::Find::Rule;
## for lexically scoped file-handles
use IO::File;
## prevent subroutines from being able to access program level variables
{
## naked block's lexical scope
my @files = File::Find::Rule->file()
->name("*.{pl,pm}")
->in( shift @ARGV );
## $fl is in the foreach lexical scope
foreach my $fl (@files) {
## ditto with $lc
my $lc = count_lines($fl);
print "$fl: $lc line".($lc != 1 && 's')." of code\n";
}
}
sub count_lines {
## will be closed when we exit the current scope
my $fh = IO::File->new(shift) or die("ack: $!");
## scoped (and therefore private) to count_lines()
my $count = 0;
$count++ while <$fh>;
return $count;
}
</code>
Wow, there's quite a lot of lexical scoping going on there, both explicitly
(i.e the naked block containing the core of the program) and implicitly
(i.e <tt>count_lines()</tt>'s lexical scope), and at this point it should all
be pretty straightforward (and I imagine the comments help too :).
<p/>
<b>In review</b>
<p/>
A lexical scope defines an area of code in which any variables declared within
that area will live for only the duration of the execution of that area of code,
unless a variable is still referenced after the area of code has been left.
A dynamic scope parallels a lexical scope but is defined by the
<i>length</i> of the scope (as opposed to the visibility of the scope).
<p/>
<tt>[perlfunc:my|my]</tt> <i>declares</i> lexically scoped variables at compile time, <tt>[perlfunc:local|local]</tt> changes a package global's value throughout a dynamic scope and <tt>[perlfunc:our|our]</tt> creates a package global which is visible throughout its given lexical scope.
<p/>
And there we have it! I hope you've enjoyed this tutorial and gotten everything
out of it that you had intended to, and can now go forth and frolic in the
land of lexical scoping with glee and pride!
</readmore>
<p/>
Many thanks to [adrianh], [AltBlue], [BrowserUk], [davis], [dingus], [Elian], [jdporter], [jpl], [robartes] and [tye] for their input and help in knocking out the various bugs.
<p/>
<tt>_________<br><u>broquaint</u></tt>