perlmeditation
robin
<p>
So, you know everything there is to know about Perl.
You stifle a yawn as you flick through the latest
obfuscations – how obvious they are! It's gratifying
to be an expert; and your patient, only slightly
condescending, help is appreciated by the less
experienced. Life is good. But sometimes,
in a reflective moment, you miss your younger
days as an intrepid explorer in unfamiliar terrain,
where there was always a new mystery to unravel, a new
landscape to be discovered.
</p><p>
You could go and learn ruby or something, but that
feels vaguely disloyal after so many years nestled
happily in the bosom of Perl. And besides, you don't
really like the idea of being patronised by experts
on rubymonks or whatever the hell they have.
</p><p>
This is the node you've been waiting for. There is
a way! You can plunge into an unfamiliar world of
mystery and <i>at the same time</i> enhance your
reputation as a Perl guru. I'm talking about perl:
the guts, the source, the motherlode.
</p>
<b>Update:</b> Fixed a few typos; added a “Testing”
section, as [id://501855|suggested by hossman].
</p>
<readmore>
<p>You've probably heard the stories, of strong men
who went there and never returned, or returned
mumbling and broken and won’t tell what they saw.
One explorer survived long enough to describe “an
interconnected mass of livers and pancreas and lungs
and little sharp pointy things and the [google://"occasional exploding kidney"|occasional
exploding kidney]”.
</p><p>
Don’t worry about it. You’ll be fine. The mistake
that people often make is they try to <i>understand</i>
it. (Some people make the same mistake with life, or
the state of the world, or <i>The Prisoner</i>.) If you can
avoid that, you'll be all right. Just crack it open and
get stuck in. The slogan of the hour is “HACK FIRST,
THINK LATER”.
</p><p>
The trick is just to mess with things. Don’t waste
too much time worrying about what's going to happen:
try it and <i>see</i> what happens.
</p><p>
People often say that a good way to get started is
to try and fix a bug that somebody’s reported. There’s
some truth in that, but debugging isn’t most people’s
idea of a good time. So that's not what we'll do; instead
we’ll add a new feature: lexical typeglobs. It’s always
bothered me a little that you can’t say <tt>my *foo</tt>.
Why can’t you? There’s no good reason. It’s not very
useful, I admit, but it would be kind of cool.
</p><p>
I did it using the HACK FIRST methodology, and wrote
down what I was doing as I went along. I was really
surprised at how easy it turned out to be: the final
patch only changes 38 lines of code.
</p><p>
If you’re the kind of person who always reads the
guidebook before going to a new place, you might like
to glance at these: [doc://perlhack], [doc://perlguts].
It’s not compulsory though.
</p><p>
The first thing to do is to get a copy of the source.
You can hack on whichever version you like, but I decided
to use the latest “bleadperl”. If you want to follow
along, you should get it too. This is an interactive
tutorial: it’s not designed for reading in the bath,
and it probably won’t make so much sense if you’re
not actually tinkering with the source while you read.
</p><p>
So find a disk with a reasonable amount of free space,
make a directory for the source to go in, and grab it.
I did it like this:
<code>
cd build
mkdir perl-current
cd perl-current
rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .
</code>
and you should do something similar. Now build it, to make
sure it’s working before you start.
<code>
sh Configure -Doptimize='-g' -Dusethreads -Dusedevel -Dprefix=/local/perl
-ders && make
</code>
</p><p>
You can vary this as you like, but the <tt>-Doptimize='-g' -Dusedevel</tt>
is essential. The <tt>-Dusedevel</tt> tells it that, yes, you
really want to build a development version; and <tt>-Doptimize='-g'</tt>
turns on debugging mode, which we’re going to make good use
of later on.
</p><p>
In case you haven’t looked at [doc://perlhack], I’ll quickly
explain the rough structure of the source. Perl code is tokenised
by a rather hairy routine called <tt>yylex</tt> that lives in
<tt>toke.c</tt>, then it’s parsed using the bison grammar that
lives in <tt>perly.y</tt>. The grammar uses the routines in <tt>op.c</tt>
to build an optree.
</p><p>
The optree is then executed by a one-liner that lives in
<tt>run.c</tt>, which dispatches each op to the appropriate
routine. The ops themselves are implemented by functions
in the files <tt>pp*.c</tt>.
</p><p>
Back to the problem at hand! Our first task is to persuade
perl to recognise the new construct. A quick
<code>
perl -e 'my *x'
</code>
gives
<code>
syntax error at -e line 1, near "my *x"
</code>
which shows that the parser doesn’t even recognise
the syntax. So we crack open the grammar (in
<tt>perly.y</tt>, remember), and start grepping for ‘my’. Soon
we find this:
<code>
/* "my" declarations, with optional attributes */
myattrterm: MY myterm myattrlist
{ $$ = my_attrs($2,$3); }
| MY myterm
{ $$ = localize($2,$1); }
;
/* Things that can be "my"'d */
myterm : '(' expr ')'
{ $$ = sawparens($2); }
| '(' ')'
{ $$ = sawparens(newNULLLIST()); }
| scalar %prec '('
{ $$ = $1; }
| hsh %prec '('
{ $$ = $1; }
| ary %prec '('
{ $$ = $1; }
;
</code>
and you don't need a degree in rocket science to see
that this is the bit we’re interested in. Down at the
end of the file, the symbols ‘<tt>scalar</tt>’, ‘<tt>hsh</tt>’
and ‘<tt>ary</tt>’ are defined, like this:
<code>
scalar : '$' indirob
{ $$ = newSVREF($2); }
;
ary : '@' indirob
{ $$ = newAVREF($2); }
;
hsh : '%' indirob
{ $$ = newHVREF($2); }
;
</code>
and there’s a similar entry for globs, though it seems
to be called “star” rather than “glob”. (The perl
source is full of little things that don’t quite make
sense – that’s part of its charm.) Here it is:
<code>
star : '*' indirob
{ $$ = newGVREF(0,$2); }
;
</code>
</p><p>
So let’s add another clause to the definition of ‘myterm’, like this:
<code>
| star %prec '('
{ $$ = $1; }
</code>
Time to check it out! First we have to rebuild the
parser using the new grammar, then rebuild perl
itself:
<code>
perl regen_perly.pl
make perl
</code>
Try the one-liner again:
<code>
$ ./perl -e 'my *x'
Can't declare ref-to-glob cast in "my" at -e line 1, at EOF
</code>
Great! It’s been parsed okay, and it’s now being rejected
during compilation. The compiler is housed in <tt>op.c</tt>, and
we’ll need to write some code to compile our new construct.
But before we can do that, we need to decide what we’re
going to compile it to. Let’s have a quick peek at what
perl does with other ‘my’ declarations:
<code>
$ ./perl -MO=Concise -e 'my $x; my @y; my %z'
8 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v ->3
3 <0> padsv[$x:1,4] vM/LVINTRO ->4
4 <;> nextstate(main 2 -e:1) v ->5
5 <0> padav[@y:2,4] vM/LVINTRO ->6
6 <;> nextstate(main 3 -e:1) v ->7
7 <0> padhv[%z:3,4] vM/LVINTRO ->8
-e syntax OK
</code>
Okay, so they get compiled to special ops called <tt>/pad.v/</tt>.
(In case you don’t know, there’s a special perl guts shorthand for
different types of value. The most important ones are: a scalar is
an SV, an array is an AV, a hash is an HV and a glob is a GV. Oh
yeah, and a reference is an RV. Pretty simple really.)
</p><p>
Looks like we ought to make a <tt>padgv</tt> op! The ops are all
defined in a file called <tt>opcode.pl</tt>, which auto-generates
the relevant header files. If we were worried about backwards
compatibility, we’d add the new op at the end; but this is
just for fun so we’re not really fussed about compatibility,
and we’ll add it at the logical place in the file:
<code>
$ diff -u opcode.pl{.orig,}
--- opcode.pl.orig 2005-10-18 19:07:24.000000000 +0100
+++ opcode.pl 2005-10-18 19:07:49.000000000 +0100
@@ -491,6 +491,7 @@
padsv private variable ck_null ds0
padav private array ck_null d0
padhv private hash ck_null d0
+padgv private glob ck_null d0
padany private value ck_null d0
pushre push regexp ck_null d/
</code>
Now run <tt>opcode.pl</tt>, which updates <tt>opcode.h</tt>,
<tt>opnames.h</tt> and <tt>pp_proto.h</tt> on your behalf.
</p><p>
We’ve got a new op, but the compiler isn’t going to
use it unless we tell it how. So crack open <tt>op.c</tt>, and
squint at the <tt>Perl_newGVREF()</tt> function. Looking at
<tt>Perl_new[SAH]VREF</tt> for comparison, it’s fairly obvious
what we have to do:
<code>
--- op.c.before 2005-10-18 19:13:33.000000000 +0100
+++ op.c 2005-10-18 19:33:09.000000000 +0100
@@ -4902,7 +4902,13 @@
OP *
Perl_newGVREF(pTHX_ I32 type, OP *o)
{
- if (type == OP_MAPSTART || type == OP_GREPSTART || type == OP_SORT)
+ dVAR;
+ if (o->op_type == OP_PADANY) {
+ o->op_type = OP_PADGV;
+ o->op_ppaddr = PL_ppaddr[OP_PADGV];
+ return o;
+ }
+ else if (type == OP_MAPSTART || type == OP_GREPSTART || type == OP_SORT)
return newUNOP(OP_NULL, 0, o);
return ref(newUNOP(OP_RV2GV, OPf_REF, o), type);
}
</code>
</p><p>
We’re going to need to implement the new op at some
point, but for now let’s just whack in a placeholder:
<code>
--- pp.c.orig 2005-10-18 19:16:07.000000000 +0100
+++ pp.c 2005-10-18 19:17:51.000000000 +0100
@@ -127,6 +127,11 @@
RETURN;
}
+PP(pp_padgv)
+{
+ DIE(aTHX_ "OP_PADGV NOT YET IMPLEMENTED");
+}
+
PP(pp_padany)
{
DIE(aTHX_ "NOT IMPL LINE %d",__LINE__);
</code>
</p><p>
Wahey! Let’s check out what we’ve got so far. Run ‘make’
to rebuild it, then:
<code>
$ ./perl -MO=Concise -e 'my *x'
Can't declare ref-to-glob cast in "my" at -e line 1, at EOF
-e had compilation errors.
5 <@> leave[1 ref] KP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) ->3
4 <1> rv2gv sKR/1 ->5
3 <#> gv[*x] s ->4
</code>
</p><p>
Oh dear. :-(
</p><p>
The error is to be expected – we still haven’t looked
into that – but we have <tt>OP_RV2GV</tt> instead of our shiny
new <tt>OP_PADGV</tt>. What’s that about? Time to wheel out the
old debugger.
<code>
$ gdb --args ./perl -e 'my *x'
...
(gdb) br Perl_newGVREF
Breakpoint 1 at 0x80a51e2: file op.c, line 4905.
(gdb) run
Starting program: /local/build/perl-current/perl -e my\ \*x
...
Breakpoint 1, Perl_newGVREF (my_perl=0x81b07c0, type=0, o=0x81ccc50)
at op.c:4905
4905 if (o->op_type == OP_PADANY) {
(gdb) p *o
$1 = {op_next = 0x81ccc50, op_sibling = 0x0,
op_ppaddr = 0x80e2c0c <Perl_pp_const>, op_targ = 0, op_type = 5, op_opt = 0,
op_static = 0, op_spare = 0, op_flags = 2 '\002', op_private = 16 '\020'}
</code>
Hmm, so we’ve got an <tt>OP_CONST</tt> instead of the <tt>OP_PADANY</tt> we
were expecting. It’s time to find out where those PADANYs
are coming from:
<code>
$ grep OP_PADANY *.c
...
toke.c: yylval.opval = newOP(OP_PADANY, 0);
toke.c: yylval.opval = newOP(OP_PADANY, 0);
</code>
</p><p>
Ah! It’s tokeniser magic. (You didn’t think the tokeniser just
tokenised, did you? <i>Oh</i> no.) That means it’s time to dive into
toke.c and see if we can grok what’s happening there. These
<tt>newOP()</tt> commands are both in a function called
<tt>S_pending_indent()</tt>,
which gets called right from the top of the main lexer routine
<tt>Perl_yylex()</tt>:
<code>
...
if (PL_pending_ident)
return REPORT(S_pending_ident(aTHX));
...
</code>
That means that <tt>PL_pending_ident</tt> must be getting set for
<tt>$foo</tt>, <tt>@foo</tt> and <tt>%foo</tt>, but not for <tt>*foo</tt>. A quick grep through
the file reveals that we’re quite right – when a ‘<tt>*</tt>’ is
encountered, something called <tt>force_ident()</tt> gets called
instead. Let’s try changing it:
<code>
--- toke.c.orig 2005-10-18 19:47:57.000000000 +0100
+++ toke.c 2005-10-18 19:48:43.000000000 +0100
@@ -3169,7 +3169,7 @@
if (PL_expect != XOPERATOR) {
s = scan_ident(s, PL_bufend, PL_tokenbuf, sizeof PL_tokenbuf, TRUE);
PL_expect = XOPERATOR;
- force_ident(PL_tokenbuf, '*');
+ PL_pending_ident = '*';
if (!*PL_tokenbuf)
PREREF('*');
TERM('*');
</code>
</p><p>
We try rebuilding perl. It builds miniperl okay, but then
it dies with a load of syntax errors during the build process.
The most telling-looking one is the second one:
<code>
syntax error at ../lib/vars.pm line 29, near "*$sym "
</code>
</p><p>
Hmm, let’s see what the tokeniser is up to:
<code>
$ ./miniperl -DT -ce '*$x'
### 0:LEX_NORMAL/XSTATE "\n;"
### <== '*'
### 1:LEX_NORMAL/XREF "$x\n"
### Pending identifier ''
### <== WORD(opval=op_const) PV(""\0)
### 1:LEX_NORMAL/XREF "$x\n"
### <== '$'
### 1:LEX_NORMAL/XOPERATOR ";"
### Pending identifier '$x'
### <== WORD(opval=op_const) PV("x"\0)
### 1:LEX_NORMAL/XOPERATOR ";"
### <== ';'
### 1:LEX_NORMAL/XSTATE ""
### Tokener got EOF
### <== EOF
syntax error at -e line 1, next char $
-e had compilation errors.
</code>
I can’t tell what’s wrong just looking at that, so let’s try
comparing it to something similar that does work:
<code>
$ ./miniperl -DT -ce '%$x'
### 0:LEX_NORMAL/XSTATE "\n;"
### <== '%'
### 1:LEX_NORMAL/XREF "$x\n"
### <== '$'
### 1:LEX_NORMAL/XOPERATOR ";"
### Pending identifier '$x'
### <== WORD(opval=op_const) PV("x"\0)
### 1:LEX_NORMAL/XOPERATOR ";"
### <== ';'
### 1:LEX_NORMAL/XSTATE ""
### Tokener got EOF
### <== EOF
EXECUTING...
-e syntax OK
</code>
That’s interesting! There’s definitely something different there.
The second block of the ‘<tt>*</tt>’ run doesn’t appear at all in this one.
</p><p>
Maybe we got something wrong in the tokeniser change? Sure enough,
another look at toke.c shows that the code for ‘<tt>%</tt>’ is setting
<tt>PL_tokenbuf[0] = '%'</tt>, which we weren’t doing. So let’s try a
slightly improved change, copying the structure of the ‘<tt>%</tt>’ code
a bit more faithfully:
<code>
--- toke.c.orig 2005-10-18 19:47:57.000000000 +0100
+++ toke.c 2005-10-18 20:16:52.000000000 +0100
@@ -3167,11 +3167,11 @@
case '*':
if (PL_expect != XOPERATOR) {
- s = scan_ident(s, PL_bufend, PL_tokenbuf, sizeof PL_tokenbuf, TRUE);
- PL_expect = XOPERATOR;
- force_ident(PL_tokenbuf, '*');
- if (!*PL_tokenbuf)
+ PL_tokenbuf[0] = '*';
+ s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, TRUE);
+ if (!PL_tokenbuf[1])
PREREF('*');
+ PL_pending_ident = '*';
TERM('*');
}
s++;
</code>
Right, let’s try another make. This one fails too, but in
a much more interesting way:
<code>
Global symbol "%DELETE" requires explicit package name at lib/Config_heavy.pl line 1158
</code>
Looking at line 1158 of <tt>Config_heavy.pl</tt>, we find:
<code>
*DELETE = \&STORE;
</code>
It looks like "<tt>use strict</tt>" is now affecting globs! Indeed:
<code>
$ ./miniperl -ce '*foo'
-e syntax OK
$ ./miniperl -ce 'use strict; *foo'
Global symbol "%foo" requires explicit package name at -e line 1.
-e had compilation errors.
</code>
Also, it seems to be confusing globs with hashes. What's this
"<tt>%foo</tt>" all about? Let's see where the error is coming from:
<code>
$ fgrep 'explicit package name' *.c
gv.c: "Global symbol \"%s%s\" requires explicit package name",
</code>
A quick look in gv.c tells us that it's coming from
<tt>Perl_gv_fetchpvn_flags()</tt>. Who's calling that? It's
debugger time again!
<code>
$ gdb --args ./miniperl -e '*foo'
(gdb) br Perl_gv_fetchpvn_flags
Breakpoint 1 at 0x806c485: file gv.c, line 710.
(gdb) condition 1 nambeg[0] == 'f'
(gdb) run
Starting program: /local/build/perl-current/miniperl -e \*foo
[New Thread 1074334048 (LWP 2919)]
[Switching to Thread 1074334048 (LWP 2919)]
Breakpoint 1, Perl_gv_fetchpvn_flags (my_perl=0x8199720,
nambeg=0x8199ca9 "foo", full_len=3, flags=1, sv_type=12) at gv.c:710
710 register const char *name = nambeg;
(gdb) bt
#0 Perl_gv_fetchpvn_flags (my_perl=0x8199720, nambeg=0x8199ca9 "foo",
full_len=3, flags=1, sv_type=12) at gv.c:710
#1 0x0806c406 in Perl_gv_fetchpv (my_perl=0x8199720, nambeg=0x8199ca9 "foo",
add=1, sv_type=12) at gv.c:696
#2 0x08087238 in S_pending_ident (my_perl=0x8199720) at toke.c:5669
#3 0x08076115 in Perl_yylex (my_perl=0x8199720) at toke.c:2431
#4 0x08092b5b in Perl_yyparse (my_perl=0x8199720) at perly.c:412
#5 0x08063ebc in S_parse_body (my_perl=0x8199720, env=0x0,
xsinit=0x804bb65 <xs_init>) at perl.c:2136
#6 0x08062d16 in perl_parse (my_perl=0x8199720, xsinit=0x804bb65 <xs_init>,
argc=3, argv=0xbffff3f4, env=0x0) at perl.c:1542
#7 0x0804baeb in main (argc=3, argv=0xbffff3f4, env=0xbffff404)
at miniperlmain.c:101
#8 0x42015704 in __libc_start_main () from /lib/tls/libc.so.6
</code>
</p><p>
Aha! So <tt>S_pending_ident()</tt> is calling <tt>gv_fetchpv()</tt>.
There it is, right at the end, like this:
<code>
gv_fetchpv(PL_tokenbuf+1, PL_in_eval ? (GV_ADDMULTI | GV_ADDINEVAL) : TRUE,
((PL_tokenbuf[0] == '$') ? SVt_PV
: (PL_tokenbuf[0] == '@') ? SVt_PVAV
: SVt_PVHV));
</code>
</p><p>
Ah! It's assuming that anything that's not a scalar or an array
must be a hash. But we've added a new possibility, so let’s tell it
about that:
<code>
--- toke.c.1 2005-10-18 20:53:32.000000000 +0100
+++ toke.c 2005-10-18 20:54:34.000000000 +0100
@@ -5666,6 +5666,7 @@
gv_fetchpv(PL_tokenbuf+1, PL_in_eval ? (GV_ADDMULTI | GV_ADDINEVAL) : TRUE,
((PL_tokenbuf[0] == '$') ? SVt_PV
: (PL_tokenbuf[0] == '@') ? SVt_PVAV
+ : (PL_tokenbuf[0] == '*') ? SVt_PVGV
: SVt_PVHV));
return WORD;
}
</code>
</p><p>
Now we try another ‘<tt>make perl</tt>’, and everything builds as normal.
Phew! Even better, the new op is being used in the right place:
<code>
$ ./perl -MO=Concise -e 'my *foo'
Can't declare private glob in "my" at -e line 1, at EOF
-e had compilation errors.
4 <@> leave[1 ref] KP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) ->3
3 <0> padgv[*foo:1,2] ->4
</code>
Still got that pesky error though... I wonder where that
one's coming from:
<code>
$ fgrep "Can't declare" *.c
op.c: "Can't declare class for non-scalar %s in \"%s\"",
op.c: yyerror(Perl_form(aTHX_ "Can't declare %s in %s",
op.c: yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"",
opmini.c: "Can't declare class for non-scalar %s in \"%s\"",
opmini.c: yyerror(Perl_form(aTHX_ "Can't declare %s in %s",
opmini.c: yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"",
</code>
</p><p>
<tt>opmini.c</tt> is just an autogenerated copy of <tt>op.c</tt> that’s used to
build miniperl. I don’t know why there’s a separate file for
this – probably those hysterical raisins again. Anyway, this
looks like the right bit, here:
<code>
else if (type != OP_PADSV &&
type != OP_PADAV &&
type != OP_PADHV &&
type != OP_PUSHMARK)
{
yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"",
OP_DESC(o),
PL_in_my == KEY_our ? "our" : "my"));
return o;
}
</code>
I guess we need to tell it about our new op.
<code>
--- op.c.1 2005-10-18 21:03:51.000000000 +0100
+++ op.c 2005-10-18 21:04:11.000000000 +0100
@@ -1704,6 +1704,7 @@
else if (type != OP_PADSV &&
type != OP_PADAV &&
type != OP_PADHV &&
+ type != OP_PADGV &&
type != OP_PUSHMARK)
{
yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"",
</code>
Rebuild once again, and:
<code>
$ ./perl -e 'my *foo'
OP_PADGV NOT YET IMPLEMENTED at -e line 1.
</code>
It’s executing our new op! <i>Now</i> we’re talking! I guess that
means that we ought to <i>implement</i> the thing. We don’t really
know what we’re doing here, so let’s just try something really
simple, a kind of ultra stripped-down version of <tt>pp_padsv</tt>, and
see what happens.
<code>
--- pp.c.orig 2005-10-18 19:16:07.000000000 +0100
+++ pp.c 2005-10-18 22:23:28.000000000 +0100
@@ -127,6 +127,15 @@
RETURN;
}
+PP(pp_padgv)
+{
+ dSP; dTARGET;
+ XPUSHs(TARG);
+ if (PL_op->op_private & OPpLVAL_INTRO)
+ SAVECLEARSV(PAD_SVl(PL_op->op_targ));
+ RETURN;
+}
+
PP(pp_padany)
{
DIE(aTHX_ "NOT IMPL LINE %d",__LINE__);
</code>
It builds okay again, so let’s try and do something useful
with it:
<code>
$ ./perl -e 'my *foo = *bar'
Can't modify private glob in scalar assignment at -e line 1, at EOF
</code>
Hmm, another error. Where's it coming from?
<code>
$ fgrep "Can't modify" *.c
op.c: yyerror(Perl_form(aTHX_ "Can't modify %s in %s",
</code>
Aha! This is the default clause in a big switch statement,
in a function called <tt>Perl_mod()</tt>. I guess we need to tell
this about our new op too:
<code>
--- op.c.2 2005-10-18 21:43:38.000000000 +0100
+++ op.c 2005-10-18 21:44:12.000000000 +0100
@@ -1222,6 +1222,7 @@
o->op_private |= OPpMAYBE_LVSUB;
/* FALL THROUGH */
case OP_PADSV:
+ case OP_PADGV:
PL_modcount++;
if (!type) /* local() */
Perl_croak(aTHX_ "Can't localize lexical variable %s",
</code>
Right, now let’s try to use it again:
<code>
$ ./perl -we 'my *foo = *bar; print *foo, "\n"'
Name "main::bar" used only once: possible typo at -e line 1.
*main::bar
</code>
Hot damn! It seems to be working. Let’s try some more experiments:
<code>
$ ./perl -we 'my *foo = *bar; $bar = "Hello, world!\n"; print ${*foo}'
Hello, world!
</code>
No way! This is great. Hmm, I wonder what a new glob is called:
<code>
$ ./perl -we 'my *foo; print *foo{NAME}'
Segmentation fault
</code>
</p><p>
Oh dear, not so good. I guess we need to initialise the new
glob somehow. It looks like we can make a new glob using <tt>newGVgen()</tt>,
but we have to pass a package name, and of course a lexical glob
doesn’t live in a package. Let's use the bogus package “lexical”,
so lexical globs are easy to spot. The other problem is that
<tt>newGVgen()</tt> returns a pointer to the GV, and there doesn't seem to
be any sensible way to copy this GV into the pad entry.
</p><p>
(Confession: I made a false start here. I tried
copying the GV into the pad sv using <tt>sv_setsv()</tt>, but it doesn't
seem to copy all the relevant fields. So then I tried the following.)
This calls for a slight change of strategy. We'll use the pad
entry as a <i>reference</i> to the glob. Like this:
<code>
--- pp.c.1 2005-10-18 22:30:22.000000000 +0100
+++ pp.c 2005-10-19 00:47:33.000000000 +0100
@@ -130,9 +130,15 @@
PP(pp_padgv)
{
dSP; dTARGET;
- XPUSHs(TARG);
- if (PL_op->op_private & OPpLVAL_INTRO)
- SAVECLEARSV(PAD_SVl(PL_op->op_targ));
+ if (PL_op->op_private & OPpLVAL_INTRO) {
+ GV *gen_gv = newGVgen("lexical");
+ sv_upgrade(TARG, SVt_RV);
+ SvREFCNT_inc((SV *)gen_gv);
+ SvRV_set(TARG, (SV *)gen_gv);
+ SvROK_on(TARG);
+ SAVECLEARSV(PAD_SVl(PL_op->op_targ));
+ }
+ XPUSHs(SvRV(TARG));
RETURN;
}
</code>
If you’re wondering what all these oddly-named functions are
doing, have another squint at [doc://perlguts].
</p><p>
(Okay, I admit it! I didn’t get this right first time either. I
forgot the <tt>SvROK_on()</tt>, which caused segfaults during global
destruction (of all the bizarre places), and it took a lengthy
session in the debugger before I figured out what I’d done wrong.)
</p><p>
Anyhow, let’s give it a whirl:
<code>
$ ./perl -wle 'print my *x'
*lexical::_GEN_0
$ ./perl -wle 'my *x = *foo; $foo = "Nice!"; print ${*x}'
Nice!
$ ./perl -wle 'my *x = *foo; my *y = *x; $foo = "Even nicer!"; print ${*y}'
Even nicer!
$ ./perl -wle 'my *x = \23; ${*x} = 24'
Modification of a read-only value attempted at -e line 1.
$ ./perl -wle 'my *x = \23; *foo = *x; print $foo; $foo = 24'
23
Modification of a read-only value attempted at -e line 1.
</code>
</p><p>
Great! Let's try a closure:
<code>
sub foo {
my *foo = shift;
sub {${*foo}}
}
my $x = foo(*foo);
$foo = "Hmm";
print $x->();
my $y = foo(\23);
print $y->();
</code>
That prints “<tt>Hmm23</tt>”, which is cool!
</p><p>
Look at this though:
<code>
$ ./perl -wle 'for(1..10) {my *x} print sort keys %lexical::'
_GEN_0_GEN_1_GEN_2_GEN_3_GEN_4_GEN_5_GEN_6_GEN_7_GEN_8_GEN_9
</code>
Oh dear! Our supposedly lexical globs are being created as real
package variables. That leads to stupendous memory leaks – for
example, the loop:
<code>
$ ./perl -e 'while (1) {my *foo}'
</code>
will keep on growing until the computer runs out of memory.
We'd better do something about that. How about deleting
the entry from the <tt>%lexical::</tt> stash as soon as it's been
created? It's only a one-line addition:
<code>
--- pp.c.3 2005-10-19 01:26:19.000000000 +0100
+++ pp.c 2005-10-19 01:34:42.000000000 +0100
@@ -136,6 +136,7 @@
SvREFCNT_inc((SV *)gen_gv);
SvRV_set(TARG, (SV *)gen_gv);
SvROK_on(TARG);
+ hv_delete(GvSTASH(gen_gv), GvNAME(gen_gv), GvNAMELEN(gen_gv), 0);
SAVECLEARSV(PAD_SVl(PL_op->op_targ));
}
XPUSHs(SvRV(TARG));
</code>
Okay... let’s try it out. The examples above still seem to
work. What about this?
<code>
$ ./perl -wle 'for (1..10) {print \my *foo}'
GLOB(0x81b22d8)
GLOB(0x81b22f8)
GLOB(0x81b22b8)
GLOB(0x81b22d8)
GLOB(0x81b22f8)
GLOB(0x81b22b8)
GLOB(0x81b22d8)
GLOB(0x81b22f8)
GLOB(0x81b22b8)
GLOB(0x81b22d8)
</code>
How cool is that? You can <i>see</i> the memory being reused
– the same addresses keep coming back again and again.
It looks like we have a working Perl interpreter with a
shiny new feature!
</p>
<h2>Testing</h2>
<p>Every time you add a new feature, or fix a bug, you ought to add
some regression tests to make sure that it keeps on working in the
future. Even though we don’t expect this particular patch to be
maintained in the future, it’s good practice. Perl’s test suite
lives in (the subdirectories of) the directory <tt>t</tt>; the
<tt>my</tt> operation is tested in <tt>t/op/my.t</tt>. We need
to decide whether to add our tests to the existing file, or make
a new one. If you look at the <tt>op</tt> tests, you'll notice
that the fundamental ones are all coded by hand – they don’t
use <tt>Test::More</tt>. That's because Test::More is complicated
enough that, if a really fundamental feature gets broken, it will
almost certainly stop working. It might even stop working in such
a way that all the tests appear to have passed, which would be
very bad!
<p></p>
On the other hand, we can be pretty sure that Test::More doesn’t
make use of lexical typeglobs, so there’s no reason we can’t use
it in <i>our</i> tests. That settles it then: we’ll make a new file,
say <tt>t/op/my_glob.t</tt>:
<code>
#!./perl
use Test::More tests => 7;
BEGIN {
# This is in a BEGIN block so that it will run even if
# the syntax causes a compilation error.
eval q{ my *foo };
ok(!$@, "Syntactically okay.");
}
{
my *foo = *bar;
is(*foo, "*main::bar", "Visible within scope,");
}
is(*foo, "*main::foo", "yet invisible without it.");
{
my *foo;
is(*foo{PACKAGE}, 'lexical', "Package is 'lexical'");
ok(!keys %lexical::, "Glob does not really exist.");
}
sub foo {
my *foo = shift;
sub {
return ${*foo};
}
}
my @foo = map foo($_), \(17, 23);
is($foo[0]->(), 17, 'Closure test ($foo[0] == 17)');
is($foo[1]->(), 23, 'Closure test ($foo[1] == 23)');
</code>
It could certainly be more thorough, but this covers
the essentials.
</p><h2>Tidying up</h2>
<p>If you run ‘<tt>make test</tt>’ at this point, there are a handful of
test failures. That's not a real problem – they all come, in
one way or another, from the fact that we've added a new opcode.
The Opcode module whines that it doesn’t know about this crazy
padgv thingy, but it’s easy to make it happy:
<code>
--- ext/Opcode/Opcode.pm.orig 2005-10-19 01:08:11.000000000 +0100
+++ ext/Opcode/Opcode.pm 2005-10-19 01:08:28.000000000 +0100
@@ -394,7 +394,7 @@
gvsv gv gelem
- padsv padav padhv padany
+ padsv padav padhv padgv padany
rv2gv refgen srefgen ref
</code>
</p><p>
In similar vein, the test for Safe uses a list of tests that has
to match up with the list of ops.
<code>
--- ext/Safe/t/safeops.t.orig 2005-10-19 02:13:20.000000000 +0100
+++ ext/Safe/t/safeops.t 2005-10-19 02:13:52.000000000 +0100
@@ -19,7 +19,7 @@
}
use strict;
-use Test::More tests => 354;
+use Test::More tests => 355;
use Safe;
# Read the op names and descriptions directly from opcode.pl
@@ -72,6 +72,7 @@
SKIP my $x # padsv
SKIP my @x # padav
SKIP my %x # padhv
+SKIP my *x # padgv
SKIP (not implemented) # padany
SKIP split /foo/ # pushre
*x # rv2gv
</code>
One of the tests for B::Concise fails because it's looking for a
specific opcode, which we've changed by inserting a new op into
the middle of the list. That's easy to fix too:
<code>
--- ext/B/t/optree_concise.t.orig 2005-10-19 02:16:20.000000000 +0100
+++ ext/B/t/optree_concise.t 2005-10-19 02:16:37.000000000 +0100
@@ -183,13 +183,13 @@
UNOP (0x82b0918) leavesub [1]
LISTOP (0x82b08d8) lineseq
COP (0x82b0880) nextstate
- UNOP (0x82b0860) null [15]
+ UNOP (0x82b0860) null [16]
PADOP (0x82b0840) gvsv GV (0x82a818c) *a
EOT_EOT
# UNOP (0x8282310) leavesub [1]
# LISTOP (0x82822f0) lineseq
# COP (0x82822b8) nextstate
-# UNOP (0x812fc20) null [15]
+# UNOP (0x812fc20) null [16]
# SVOP (0x812fc00) gvsv GV (0x814692c) *a
EONT_EONT
</code>
Now the tests all pass!
</p><p>
There’s one more thing. If you’re making a non-standard change
to perl, like we've just done, you're supposed to register it as
a <i>local patch</i>. You do that by adding a line to
<tt>patchlevel.h</tt>, like so:
<code>
--- patchlevel.h.orig 2005-10-19 03:31:04.000000000 +0100
+++ patchlevel.h 2005-10-19 03:32:24.000000000 +0100
@@ -122,6 +122,7 @@
static const char * const local_patches[] = {
NULL
,"DEVEL" STRINGIFY(PERL_PATCHNUM)
+ ,"LEXGLOB001 - lexically scoped globs!"
,NULL
};
</code>
</p><p>
Now when we run <tt>./perl -V</tt>, we get:
<code>
...
Locally applied patches:
DEVEL25746
LEXGLOB001 - lexically scoped globs!
...
</code>
The whole patch is [http://www.cs.man.ac.uk/~houstorx/lexical-typeglobs.patch|here]. It only changes 38 lines of code.
</p>
</readmore>