Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Hacking perl

by robin (Chaplain)
on Oct 20, 2005 at 23:29 UTC ( #501844=perlmeditation: print w/replies, xml ) Need Help??

So, you know everything there is to know about Perl. You stifle a yawn as you flick through the latest obfuscations – how obvious they are! It's gratifying to be an expert; and your patient, only slightly condescending, help is appreciated by the less experienced. Life is good. But sometimes, in a reflective moment, you miss your younger days as an intrepid explorer in unfamiliar terrain, where there was always a new mystery to unravel, a new landscape to be discovered.

You could go and learn ruby or something, but that feels vaguely disloyal after so many years nestled happily in the bosom of Perl. And besides, you don't really like the idea of being patronised by experts on rubymonks or whatever the hell they have.

This is the node you've been waiting for. There is a way! You can plunge into an unfamiliar world of mystery and at the same time enhance your reputation as a Perl guru. I'm talking about perl: the guts, the source, the motherlode.

Update: Fixed a few typos; added a ďTestingĒ section, as suggested by hossman.

You've probably heard the stories, of strong men who went there and never returned, or returned mumbling and broken and wonít tell what they saw. One explorer survived long enough to describe ďan interconnected mass of livers and pancreas and lungs and little sharp pointy things and the occasional exploding kidneyĒ.

Donít worry about it. Youíll be fine. The mistake that people often make is they try to understand it. (Some people make the same mistake with life, or the state of the world, or The Prisoner.) If you can avoid that, you'll be all right. Just crack it open and get stuck in. The slogan of the hour is ďHACK FIRST, THINK LATERĒ.

The trick is just to mess with things. Donít waste too much time worrying about what's going to happen: try it and see what happens.

People often say that a good way to get started is to try and fix a bug that somebodyís reported. Thereís some truth in that, but debugging isnít most peopleís idea of a good time. So that's not what we'll do; instead weíll add a new feature: lexical typeglobs. Itís always bothered me a little that you canít say my *foo. Why canít you? Thereís no good reason. Itís not very useful, I admit, but it would be kind of cool.

I did it using the HACK FIRST methodology, and wrote down what I was doing as I went along. I was really surprised at how easy it turned out to be: the final patch only changes 38 lines of code.

If youíre the kind of person who always reads the guidebook before going to a new place, you might like to glance at these: perlhack, perlguts. Itís not compulsory though.

The first thing to do is to get a copy of the source. You can hack on whichever version you like, but I decided to use the latest ďbleadperlĒ. If you want to follow along, you should get it too. This is an interactive tutorial: itís not designed for reading in the bath, and it probably wonít make so much sense if youíre not actually tinkering with the source while you read.

So find a disk with a reasonable amount of free space, make a directory for the source to go in, and grab it. I did it like this:

cd build mkdir perl-current cd perl-current rsync -avz rsync:// .
and you should do something similar. Now build it, to make sure itís working before you start.
sh Configure -Doptimize='-g' -Dusethreads -Dusedevel -Dprefix=/local/p +erl -ders && make

You can vary this as you like, but the -Doptimize='-g' -Dusedevel is essential. The -Dusedevel tells it that, yes, you really want to build a development version; and -Doptimize='-g' turns on debugging mode, which weíre going to make good use of later on.

In case you havenít looked at perlhack, Iíll quickly explain the rough structure of the source. Perl code is tokenised by a rather hairy routine called yylex that lives in toke.c, then itís parsed using the bison grammar that lives in perly.y. The grammar uses the routines in op.c to build an optree.

The optree is then executed by a one-liner that lives in run.c, which dispatches each op to the appropriate routine. The ops themselves are implemented by functions in the files pp*.c.

Back to the problem at hand! Our first task is to persuade perl to recognise the new construct. A quick

perl -e 'my *x'
syntax error at -e line 1, near "my *x"
which shows that the parser doesnít even recognise the syntax. So we crack open the grammar (in perly.y, remember), and start grepping for Ďmyí. Soon we find this:
/* "my" declarations, with optional attributes */ myattrterm: MY myterm myattrlist { $$ = my_attrs($2,$3); } | MY myterm { $$ = localize($2,$1); } ; /* Things that can be "my"'d */ myterm : '(' expr ')' { $$ = sawparens($2); } | '(' ')' { $$ = sawparens(newNULLLIST()); } | scalar %prec '(' { $$ = $1; } | hsh %prec '(' { $$ = $1; } | ary %prec '(' { $$ = $1; } ;
and you don't need a degree in rocket science to see that this is the bit weíre interested in. Down at the end of the file, the symbols Ďscalarí, Ďhshí and Ďaryí are defined, like this:
scalar : '$' indirob { $$ = newSVREF($2); } ; ary : '@' indirob { $$ = newAVREF($2); } ; hsh : '%' indirob { $$ = newHVREF($2); } ;
and thereís a similar entry for globs, though it seems to be called ďstarĒ rather than ďglobĒ. (The perl source is full of little things that donít quite make sense – thatís part of its charm.) Here it is:
star : '*' indirob { $$ = newGVREF(0,$2); } ;

So letís add another clause to the definition of Ďmytermí, like this:

| star %prec '(' { $$ = $1; }
Time to check it out! First we have to rebuild the parser using the new grammar, then rebuild perl itself:
perl make perl
Try the one-liner again:
$ ./perl -e 'my *x' Can't declare ref-to-glob cast in "my" at -e line 1, at EOF
Great! Itís been parsed okay, and itís now being rejected during compilation. The compiler is housed in op.c, and weíll need to write some code to compile our new construct. But before we can do that, we need to decide what weíre going to compile it to. Letís have a quick peek at what perl does with other Ďmyí declarations:
$ ./perl -MO=Concise -e 'my $x; my @y; my %z' 8 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 3 <0> padsv[$x:1,4] vM/LVINTRO ->4 4 <;> nextstate(main 2 -e:1) v ->5 5 <0> padav[@y:2,4] vM/LVINTRO ->6 6 <;> nextstate(main 3 -e:1) v ->7 7 <0> padhv[%z:3,4] vM/LVINTRO ->8 -e syntax OK
Okay, so they get compiled to special ops called /pad.v/. (In case you donít know, thereís a special perl guts shorthand for different types of value. The most important ones are: a scalar is an SV, an array is an AV, a hash is an HV and a glob is a GV. Oh yeah, and a reference is an RV. Pretty simple really.)

Looks like we ought to make a padgv op! The ops are all defined in a file called, which auto-generates the relevant header files. If we were worried about backwards compatibility, weíd add the new op at the end; but this is just for fun so weíre not really fussed about compatibility, and weíll add it at the logical place in the file:

$ diff -u{.orig,} --- 2005-10-18 19:07:24.000000000 +0100 +++ 2005-10-18 19:07:49.000000000 +0100 @@ -491,6 +491,7 @@ padsv private variable ck_null ds0 padav private array ck_null d0 padhv private hash ck_null d0 +padgv private glob ck_null d0 padany private value ck_null d0 pushre push regexp ck_null d/
Now run, which updates opcode.h, opnames.h and pp_proto.h on your behalf.

Weíve got a new op, but the compiler isnít going to use it unless we tell it how. So crack open op.c, and squint at the Perl_newGVREF() function. Looking at Perl_new[SAH]VREF for comparison, itís fairly obvious what we have to do:

--- op.c.before 2005-10-18 19:13:33.000000000 +0100 +++ op.c 2005-10-18 19:33:09.000000000 +0100 @@ -4902,7 +4902,13 @@ OP * Perl_newGVREF(pTHX_ I32 type, OP *o) { - if (type == OP_MAPSTART || type == OP_GREPSTART || type == OP_SOR +T) + dVAR; + if (o->op_type == OP_PADANY) { + o->op_type = OP_PADGV; + o->op_ppaddr = PL_ppaddr[OP_PADGV]; + return o; + } + else if (type == OP_MAPSTART || type == OP_GREPSTART || type == O +P_SORT) return newUNOP(OP_NULL, 0, o); return ref(newUNOP(OP_RV2GV, OPf_REF, o), type); }

Weíre going to need to implement the new op at some point, but for now letís just whack in a placeholder:

--- pp.c.orig 2005-10-18 19:16:07.000000000 +0100 +++ pp.c 2005-10-18 19:17:51.000000000 +0100 @@ -127,6 +127,11 @@ RETURN; } +PP(pp_padgv) +{ + DIE(aTHX_ "OP_PADGV NOT YET IMPLEMENTED"); +} + PP(pp_padany) { DIE(aTHX_ "NOT IMPL LINE %d",__LINE__);

Wahey! Letís check out what weíve got so far. Run Ďmakeí to rebuild it, then:

$ ./perl -MO=Concise -e 'my *x' Can't declare ref-to-glob cast in "my" at -e line 1, at EOF -e had compilation errors. 5 <@> leave[1 ref] KP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) ->3 4 <1> rv2gv sKR/1 ->5 3 <#> gv[*x] s ->4

Oh dear. :-(

The error is to be expected – we still havenít looked into that – but we have OP_RV2GV instead of our shiny new OP_PADGV. Whatís that about? Time to wheel out the old debugger.

$ gdb --args ./perl -e 'my *x' ... (gdb) br Perl_newGVREF Breakpoint 1 at 0x80a51e2: file op.c, line 4905. (gdb) run Starting program: /local/build/perl-current/perl -e my\ \*x ... Breakpoint 1, Perl_newGVREF (my_perl=0x81b07c0, type=0, o=0x81ccc50) at op.c:4905 4905 if (o->op_type == OP_PADANY) { (gdb) p *o $1 = {op_next = 0x81ccc50, op_sibling = 0x0, op_ppaddr = 0x80e2c0c <Perl_pp_const>, op_targ = 0, op_type = 5, op_ +opt = 0, op_static = 0, op_spare = 0, op_flags = 2 '\002', op_private = 16 '\ +020'}
Hmm, so weíve got an OP_CONST instead of the OP_PADANY we were expecting. Itís time to find out where those PADANYs are coming from:
$ grep OP_PADANY *.c ... toke.c: yylval.opval = newOP(OP_PADANY, 0); toke.c: yylval.opval = newOP(OP_PADANY, 0);

Ah! Itís tokeniser magic. (You didnít think the tokeniser just tokenised, did you? Oh no.) That means itís time to dive into toke.c and see if we can grok whatís happening there. These newOP() commands are both in a function called S_pending_indent(), which gets called right from the top of the main lexer routine Perl_yylex():

... if (PL_pending_ident) return REPORT(S_pending_ident(aTHX)); ...
That means that PL_pending_ident must be getting set for $foo, @foo and %foo, but not for *foo. A quick grep through the file reveals that weíre quite right – when a Ď*í is encountered, something called force_ident() gets called instead. Letís try changing it:
--- toke.c.orig 2005-10-18 19:47:57.000000000 +0100 +++ toke.c 2005-10-18 19:48:43.000000000 +0100 @@ -3169,7 +3169,7 @@ if (PL_expect != XOPERATOR) { s = scan_ident(s, PL_bufend, PL_tokenbuf, sizeof PL_tokenb +uf, TRUE); PL_expect = XOPERATOR; - force_ident(PL_tokenbuf, '*'); + PL_pending_ident = '*'; if (!*PL_tokenbuf) PREREF('*'); TERM('*');

We try rebuilding perl. It builds miniperl okay, but then it dies with a load of syntax errors during the build process. The most telling-looking one is the second one:

syntax error at ../lib/ line 29, near "*$sym "

Hmm, letís see what the tokeniser is up to:

$ ./miniperl -DT -ce '*$x' ### 0:LEX_NORMAL/XSTATE "\n;" ### <== '*' ### 1:LEX_NORMAL/XREF "$x\n" ### Pending identifier '' ### <== WORD(opval=op_const) PV(""\0) ### 1:LEX_NORMAL/XREF "$x\n" ### <== '$' ### 1:LEX_NORMAL/XOPERATOR ";" ### Pending identifier '$x' ### <== WORD(opval=op_const) PV("x"\0) ### 1:LEX_NORMAL/XOPERATOR ";" ### <== ';' ### 1:LEX_NORMAL/XSTATE "" ### Tokener got EOF ### <== EOF syntax error at -e line 1, next char $ -e had compilation errors.
I canít tell whatís wrong just looking at that, so letís try comparing it to something similar that does work:
$ ./miniperl -DT -ce '%$x' ### 0:LEX_NORMAL/XSTATE "\n;" ### <== '%' ### 1:LEX_NORMAL/XREF "$x\n" ### <== '$' ### 1:LEX_NORMAL/XOPERATOR ";" ### Pending identifier '$x' ### <== WORD(opval=op_const) PV("x"\0) ### 1:LEX_NORMAL/XOPERATOR ";" ### <== ';' ### 1:LEX_NORMAL/XSTATE "" ### Tokener got EOF ### <== EOF EXECUTING... -e syntax OK
Thatís interesting! Thereís definitely something different there. The second block of the Ď*í run doesnít appear at all in this one.

Maybe we got something wrong in the tokeniser change? Sure enough, another look at toke.c shows that the code for Ď%í is setting PL_tokenbuf[0] = '%', which we werenít doing. So letís try a slightly improved change, copying the structure of the Ď%í code a bit more faithfully:

--- toke.c.orig 2005-10-18 19:47:57.000000000 +0100 +++ toke.c 2005-10-18 20:16:52.000000000 +0100 @@ -3167,11 +3167,11 @@ case '*': if (PL_expect != XOPERATOR) { - s = scan_ident(s, PL_bufend, PL_tokenbuf, sizeof PL_tokenb +uf, TRUE); - PL_expect = XOPERATOR; - force_ident(PL_tokenbuf, '*'); - if (!*PL_tokenbuf) + PL_tokenbuf[0] = '*'; + s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_to +kenbuf - 1, TRUE); + if (!PL_tokenbuf[1]) PREREF('*'); + PL_pending_ident = '*'; TERM('*'); } s++;
Right, letís try another make. This one fails too, but in a much more interesting way:
Global symbol "%DELETE" requires explicit package name at lib/Config_h line 1158
Looking at line 1158 of, we find:
It looks like "use strict" is now affecting globs! Indeed:
$ ./miniperl -ce '*foo' -e syntax OK $ ./miniperl -ce 'use strict; *foo' Global symbol "%foo" requires explicit package name at -e line 1. -e had compilation errors.
Also, it seems to be confusing globs with hashes. What's this "%foo" all about? Let's see where the error is coming from:
$ fgrep 'explicit package name' *.c gv.c: "Global symbol \"%s%s\" requires explicit package nam +e",
A quick look in gv.c tells us that it's coming from Perl_gv_fetchpvn_flags(). Who's calling that? It's debugger time again!
$ gdb --args ./miniperl -e '*foo' (gdb) br Perl_gv_fetchpvn_flags Breakpoint 1 at 0x806c485: file gv.c, line 710. (gdb) condition 1 nambeg[0] == 'f' (gdb) run Starting program: /local/build/perl-current/miniperl -e \*foo [New Thread 1074334048 (LWP 2919)] [Switching to Thread 1074334048 (LWP 2919)] Breakpoint 1, Perl_gv_fetchpvn_flags (my_perl=0x8199720, nambeg=0x8199ca9 "foo", full_len=3, flags=1, sv_type=12) at gv.c:7 +10 710 register const char *name = nambeg; (gdb) bt #0 Perl_gv_fetchpvn_flags (my_perl=0x8199720, nambeg=0x8199ca9 "foo", + full_len=3, flags=1, sv_type=12) at gv.c:710 #1 0x0806c406 in Perl_gv_fetchpv (my_perl=0x8199720, nambeg=0x8199ca9 + "foo", add=1, sv_type=12) at gv.c:696 #2 0x08087238 in S_pending_ident (my_perl=0x8199720) at toke.c:5669 #3 0x08076115 in Perl_yylex (my_perl=0x8199720) at toke.c:2431 #4 0x08092b5b in Perl_yyparse (my_perl=0x8199720) at perly.c:412 #5 0x08063ebc in S_parse_body (my_perl=0x8199720, env=0x0, xsinit=0x804bb65 <xs_init>) at perl.c:2136 #6 0x08062d16 in perl_parse (my_perl=0x8199720, xsinit=0x804bb65 <xs_ +init>, argc=3, argv=0xbffff3f4, env=0x0) at perl.c:1542 #7 0x0804baeb in main (argc=3, argv=0xbffff3f4, env=0xbffff404) at miniperlmain.c:101 #8 0x42015704 in __libc_start_main () from /lib/tls/

Aha! So S_pending_ident() is calling gv_fetchpv(). There it is, right at the end, like this:

gv_fetchpv(PL_tokenbuf+1, PL_in_eval ? (GV_ADDMULTI | GV_ADDINEVAL +) : TRUE, ((PL_tokenbuf[0] == '$') ? SVt_PV : (PL_tokenbuf[0] == '@') ? SVt_PVAV : SVt_PVHV));

Ah! It's assuming that anything that's not a scalar or an array must be a hash. But we've added a new possibility, so letís tell it about that:

--- toke.c.1 2005-10-18 20:53:32.000000000 +0100 +++ toke.c 2005-10-18 20:54:34.000000000 +0100 @@ -5666,6 +5666,7 @@ gv_fetchpv(PL_tokenbuf+1, PL_in_eval ? (GV_ADDMULTI | GV_ADDINEVA +L) : TRUE, ((PL_tokenbuf[0] == '$') ? SVt_PV : (PL_tokenbuf[0] == '@') ? SVt_PVAV + : (PL_tokenbuf[0] == '*') ? SVt_PVGV : SVt_PVHV)); return WORD; }

Now we try another Ďmake perlí, and everything builds as normal. Phew! Even better, the new op is being used in the right place:

$ ./perl -MO=Concise -e 'my *foo' Can't declare private glob in "my" at -e line 1, at EOF -e had compilation errors. 4 <@> leave[1 ref] KP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) ->3 3 <0> padgv[*foo:1,2] ->4
Still got that pesky error though... I wonder where that one's coming from:
$ fgrep "Can't declare" *.c op.c: "Can't declare class for non-scalar %s in \"%s\"", op.c: yyerror(Perl_form(aTHX_ "Can't declare %s in %s", op.c: yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"", opmini.c: "Can't declare class for non-scalar %s in +\"%s\"", opmini.c: yyerror(Perl_form(aTHX_ "Can't declare %s in %s", opmini.c: yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"",

opmini.c is just an autogenerated copy of op.c thatís used to build miniperl. I donít know why thereís a separate file for this – probably those hysterical raisins again. Anyway, this looks like the right bit, here:

else if (type != OP_PADSV && type != OP_PADAV && type != OP_PADHV && type != OP_PUSHMARK) { yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"", OP_DESC(o), PL_in_my == KEY_our ? "our" : "my")); return o; }
I guess we need to tell it about our new op.
--- op.c.1 2005-10-18 21:03:51.000000000 +0100 +++ op.c 2005-10-18 21:04:11.000000000 +0100 @@ -1704,6 +1704,7 @@ else if (type != OP_PADSV && type != OP_PADAV && type != OP_PADHV && + type != OP_PADGV && type != OP_PUSHMARK) { yyerror(Perl_form(aTHX_ "Can't declare %s in \"%s\"",
Rebuild once again, and:
$ ./perl -e 'my *foo' OP_PADGV NOT YET IMPLEMENTED at -e line 1.
Itís executing our new op! Now weíre talking! I guess that means that we ought to implement the thing. We donít really know what weíre doing here, so letís just try something really simple, a kind of ultra stripped-down version of pp_padsv, and see what happens.
--- pp.c.orig 2005-10-18 19:16:07.000000000 +0100 +++ pp.c 2005-10-18 22:23:28.000000000 +0100 @@ -127,6 +127,15 @@ RETURN; } +PP(pp_padgv) +{ + dSP; dTARGET; + XPUSHs(TARG); + if (PL_op->op_private & OPpLVAL_INTRO) + SAVECLEARSV(PAD_SVl(PL_op->op_targ)); + RETURN; +} + PP(pp_padany) { DIE(aTHX_ "NOT IMPL LINE %d",__LINE__);
It builds okay again, so letís try and do something useful with it:
$ ./perl -e 'my *foo = *bar' Can't modify private glob in scalar assignment at -e line 1, at EOF
Hmm, another error. Where's it coming from?
$ fgrep "Can't modify" *.c op.c: yyerror(Perl_form(aTHX_ "Can't modify %s in %s",
Aha! This is the default clause in a big switch statement, in a function called Perl_mod(). I guess we need to tell this about our new op too:
--- op.c.2 2005-10-18 21:43:38.000000000 +0100 +++ op.c 2005-10-18 21:44:12.000000000 +0100 @@ -1222,6 +1222,7 @@ o->op_private |= OPpMAYBE_LVSUB; /* FALL THROUGH */ case OP_PADSV: + case OP_PADGV: PL_modcount++; if (!type) /* local() */ Perl_croak(aTHX_ "Can't localize lexical variable %s",
Right, now letís try to use it again:
$ ./perl -we 'my *foo = *bar; print *foo, "\n"' Name "main::bar" used only once: possible typo at -e line 1. *main::bar
Hot damn! It seems to be working. Letís try some more experiments:
$ ./perl -we 'my *foo = *bar; $bar = "Hello, world!\n"; print ${*foo}' Hello, world!
No way! This is great. Hmm, I wonder what a new glob is called:
$ ./perl -we 'my *foo; print *foo{NAME}' Segmentation fault

Oh dear, not so good. I guess we need to initialise the new glob somehow. It looks like we can make a new glob using newGVgen(), but we have to pass a package name, and of course a lexical glob doesnít live in a package. Let's use the bogus package ďlexicalĒ, so lexical globs are easy to spot. The other problem is that newGVgen() returns a pointer to the GV, and there doesn't seem to be any sensible way to copy this GV into the pad entry.

(Confession: I made a false start here. I tried copying the GV into the pad sv using sv_setsv(), but it doesn't seem to copy all the relevant fields. So then I tried the following.) This calls for a slight change of strategy. We'll use the pad entry as a reference to the glob. Like this:

--- pp.c.1 2005-10-18 22:30:22.000000000 +0100 +++ pp.c 2005-10-19 00:47:33.000000000 +0100 @@ -130,9 +130,15 @@ PP(pp_padgv) { dSP; dTARGET; - XPUSHs(TARG); - if (PL_op->op_private & OPpLVAL_INTRO) - SAVECLEARSV(PAD_SVl(PL_op->op_targ)); + if (PL_op->op_private & OPpLVAL_INTRO) { + GV *gen_gv = newGVgen("lexical"); + sv_upgrade(TARG, SVt_RV); + SvREFCNT_inc((SV *)gen_gv); + SvRV_set(TARG, (SV *)gen_gv); + SvROK_on(TARG); + SAVECLEARSV(PAD_SVl(PL_op->op_targ)); + } + XPUSHs(SvRV(TARG)); RETURN; }
If youíre wondering what all these oddly-named functions are doing, have another squint at perlguts.

(Okay, I admit it! I didnít get this right first time either. I forgot the SvROK_on(), which caused segfaults during global destruction (of all the bizarre places), and it took a lengthy session in the debugger before I figured out what Iíd done wrong.)

Anyhow, letís give it a whirl:

$ ./perl -wle 'print my *x' *lexical::_GEN_0 $ ./perl -wle 'my *x = *foo; $foo = "Nice!"; print ${*x}' Nice! $ ./perl -wle 'my *x = *foo; my *y = *x; $foo = "Even nicer!"; print $ +{*y}' Even nicer! $ ./perl -wle 'my *x = \23; ${*x} = 24' Modification of a read-only value attempted at -e line 1. $ ./perl -wle 'my *x = \23; *foo = *x; print $foo; $foo = 24' 23 Modification of a read-only value attempted at -e line 1.

Great! Let's try a closure:

sub foo { my *foo = shift; sub {${*foo}} } my $x = foo(*foo); $foo = "Hmm"; print $x->(); my $y = foo(\23); print $y->();
That prints ďHmm23Ē, which is cool!

Look at this though:

$ ./perl -wle 'for(1..10) {my *x} print sort keys %lexical::' _GEN_0_GEN_1_GEN_2_GEN_3_GEN_4_GEN_5_GEN_6_GEN_7_GEN_8_GEN_9
Oh dear! Our supposedly lexical globs are being created as real package variables. That leads to stupendous memory leaks – for example, the loop:
$ ./perl -e 'while (1) {my *foo}'
will keep on growing until the computer runs out of memory. We'd better do something about that. How about deleting the entry from the %lexical:: stash as soon as it's been created? It's only a one-line addition:
--- pp.c.3 2005-10-19 01:26:19.000000000 +0100 +++ pp.c 2005-10-19 01:34:42.000000000 +0100 @@ -136,6 +136,7 @@ SvREFCNT_inc((SV *)gen_gv); SvRV_set(TARG, (SV *)gen_gv); SvROK_on(TARG); + hv_delete(GvSTASH(gen_gv), GvNAME(gen_gv), GvNAMELEN(gen_gv), +0); SAVECLEARSV(PAD_SVl(PL_op->op_targ)); } XPUSHs(SvRV(TARG));
Okay... letís try it out. The examples above still seem to work. What about this?
$ ./perl -wle 'for (1..10) {print \my *foo}' GLOB(0x81b22d8) GLOB(0x81b22f8) GLOB(0x81b22b8) GLOB(0x81b22d8) GLOB(0x81b22f8) GLOB(0x81b22b8) GLOB(0x81b22d8) GLOB(0x81b22f8) GLOB(0x81b22b8) GLOB(0x81b22d8)
How cool is that? You can see the memory being reused – the same addresses keep coming back again and again. It looks like we have a working Perl interpreter with a shiny new feature!


Every time you add a new feature, or fix a bug, you ought to add some regression tests to make sure that it keeps on working in the future. Even though we donít expect this particular patch to be maintained in the future, itís good practice. Perlís test suite lives in (the subdirectories of) the directory t; the my operation is tested in t/op/my.t. We need to decide whether to add our tests to the existing file, or make a new one. If you look at the op tests, you'll notice that the fundamental ones are all coded by hand – they donít use Test::More. That's because Test::More is complicated enough that, if a really fundamental feature gets broken, it will almost certainly stop working. It might even stop working in such a way that all the tests appear to have passed, which would be very bad!

On the other hand, we can be pretty sure that Test::More doesnít make use of lexical typeglobs, so thereís no reason we canít use it in our tests. That settles it then: weíll make a new file, say t/op/my_glob.t:
#!./perl use Test::More tests => 7; BEGIN { # This is in a BEGIN block so that it will run even if # the syntax causes a compilation error. eval q{ my *foo }; ok(!$@, "Syntactically okay."); } { my *foo = *bar; is(*foo, "*main::bar", "Visible within scope,"); } is(*foo, "*main::foo", "yet invisible without it."); { my *foo; is(*foo{PACKAGE}, 'lexical', "Package is 'lexical'"); ok(!keys %lexical::, "Glob does not really exist."); } sub foo { my *foo = shift; sub { return ${*foo}; } } my @foo = map foo($_), \(17, 23); is($foo[0]->(), 17, 'Closure test ($foo[0] == 17)'); is($foo[1]->(), 23, 'Closure test ($foo[1] == 23)');
It could certainly be more thorough, but this covers the essentials.

Tidying up

If you run Ďmake testí at this point, there are a handful of test failures. That's not a real problem – they all come, in one way or another, from the fact that we've added a new opcode. The Opcode module whines that it doesnít know about this crazy padgv thingy, but itís easy to make it happy:

--- ext/Opcode/ 2005-10-19 01:08:11.000000000 +0100 +++ ext/Opcode/ 2005-10-19 01:08:28.000000000 +0100 @@ -394,7 +394,7 @@ gvsv gv gelem - padsv padav padhv padany + padsv padav padhv padgv padany rv2gv refgen srefgen ref

In similar vein, the test for Safe uses a list of tests that has to match up with the list of ops.

--- ext/Safe/t/safeops.t.orig 2005-10-19 02:13:20.000000000 +0100 +++ ext/Safe/t/safeops.t 2005-10-19 02:13:52.000000000 +0100 @@ -19,7 +19,7 @@ } use strict; -use Test::More tests => 354; +use Test::More tests => 355; use Safe; # Read the op names and descriptions directly from @@ -72,6 +72,7 @@ SKIP my $x # padsv SKIP my @x # padav SKIP my %x # padhv +SKIP my *x # padgv SKIP (not implemented) # padany SKIP split /foo/ # pushre *x # rv2gv
One of the tests for B::Concise fails because it's looking for a specific opcode, which we've changed by inserting a new op into the middle of the list. That's easy to fix too:
--- ext/B/t/optree_concise.t.orig 2005-10-19 02:16:20.000000000 ++0100 +++ ext/B/t/optree_concise.t 2005-10-19 02:16:37.000000000 +0100 @@ -183,13 +183,13 @@ UNOP (0x82b0918) leavesub [1] LISTOP (0x82b08d8) lineseq COP (0x82b0880) nextstate - UNOP (0x82b0860) null [15] + UNOP (0x82b0860) null [16] PADOP (0x82b0840) gvsv GV (0x82a818c) *a EOT_EOT # UNOP (0x8282310) leavesub [1] # LISTOP (0x82822f0) lineseq # COP (0x82822b8) nextstate -# UNOP (0x812fc20) null [15] +# UNOP (0x812fc20) null [16] # SVOP (0x812fc00) gvsv GV (0x814692c) *a EONT_EONT
Now the tests all pass!

Thereís one more thing. If youíre making a non-standard change to perl, like we've just done, you're supposed to register it as a local patch. You do that by adding a line to patchlevel.h, like so:

--- patchlevel.h.orig 2005-10-19 03:31:04.000000000 +0100 +++ patchlevel.h 2005-10-19 03:32:24.000000000 +0100 @@ -122,6 +122,7 @@ static const char * const local_patches[] = { NULL ,"DEVEL" STRINGIFY(PERL_PATCHNUM) + ,"LEXGLOB001 - lexically scoped globs!" ,NULL };

Now when we run ./perl -V, we get:

... Locally applied patches: DEVEL25746 LEXGLOB001 - lexically scoped globs! ...
The whole patch is here. It only changes 38 lines of code.

Replies are listed 'Best First'.
Re: Hacking perl
by hossman (Prior) on Oct 21, 2005 at 00:03 UTC

    Superb write up.

    I feel sadend however, by one thing that seems to be missing, but in my opinion is absolutely critical to any patch to any OSS Code Base -- test cases.

    Even if you don't believe in Test driven development, even if you don't worry about having 100% code coverage in your test suite, the best way to explain what a patch does, is to provide some tests that demonstrates it's usage.

      A very good point, thanks. (Certainly a patch would never be accepted into the core unless accompanied by tests.) I'll update my node to include some tests.

      Update: I've updated it. Now with tests!

Re: Hacking perl
by Ovid (Cardinal) on Oct 20, 2005 at 23:54 UTC

    This is a beautiful node. Thank you! I have one minor comment (emphasis mine):

    The mistake that people often make is they try to understand it. (Some people make the same mistake with life, or the state of the world, or The Prisoner.)

    Even if you never understand The Prisoner, like this node, it has lessons worth contemplating.

    Be seeing you,
    Number 17,000

    Interesting timing, too. I have The Prisoner on tape (I'm so old school) and Schwern and I have been watching and arguing about it the past few days.

    New address of my CGI Course.

Re: Hacking perl
by BrowserUk (Patriarch) on Oct 21, 2005 at 00:01 UTC

    Had the idea, got some confirmation that it was a good one, wrote the patch and got the rejection note!

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

      That's a damned shame. I think "say" would be very handy to have. Did they really reject it because Perl's getting old?


      New address of my CGI Course.

Re: Hacking perl
by japhy (Canon) on Oct 21, 2005 at 15:08 UTC
    <voice who="baseball announcer">Give that kid a contract!<voice>

    For the unwashed masses, when a kid catches a ball that goes into the stands, often the ballpark announcer will say something like "Good catch! Someone give that kid a contract!" I know full well who robin is. I just like saying that.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Hacking perl
by blazar (Canon) on Oct 21, 2005 at 10:07 UTC
    So, you know everything there is to know about Perl.
    Huh?!? Do I? Jokes apart, thank you for this superb node. But why didn't you make this into a tutorial, rather than a meditation?

      I did think about it. Two things persuaded me to post it as a meditation:

      1. The blurb for the tutorials section suggests that a would-be tutorial should initially be posted as a meditation.
      2. It doesn't fit any of the tutorial categories. It would even look out of place in other topics, I think.

        I think the Miscellaneous Perl Tutorials section might be appropriate, at least until such time as a "Perl Internals" section gets made. robin++

        We're building the house of the future together.
Re: Hacking perl
by liverpole (Monsignor) on Oct 21, 2005 at 12:13 UTC
    An excellent article -- thank you for this.  Very informative!
Re: Hacking perl
by shmem (Chancellor) on Jan 15, 2007 at 11:42 UTC

    Definitely a nice hack, robin++ ... but it leaves a question open. Consider:

    local *foo:

    *foo = [qw (a b c)]; { local *foo = [qw(1 2 3)]; print "local \@{\*foo{ARRAY}} = qw(@{*foo{ARRAY}})\n"; print "local \@foo = qw(@foo)\n"; } print "global \@foo = qw(@foo)\n"; __END__ local @{*foo{ARRAY}} = qw(1 2 3) local @foo = qw(1 2 3) global @foo = qw(a b c)

    my *foo:

    *foo = [qw (a b c)]; { my *foo = [qw(1 2 3)]; print "my \@{\*foo{ARRAY}} = qw(@{*foo{ARRAY}})\n"; print "my \@foo = qw(@foo)\n"; } print "global \@foo = qw(@foo)\n"; __END__ my @{*foo{ARRAY}} = qw(1 2 3) my @foo = qw(a b c) global @foo = qw(a b c)

    Hmm. The lexical glob is created ok, but other than with local, glob corresponding variables aren't treated as aliases to the typeglob entries in the current scope.

    Now, should that happen? I think, it should; but such lookup should also warn about "masking an earlier @foo in the same scope".

    What do you think?


    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Yes, I'm inclined to agree.

      I didn't try and do that for this tutorial, because I thought I'd already done enough for demonstration purposes. (Also I'd been hacking all night and it was starting to get light.)

      It would be a fun exercise for someone to figure out how to do what you suggest, and a good way to learn about perl internals. Is anyone up for a challenge?

        Certainly I'm already at it... ;-)


        update: ;) others might be faster, though...

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://501844]
Approved by GrandFather
Front-paged by itub
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2023-09-30 07:13 GMT
Find Nodes?
    Voting Booth?

    No recent polls found