Current Perl documentation can be found at perldoc.perl.org.
Here is our local, out-dated (pre-5.6) version:
The most recent development releases of Perl have been experimenting with removing Perl's dependency on the ``normal'' standard I/O suite and allowing other stdio implementations to be used. This involves creating a new abstraction layer that then calls whichever implementation of stdio Perl was compiled with. All XSUBs should now use the functions in the PerlIO abstraction layer and not make any assumptions about what kind of stdio is being used.
For a complete description of the PerlIO abstraction, consult the perlapio manpage.
A lot of opcodes (an opcode is an elementary operation in the internal perl stack machine) put an SV* on the stack. However, as an optimization the corresponding SV is (usually) not recreated each time. The opcodes reuse specially assigned SVs (targets) which are (as a corollary) not constantly freed/created.
Each of the targets is created only once (but see Scratchpads and recursion below), and when an opcode needs to put an integer, a double, or a string on stack, it just sets the corresponding parts of its target and puts the target on stack.
The macro to put this target on the stack is PUSHTARG, and it is directly used in some opcodes, as well as indirectly in zillions of others, which use it via (X)PUSH[pni].
The question remains of when the SVs which are targets for opcodes are created. The answer is that they are created when the current unit -- a subroutine or a file (for opcodes for statements outside of subroutines) -- is compiled. During this time a special anonymous Perl array is created, which is called a scratchpad for the current unit.
A scratchpad keeps SVs which are lexicals for the current unit and are targets for opcodes. One can deduce that an SV lives on a scratchpad by looking at its flags: lexicals have SVs_PADMY set, and targets have SVs_PADTMP set.
The correspondence between OPs and targets is not 1-to-1. Different OPs in the compile tree of the unit can use the same target, if this would not conflict with the expected life of the temporary.
In fact it is not 100% true that a compiled unit contains a pointer to the scratchpad AV. In fact it contains a pointer to an AV of (initially) one element, and this element is the scratchpad AV. Why do we need an extra level of indirection?
The answer is recursion, and maybe (sometime soon) threads. Both of these can create several execution pointers going into the same subroutine. For the subroutine-child not to write over the temporaries for the subroutine-parent (the lifespan of which covers the call to the child), the parent and the child should have different scratchpads. (And the lexicals should be separate anyway!)
So each subroutine is born with an array of scratchpads (of length 1). On each entry to the subroutine it is checked that the current depth of the recursion is not more than the length of this array, and if it is, a new scratchpad is created and pushed into the array.
The targets on this scratchpad are undefs, but they are already marked with correct flags.
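The array-of-scratchpads idea can be sketched in plain C. This is only a toy model under stated assumptions: `ModelPad`, `ModelSub`, `model_sub_new`, `model_sub_enter` and `model_sub_leave` are hypothetical names invented for the illustration, and a pad is reduced to a few `int` slots rather than an AV of SVs.

```c
#include <stdlib.h>

/* Hypothetical model: a scratchpad is just a block of temporaries. */
#define MODEL_PAD_SLOTS 4

typedef struct { int slot[MODEL_PAD_SLOTS]; } ModelPad;

typedef struct {
    ModelPad **pads;   /* one scratchpad per recursion depth */
    size_t     npads;  /* current length of the array */
    size_t     depth;  /* how many calls are currently active */
} ModelSub;

/* A subroutine is born with an array of scratchpads of length 1. */
ModelSub *model_sub_new(void)
{
    ModelSub *sub = malloc(sizeof *sub);
    if (sub == NULL) return NULL;
    sub->pads = malloc(sizeof *sub->pads);
    if (sub->pads == NULL) { free(sub); return NULL; }
    sub->pads[0] = calloc(1, sizeof(ModelPad));
    if (sub->pads[0] == NULL) { free(sub->pads); free(sub); return NULL; }
    sub->npads = 1;
    sub->depth = 0;
    return sub;
}

/* On entry: if the new depth exceeds the array length, push a fresh pad. */
ModelPad *model_sub_enter(ModelSub *sub)
{
    if (sub->depth + 1 > sub->npads) {
        ModelPad *pad = calloc(1, sizeof *pad);
        ModelPad **grown;
        if (pad == NULL) return NULL;
        grown = realloc(sub->pads, (sub->npads + 1) * sizeof *grown);
        if (grown == NULL) { free(pad); return NULL; }
        sub->pads = grown;
        sub->pads[sub->npads++] = pad;
    }
    return sub->pads[sub->depth++];
}

/* On exit the pad is kept for reuse by the next call at this depth. */
void model_sub_leave(ModelSub *sub)
{
    if (sub->depth > 0) sub->depth--;
}

void model_sub_free(ModelSub *sub)
{
    size_t i;
    for (i = 0; i < sub->npads; i++) free(sub->pads[i]);
    free(sub->pads);
    free(sub);
}
```

In this model a recursive call gets its own pad, so writing a temporary in the child leaves the parent's temporaries alone, and a later non-recursive call finds the first pad again.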
Here we describe the internal form your code is converted to by Perl. Start with a simple example:
$a = $b + $c;
This is converted to a tree similar to this one:
        assign-to
      /           \
     +             $a
   /   \
 $b     $c
(but slightly more complicated). This tree reflects the way Perl parsed your code, but has nothing to do with the execution order. There is an additional ``thread'' going through the nodes of the tree which shows the order of execution of the nodes. In our simplified example above it looks like:
$b ---> $c ---> + ---> $a ---> assign-to
But with the actual compile tree for $a = $b + $c it is different: some nodes are optimized away. As a corollary, though the actual tree contains more nodes than our simplified example, the execution order is the same as in our example.
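The difference between the parse tree and the execution-order thread can be made concrete with a small model. The sketch below is an illustration only: `ModelOp`, `model_build` and `model_run` are hypothetical names, variables are plain `int`s, and the `targ` field stands in for the target of the addition.

```c
#include <stddef.h>

/* Hypothetical model of a compiled unit; none of these names are Perl's. */
typedef enum { M_VAR, M_ADD, M_ASSIGN } ModelType;

typedef struct ModelOp {
    ModelType type;
    int *var;                /* M_VAR: the variable this node refers to */
    int targ;                /* M_ADD: reusable slot for the result */
    struct ModelOp *kid[2];  /* parse-tree links (not used for execution) */
    struct ModelOp *next;    /* execution-order thread */
} ModelOp;

/* Fill ops[0..4] with the tree for "$a = $b + $c" and thread it as
 * $b -> $c -> + -> $a -> assign-to.  Returns the first op to execute. */
ModelOp *model_build(ModelOp ops[5], int *a, int *b, int *c)
{
    ModelOp *vb = &ops[0], *vc = &ops[1], *add = &ops[2];
    ModelOp *va = &ops[3], *assign = &ops[4];
    int i;

    for (i = 0; i < 5; i++) {
        ops[i].type = M_VAR;
        ops[i].var = NULL;
        ops[i].targ = 0;
        ops[i].kid[0] = ops[i].kid[1] = NULL;
        ops[i].next = NULL;
    }
    vb->var = b; vc->var = c; va->var = a;
    add->type = M_ADD;       add->kid[0] = vb;     add->kid[1] = vc;
    assign->type = M_ASSIGN; assign->kid[0] = add; assign->kid[1] = va;

    vb->next = vc; vc->next = add; add->next = va; va->next = assign;
    return vb;
}

/* Run by following the thread; returns the number of ops executed,
 * or -1 if the small operand stack would overflow or underflow. */
int model_run(ModelOp *start)
{
    int *stack[8];
    int sp = 0, steps = 0;
    ModelOp *op;

    for (op = start; op != NULL; op = op->next, steps++) {
        switch (op->type) {
        case M_VAR:
            if (sp >= 8) return -1;
            stack[sp++] = op->var;
            break;
        case M_ADD:
            if (sp < 2) return -1;
            op->targ = *stack[sp - 2] + *stack[sp - 1];
            sp -= 2;
            stack[sp++] = &op->targ;   /* push the target, not a new value */
            break;
        case M_ASSIGN:
            if (sp < 2) return -1;
            *stack[sp - 1] = *stack[sp - 2];   /* top of stack is the lvalue */
            stack[sp - 2] = stack[sp - 1];
            sp -= 1;
            break;
        }
    }
    return steps;
}
```

The root of the tree is the assignment node, yet execution starts at the node for $b: the tree records how the code was parsed, the `next` pointers record what runs when.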
If you have your perl compiled for debugging (usually done with -D optimize=-g on the Configure command line), you may examine the compiled tree by specifying -Dx on the Perl command line. The output takes several lines per node, and for $b+$c it looks like this:
    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }
This tree has 5 nodes (one per TYPE specifier), only 3 of them are not optimized away (one per number in the left column). The immediate children of the given node correspond to {} pairs on the same level of indentation, thus this listing corresponds to the tree:
          add
        /     \
      null    null
       |       |
      gvsv    gvsv
The execution order is indicated by ===> marks, thus it is 3 4 5 6 (node 6 is not included in the above listing), i.e., gvsv gvsv add whatever.
The tree is created by the pseudo-compiler while yacc code feeds it the constructions it recognizes. Since yacc works bottom-up, so does the first pass of perl compilation.
What makes this pass interesting for perl developers is that some optimization may be performed on this pass. This is optimization by so-called check routines. The correspondence between node names and corresponding check routines is described in opcode.pl (do not forget to run make regen_headers if you modify this file).
A check routine is called when the node is fully constructed except for the execution-order thread. Since at this time there are no back-links to the currently constructed node, one can do most any operation to the top-level node, including freeing it and/or creating new nodes above/below it.
The check routine returns the node which should be inserted into the tree (if the top-level node was not modified, check routine returns its argument).
By convention, check routines have names ck_*. They are usually called from new*OP subroutines (or convert) (which in turn are called from perly.y).
Immediately after the check routine is called the returned node is checked for being compile-time executable. If it is (the value is judged to be constant) it is immediately executed, and a constant node with the ``return value'' of the corresponding subtree is substituted instead. The subtree is deleted.
If constant folding was not performed, the execution-order thread is created.
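A minimal sketch of constant folding during bottom-up tree construction follows. It is a toy model, not Perl's code: `FoldOp`, `fold_new_leaf`, `fold_new_add` and `fold_free` are hypothetical names, and the only foldable operation is integer addition.

```c
#include <stdlib.h>

/* Hypothetical node types for the model. */
typedef enum { F_CONST, F_VAR, F_ADD } FoldType;

typedef struct FoldOp {
    FoldType type;
    int value;               /* F_CONST: the constant; otherwise unused */
    struct FoldOp *kid[2];
} FoldOp;

FoldOp *fold_new_leaf(FoldType type, int value)
{
    FoldOp *op = malloc(sizeof *op);
    if (op == NULL) return NULL;
    op->type = type;
    op->value = value;
    op->kid[0] = op->kid[1] = NULL;
    return op;
}

void fold_free(FoldOp *op)
{
    if (op == NULL) return;
    fold_free(op->kid[0]);
    fold_free(op->kid[1]);
    free(op);
}

/* Build an addition node.  If both kids are constants, execute the
 * addition now, substitute a constant node, and delete the subtree. */
FoldOp *fold_new_add(FoldOp *left, FoldOp *right)
{
    FoldOp *op = malloc(sizeof *op);
    if (op == NULL) return NULL;
    op->type = F_ADD;
    op->value = 0;
    op->kid[0] = left;
    op->kid[1] = right;

    if (left && right && left->type == F_CONST && right->type == F_CONST) {
        FoldOp *folded = fold_new_leaf(F_CONST, left->value + right->value);
        if (folded != NULL) {
            fold_free(op);          /* frees both kids as well */
            return folded;
        }
    }
    return op;                      /* not foldable: keep the node */
}
```

Because the tree is built bottom-up, an inner constant sum has already been folded by the time the outer node is constructed, so a nested constant expression collapses to a single constant node.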
When a context for a part of the compile tree is known, it is propagated down through the tree. At this time the context can have 5 values (instead of 2 for runtime context): void, boolean, scalar, list, and lvalue. In contrast with pass 1, this pass is processed from top to bottom: a node's context determines the context for its children.
Additional context-dependent optimizations are performed at this time. Since at this moment the compile tree contains back-references (via ``thread'' pointers), nodes cannot be free()d now. To allow optimized-away nodes at this stage, such nodes are null()ified instead of free()ing (i.e. their type is changed to OP_NULL).
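The reason for nullifying rather than freeing can be shown with a toy thread of operations. Everything here is an assumption made for illustration: `NOp`, `model_op_null` and `model_exec` are hypothetical names, and the `was` field merely mimics the "(was rv2sv)" note seen in the -Dx output.

```c
#include <stddef.h>

/* Hypothetical op types for the model. */
typedef enum { N_NULL, N_PUSH, N_NEGATE } NType;

typedef struct NOp {
    NType type;
    NType was;          /* original type, kept for diagnostics */
    int arg;            /* N_PUSH: the value to load */
    struct NOp *next;   /* thread pointer that other nodes may hold */
} NOp;

/* Turn a node into a no-op in place.  The node stays allocated, so any
 * pointer that already refers to it remains valid. */
void model_op_null(NOp *op)
{
    if (op->type == N_NULL) return;
    op->was = op->type;
    op->type = N_NULL;
}

/* Follow the thread; nullified nodes are simply stepped over. */
int model_exec(const NOp *op)
{
    int acc = 0;
    for (; op != NULL; op = op->next) {
        switch (op->type) {
        case N_PUSH:   acc = op->arg; break;
        case N_NEGATE: acc = -acc;    break;
        case N_NULL:   break;
        }
    }
    return acc;
}
```

For a chain that loads a value and negates it twice, nullifying both negations removes the work while leaving every `next` pointer intact, which would not be true had the nodes been freed.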
After the compile tree for a subroutine (or for an eval or a file) is created, an additional pass over the code is performed. This pass is neither top-down nor bottom-up, but in the execution order (with additional complications for conditionals). These optimizations are done in the subroutine peep().
Optimizations performed at this stage are subject to the same restrictions as in pass 2.
This is a listing of functions, macros, flags, and variables that may be useful to extension writers or that may be found while reading other extensions.
Note that all Perl API global variables must be referenced with the PL_ prefix. Some macros are provided for compatibility with the older, unadorned names, but this support will be removed in a future release.
It is strongly recommended that all Perl API functions that don't begin with perl be referenced with an explicit Perl_ prefix.
The sort order of the listing is case insensitive, with any occurrences of '_' ignored for the purpose of sorting.
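The ordering rule just stated can be written as a comparison function. This is a sketch of the rule as described, not code taken from Perl; `listing_cmp` is a hypothetical name.

```c
#include <ctype.h>

/* Compare two names case-insensitively, skipping every '_' character.
 * Returns a negative, zero, or positive value like strcmp(). */
int listing_cmp(const char *a, const char *b)
{
    for (;;) {
        int ca, cb;
        while (*a == '_') a++;
        while (*b == '_') b++;
        ca = tolower((unsigned char)*a);
        cb = tolower((unsigned char)*b);
        if (ca != cb) return (ca > cb) - (ca < cb);
        if (ca == 0) return 0;
        a++;
        b++;
    }
}
```

Under this rule HvNAME sorts before hv_store, and NEWSV sorts before newSViv, which matches the order of the entries in the listing.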
void av_clear (AV* ar)
key is the index to which the array should be extended.
void av_extend (AV* ar, I32 key)
key is the index. If lval is set then the fetch will be part of a store. Check that the return value is non-null before dereferencing it to an SV*.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use this function on tied arrays.
SV** av_fetch (AV* ar, I32 key, I32 lval)
I32 av_len (AV* ar)
AV* av_make (I32 size, SV** svp)
SV* av_pop (AV* ar)
void av_push (AV* ar, SV* val)
SV* av_shift (AV* ar)
The array index is specified as key. The return value will be NULL if the operation failed or if the value did not need to be actually stored within the array (as in the case of tied arrays). Otherwise it can be dereferenced to get the original SV*. Note that the caller is responsible for suitably incrementing the reference count of val before the call, and decrementing it if the function returned NULL.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use this function on tied arrays.
SV** av_store (AV* ar, I32 key, SV* val)
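The reference-count convention described for av_store can be modelled without the Perl headers. The sketch below is an assumption-laden toy: `ModelSV`, `ModelAV`, `model_av_store` and `model_store_shared` are hypothetical names, and the fixed capacity exists only so that the model has a failure path to demonstrate.

```c
#include <stddef.h>

typedef struct { int refcnt; int value; } ModelSV;

#define MODEL_CAP 4
typedef struct { ModelSV *slot[MODEL_CAP]; } ModelAV;

/* On success the array takes over one reference to val and a pointer to
 * the slot is returned.  On failure NULL is returned and val is untouched. */
ModelSV **model_av_store(ModelAV *av, int key, ModelSV *val)
{
    if (key < 0 || key >= MODEL_CAP) return NULL;
    if (av->slot[key] != NULL)
        av->slot[key]->refcnt--;    /* drop the reference to the old value */
    av->slot[key] = val;
    return &av->slot[key];
}

/* Caller-side idiom: the caller keeps its own reference, so it adds one
 * for the array before the call and takes it back if the store failed. */
int model_store_shared(ModelAV *av, int key, ModelSV *val)
{
    val->refcnt++;
    if (model_av_store(av, key, val) == NULL) {
        val->refcnt--;
        return 0;
    }
    return 1;
}
```

The point of the idiom is that the count is correct on both paths: two holders after a successful store, and the original count after a failed one.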
void av_undef (AV* ar)
void av_unshift (AV* ar, I32 num)
xsubpp to indicate the class name for a C++ XS constructor. This is always a char*. See perlman:perlguts and perlman:perlxs.
memcpy function. s is the source, d is the destination, n is the number of items, and t is the type. May fail on overlapping copies. See also Move.
void Copy( s, d, n, t )
HV* CvSTASH( SV* sv )
SvPV( GvSV( PL_DBsub ), PL_na )
mark, for the XSUB. See perlman:perlguts and perlman:perlguts.
xsubpp. Declares the items variable to indicate the number of items on the stack.
xsubpp.
iotype is what IoTYPE(io) would contain.
do_binmode(fp, iotype, TRUE);
ENTER;
EXTEND( sp, int x )
fbm_instr() -- the Boyer-Moore algorithm.
void fbm_compile(SV* sv, U32 flags)
str and strend. It returns NULL if the string can't be found. The sv does not have to be fbm_compiled, but the search will not be as fast then.
char* fbm_instr(char *str, char *strend, SV *sv, U32 flags)
FREETMPS;
name and a defined subroutine or NULL. The glob lives in the given stash, or in the stashes accessible via @ISA and @UNIVERSAL.
The argument level should be either 0 or -1. If level==0, as a side-effect creates a glob with the given name in the given stash which in the case of success contains an alias for the subroutine, and sets up caching info for this glob. Similarly for all the searched stashes.
This function grants "SUPER" token as a postfix of the stash name.
The GV returned from gv_fetchmeth may be a method cache entry, which is not visible to Perl code. So when calling perl_call_sv, you should not use the GV directly; instead, you should use the method's CV, which can be obtained from the GV with the GvCV macro.
GV* gv_fetchmeth (HV* stash, char* name, STRLEN len, I32 level)
stash. In fact in the presence of autoloading this may be the glob for ``AUTOLOAD''. In this case the corresponding variable $AUTOLOAD is already set up.
The third parameter of gv_fetchmethod_autoload determines whether AUTOLOAD lookup is performed if the given method is not present: non-zero means yes, look for AUTOLOAD; zero means no, don't look for AUTOLOAD. Calling gv_fetchmethod is equivalent to calling gv_fetchmethod_autoload with a non-zero autoload parameter.
These functions grant "SUPER" token as a prefix of the method name.
Note that if you want to keep the returned glob for a long time, you need to check for it being ``AUTOLOAD'', since at the later time the call may load a different subroutine due to $AUTOLOAD changing its value. Use the glob created via a side effect to do this.
These functions have the same side-effects as gv_fetchmeth with level==0. name should be writable if it contains ':' or '\''. The warning against passing the GV returned by gv_fetchmeth to perl_call_sv applies equally to these functions.
GV* gv_fetchmethod (HV* stash, char* name)
GV* gv_fetchmethod_autoload (HV* stash, char* name, I32 autoload)
If create is set then the package will be created if it does not already exist. If create is not set and the package does not exist then NULL is returned.
HV* gv_stashpv (char* name, I32 create)
HV* gv_stashsv (SV* sv, I32 create)
char* pointer is to be expected. (For information only--not to be used).
U32 HeHASH(HE* he)
char* or SV*, depending on the value of HeKLEN(). Can be assigned to. The HePV() or HeSVKEY() macros are usually preferable for finding the value of a key.
char* HeKEY(HE* he)
int HeKLEN(HE* he)
char* value, doing any necessary dereferencing of possibly SV* keys. The length of the string is placed in len (this is a macro, so do not use &len). If you do not care about what the length of the key is, you may use the global variable PL_na. Remember though, that hash keys in perl are free to contain embedded nulls, so using strlen() or similar is not a good way to find the length of hash keys. This is very similar to the SvPV() macro described elsewhere in this document.
char* HePV(HE* he, STRLEN len)
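The warning about strlen() is easy to demonstrate in standard C. The sketch below uses a hypothetical `ModelKey` type (a pointer plus an explicit length) and a hypothetical `model_key_eq` helper; it does not touch the real HE structure.

```c
#include <string.h>

/* A key carried as pointer plus explicit length, so that embedded
 * NUL bytes are part of the key rather than its terminator. */
typedef struct {
    const char *ptr;
    size_t len;
} ModelKey;

/* Keys are equal only if the lengths match and all len bytes match. */
int model_key_eq(ModelKey a, ModelKey b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

For the five-byte keys "ab\0cd" and "ab\0ce", strlen() reports 2 for both and strcmp() calls them equal, while the length-aware comparison correctly tells them apart.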
HeSVKEY(HE* he)
char* key.
HeSVKEY_force(HE* he)
HeSVKEY_set(HE* he, SV* sv)
HeVAL(HE* he)
void hv_clear (HV* tb)
void hv_delayfree_ent (HV* hv, HE* entry)
klen is the length of the key. The flags value will normally be zero; if set to G_DISCARD then NULL will be returned.
SV* hv_delete (HV* tb, char* key, U32 klen, I32 flags)
flags value will normally be zero; if set to G_DISCARD then NULL will be returned. hash can be a valid precomputed hash value, or 0 to ask for it to be computed.
SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash)
klen is the length of the key.
bool hv_exists (HV* tb, char* key, U32 klen)
hash can be a valid precomputed hash value, or 0 to ask for it to be computed.
bool hv_exists_ent (HV* tb, SV* key, U32 hash)
klen is the length of the key. If lval is set then the fetch will be part of a store. Check that the return value is non-null before dereferencing it to an SV*.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use this function on tied hashes.
SV** hv_fetch (HV* tb, char* key, U32 klen, I32 lval)
hash must be a valid precomputed hash number for the given key, or 0 if you want the function to compute it. If lval is set then the fetch will be part of a store. Make sure the return value is non-null before accessing it. The return value when tb is a tied hash is a pointer to a static location, so be sure to make a copy of the structure if you need to store it somewhere.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use this function on tied hashes.
HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash)
void hv_free_ent (HV* hv, HE* entry)
I32 hv_iterinit (HV* tb)
Returns the number of keys in the hash (i.e. the same as HvKEYS(tb)). The return value is currently only meaningful for hashes without tie magic.
NOTE: Before version 5.004_65, hv_iterinit used to return the number of hash buckets that happen to be in use. If you still need that esoteric value, you can get it through the macro HvFILL(tb).
char* hv_iterkey (HE* entry, I32* retlen)
SV* hv_iterkeysv (HE* entry)
HE* hv_iternext (HV* tb)
SV* hv_iternextsv (HV* hv, char** key, I32* retlen)
SV* hv_iterval (HV* tb, HE* entry)
void hv_magic (HV* hv, GV* gv, int how)
char* HvNAME (HV* stash)
key and klen is the length of the key. The hash parameter is the precomputed hash value; if it is zero then Perl will compute it. The return value will be NULL if the operation failed or if the value did not need to be actually stored within the hash (as in the case of tied hashes). Otherwise it can be dereferenced to get the original SV*. Note that the caller is responsible for suitably incrementing the reference count of val before the call, and decrementing it if the function returned NULL.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use this function on tied hashes.
SV** hv_store (HV* tb, char* key, U32 klen, SV* val, U32 hash)
val in a hash. The hash key is specified as key. The hash parameter is the precomputed hash value; if it is zero then Perl will compute it. The return value is the new hash entry so created. It will be NULL if the operation failed or if the value did not need to be actually stored within the hash (as in the case of tied hashes). Otherwise the contents of the return value can be accessed using the He??? macros described here. Note that the caller is responsible for suitably incrementing the reference count of val before the call, and decrementing it if the function returned NULL.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use this function on tied hashes.
HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash)
void hv_undef (HV* tb)
char is an ascii alphanumeric character or digit.
int isALNUM (char c)
char is an ascii alphabetic character.
int isALPHA (char c)
char is an ascii digit.
int isDIGIT (char c)
char is a lowercase character.
int isLOWER (char c)
char is whitespace.
int isSPACE (char c)
char is an uppercase character.
int isUPPER (char c)
xsubpp to indicate the number of items on the stack. See perlman:perlxs.
xsubpp to indicate which of an XSUB's aliases was used to invoke it. See perlman:perlxs.
LEAVE;
int looks_like_number(SV*)
int mg_clear (SV* sv)
int mg_copy (SV *, SV *, char *, STRLEN)
MAGIC* mg_find (SV* sv, int type)
int mg_free (SV* sv)
int mg_get (SV* sv)
U32 mg_len (SV* sv)
void mg_magical (SV* sv)
int mg_set (SV* sv)
memmove function. s is the source, d is the destination, n is the number of items, and t is the type. Can do overlapping moves. See also Copy.
void Move( s, d, n, t )
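The difference between the two copying macros comes down to memcpy versus memmove. The sketch below defines stand-in macros with the same argument order as described (source, destination, count, type); `MODEL_COPY`, `MODEL_MOVE`, `model_shift_up` and `model_duplicate` are hypothetical names, and the real definitions in the Perl headers may differ in detail.

```c
#include <string.h>

/* Stand-ins for the documented macros.  Note that the source comes
 * first, the reverse of the memcpy()/memmove() argument order. */
#define MODEL_COPY(s, d, n, t) ((void)memcpy((d), (s), (n) * sizeof(t)))
#define MODEL_MOVE(s, d, n, t) ((void)memmove((d), (s), (n) * sizeof(t)))

/* Shift the first n elements up by one slot to open a gap at index 0.
 * Source and destination overlap, so the memmove-based macro is needed.
 * The buffer must have room for n + 1 elements. */
void model_shift_up(int *buf, size_t n)
{
    MODEL_MOVE(buf, buf + 1, n, int);
}

/* Copy between two separate buffers: no overlap, so memcpy is fine. */
void model_duplicate(const int *src, int *dst, size_t n)
{
    MODEL_COPY(src, dst, n, int);
}
```

Counting in items of type t rather than bytes means the caller never multiplies by sizeof by hand.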
malloc function.
void* New( x, void *ptr, int size, type )
AV* newAV (void)
malloc function, with cast.
void* Newc( x, void *ptr, int size, type, cast )
sub FOO () { 123 }
which is eligible for inlining at compile-time.
void newCONSTSUB(HV* stash, char* name, SV* sv)
HV* newHV (void)
SV* newRV_inc (SV* ref)
For historical reasons, ``newRV'' is a synonym for ``newRV_inc''.
SV* newRV_noinc (SV* ref)
len parameter indicates the number of bytes of preallocated string space the SV should have. An extra byte for a trailing NUL is also reserved. (SvPOK is not set for the SV even if string space is allocated.) The reference count for the new SV is set to 1.
id is an integer id between 0 and 1299 (used to identify leaks).
SV* NEWSV (int id, STRLEN len)
SV* newSViv (IV i)
SV* newSVnv (NV i)
If len is zero then Perl will compute the length.
SV* newSVpv (char* s, STRLEN len)
SV* newSVpvf(const char* pat, ...);
If len is zero then Perl will create a zero length string.
SV* newSVpvn (char* s, STRLEN len)
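The two functions treat a zero length differently, which is easy to get wrong. The following toy model mirrors only that convention: `ModelStr`, `model_new_pvn` and `model_new_pv` are hypothetical names, and a plain heap buffer stands in for an SV.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *buf;   /* NUL-terminated copy, or NULL if allocation failed */
    size_t len;   /* number of bytes before the terminator */
} ModelStr;

/* Always copies exactly len bytes; a len of 0 yields an empty string. */
ModelStr model_new_pvn(const char *s, size_t len)
{
    ModelStr r;
    r.len = 0;
    r.buf = malloc(len + 1);
    if (r.buf != NULL) {
        memcpy(r.buf, s, len);
        r.buf[len] = '\0';
        r.len = len;
    }
    return r;
}

/* A len of 0 means "measure the string for me" with strlen(). */
ModelStr model_new_pv(const char *s, size_t len)
{
    if (len == 0) len = strlen(s);
    return model_new_pvn(s, len);
}
```

One consequence of the first convention is that it cannot express an empty copy of a non-empty buffer, which is exactly the case the second function covers.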
rv, to point to. If rv is not an RV then it will be upgraded to one. If classname is non-null then the new SV will be blessed in the specified package. The new SV is returned and its reference count is 1.
SV* newSVrv (SV* rv, char* classname)
SV* newSVsv (SV* old)
xsubpp to hook up XSUBs as Perl subs.
xsubpp to hook up XSUBs as Perl subs. Adds Perl prototypes to the subs.
malloc function. The allocated memory is zeroed with memzero.
void* Newz( x, void *ptr, int size, type )
I32 perl_call_argv (char* subname, I32 flags, char** argv)
I32 perl_call_method (char* methname, I32 flags)
I32 perl_call_pv (char* subname, I32 flags)
I32 perl_call_sv (SV* sv, I32 flags)
I32 perl_eval_sv (SV* sv, I32 flags)
SV* perl_eval_pv (char* p, I32 croak_on_error)
If create is set and the Perl variable does not exist then it will be created. If create is not set and the variable does not exist then NULL is returned.
AV* perl_get_av (char* name, I32 create)
If create is set and the Perl variable does not exist then it will be created. If create is not set and the variable does not exist then NULL is returned.
CV* perl_get_cv (char* name, I32 create)
If create is set and the Perl variable does not exist then it will be created. If create is not set and the variable does not exist then NULL is returned.
HV* perl_get_hv (char* name, I32 create)
If create is set and the Perl variable does not exist then it will be created. If create is not set and the variable does not exist then NULL is returned.
SV* perl_get_sv (char* name, I32 create)
void perl_require_pv (char* pv)
int POPi()
long POPl()
char* POPp()
double POPn()
SV* POPs()
PUSHMARK(p)
void PUSHi(int d)
void PUSHn(double d)
len indicates the length of the string. Handles 'set' magic. See perlman:perlguts.
void PUSHp(char *c, int len )
void PUSHs(sv)
void PUSHu(unsigned int d)
xsubpp. See perlman:perlguts and the perlcall manpage for other uses.
PUTBACK;
realloc function.
void* Renew( void *ptr, int size, type )
realloc function, with cast.
void* Renewc( void *ptr, int size, type, cast )
xsubpp to hold the return value for an XSUB. This is always the proper type for the XSUB. See perlman:perlxs.
free function.
malloc function.
realloc function.
char* savepv (char* sv)
len indicates number of bytes to copy. This does not use an SV.
char* savepvn (char* sv, I32 len)
SAVETMPS;
xsubpp. See perlman:perlguts and perlman:perlguts.
SPAGAIN;
SV* ST(int x)
int strEQ( char *s1, char *s2 )
s1, is greater than or equal to the second, s2. Returns true or false.
int strGE( char *s1, char *s2 )
s1, is greater than the second, s2. Returns true or false.
int strGT( char *s1, char *s2 )
s1, is less than or equal to the second, s2. Returns true or false.
int strLE( char *s1, char *s2 )
s1, is less than the second, s2. Returns true or false.
int strLT( char *s1, char *s2 )
int strNE( char *s1, char *s2 )
len parameter indicates the number of bytes to compare. Returns true or false.
int strnEQ( char *s1, char *s2, int len )
len parameter indicates the number of bytes to compare. Returns true or false.
int strnNE( char *s1, char *s2, int len )
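The string comparison helpers in this group behave like thin wrappers over strcmp() and strncmp() that return true or false. A plausible sketch is given below; the `model_` prefix marks these as hypothetical stand-ins, and `model_has_prefix` is an invented usage example rather than part of the API.

```c
#include <string.h>

/* Stand-in definitions: each yields 1 (true) or 0 (false). */
#define model_strEQ(s1, s2)      (strcmp((s1), (s2)) == 0)
#define model_strNE(s1, s2)      (strcmp((s1), (s2)) != 0)
#define model_strLT(s1, s2)      (strcmp((s1), (s2)) <  0)
#define model_strLE(s1, s2)      (strcmp((s1), (s2)) <= 0)
#define model_strGT(s1, s2)      (strcmp((s1), (s2)) >  0)
#define model_strGE(s1, s2)      (strcmp((s1), (s2)) >= 0)
#define model_strnEQ(s1, s2, n)  (strncmp((s1), (s2), (n)) == 0)
#define model_strnNE(s1, s2, n)  (strncmp((s1), (s2), (n)) != 0)

/* Usage example: does name begin with prefix? */
int model_has_prefix(const char *name, const char *prefix)
{
    return model_strnEQ(name, prefix, strlen(prefix));
}
```

Since these stop at the first NUL byte, they suit C identifiers and similar text, not Perl strings that may contain embedded nulls.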
SV* sv_2mortal (SV* sv)
SV* sv_bless (SV* sv, HV* stash)
void sv_catpv (SV* sv, char* ptr)
void sv_catpv_mg (SV* sv, char* ptr)
len indicates number of bytes to copy. Handles 'get' magic, but not 'set' magic. See perlman:perlguts.
void sv_catpvn (SV* sv, char* ptr, STRLEN len)
void sv_catpvn_mg (SV* sv, char* ptr, STRLEN len)
void sv_catpvf (SV* sv, const char* pat, ...)
void sv_catpvf_mg (SV* sv, const char* pat, ...)
ssv onto the end of the string in SV dsv. Handles 'get' magic, but not 'set' magic. See perlman:perlguts.
void sv_catsv (SV* dsv, SV* ssv)
void sv_catsv_mg (SV* dsv, SV* ssv)