The documentation below is, itself, incomplete. I don't show how the copy-on-write part actually happens, but give me a day or so.
Oh, and if anyone here is good with malloc() and free() in C and can tell me where in my code I should be free()ing things, I'd be very grateful.
COW - How Copy-On-Write Can Be Implemented
Copy-On-Write (or COW) is a concept that delays the copying of potentially large data (strings, specifically) as long as possible. It can be implemented via ``magic'', and this should be trivial for the Perl programming language.
Copy-On-Write (or COW) is a concept that delays the copying of potentially large data (strings, specifically) as long as possible. Here is an example:
$big = "_" x 10_000;
$big is now 10,000 characters long. Let's say we want to store a substring of that in $small.
$small = substr $big, 500, 2000;
$small is now 2,000 characters long. That's a lot to copy. :( What if we could alias $small to the 2,000 characters starting at position 500 in $big, and only copy the 2,000 characters when we needed to (when $big changes)?
That'd be cool. $small would be magical. Here's a demonstration:
alias($small, \$big, 500, 2000);
sub alias { tie $_[0], 'SubstringAlias', \$_[0], @_[1,2,3] }
package SubstringAlias;
sub TIESCALAR {
my ($class, $orig, $targ, $pos, $len) = @_;
bless [ $orig, $targ, $pos, $len ], $class;
}
# when we get $small's value
# we return the substring of the
# target variable
sub FETCH {
my ($self) = @_;
my ($me, $str_ref, $pos, $len) = @$self;
return substr $$str_ref, $pos, $len;
}
# when we store a value in $small
# we should stop it from being an alias!
sub STORE {
my ($self, $value) = @_;
untie ${ $self->[0] };
${ $self->[0] } = $value;
}
Now $small is a tied variable. When we access its contents, the object that is tied to $small has its FETCH() method called:
print $small; # like (tied $small)->FETCH()
# where (tied $small) returns the object
However, what happens when $big changes?! $small would reflect those changes. We don't want that. :( We need a way to make $small get those 2000 characters at the last possible moment -- when $big changes. This ``change'' is called ``writing'', because we're writing some value to $big. We want a ``lazy'' alias, and we can implement this with another tie():
#!/usr/bin/perl
$big = "Perl is a wonderful language"; alias($small, \$big, 5, 15);
print "<$big>\n"; print "<$small>\n\n";
$big = "foo";
print "\n<$big>\n"; print "<$small>\n";
sub alias { tie $_[0], 'SubstringAlias', \$_[0], @_[1,2,3] }
package SubstringAlias;
sub TIESCALAR {
my ($class, $orig, $targ, $pos, $len) = @_;
my $self = bless [ $orig, $targ, $pos, $len ], $class;
# make $big a tied variable TOO!
tie $$targ, 'AliasTarget', $targ, $self;
return $self;
}
sub FETCH {
print "* SubstringAlias::FETCH()\n";
my ($self) = @_;
my ($me, $str_ref, $pos, $len) = @$self;
return substr $$str_ref, $pos, $len;
}
# stop being an alias if we
# are assigned to
# NOTE: unties both
sub STORE {
print "* SubstringAlias::STORE()\n";
my ($self, $value) = @_;
my ($me, $targ) = @$self;
untie $$me;
untie $$targ;
$$me = $value;
}
package AliasTarget;
sub TIESCALAR {
my ($class, $orig, $copy_to) = @_;
bless [ $orig, $$orig, $copy_to ], $class;
}
# just display our value
sub FETCH {
print "* AliasTarget::FETCH()\n";
my ($self) = @_;
return $self->[1];
}
# now we copy our value
# to the alias before
# we change our value
# NOTE: unties both
sub STORE {
print "* AliasTarget::STORE()\n";
my ($self, $value) = @_;
my ($me, $str, $target) = @$self;
untie $$me;
$target->STORE(substr $str, $target->[2], $target->[3]);
$$me = $value;
}
Whew. That program works, and when run, produces the following output:
* AliasTarget::FETCH() <Perl is a wonderful language> * SubstringAlias::FETCH() * AliasTarget::FETCH() <is a wonderful >
* AliasTarget::STORE() * SubstringAlias::STORE()
<foo> <is a wonderful >
The diagnostic print() statements show us what was going on. We see that when we ask for $big's value, we call its FETCH() method, and when we ask for $small's value, we call ITS FETCH() method, which in turn has to call $big's FETCH() method.
When we store the value ``foo'' in $big, we copy the substring of $big to $small before we untie the two of them and leave them as normal strings. That was a bit of work. Wouldn't it be nice if there was a language that did that all transparently?
Let's make one up!
Let's call it COW.
COW is a very simple language. It supports the following operations:
# comments like Perl's
# assigning a string to a variable var = "string"; var = 'string';
# printing a variable or a string print var; print "string"; print 'string';
# assigning another variable to a variable var = other_var;
# "cow"ing another variable to a variable var = cow other_var;
Those are all pretty self-explanatory, except for the last one. The cow function means ``alias var to other_var until other_var is written to, at which point, copy the contents of other_var to var.`` That's a lot for that function to mean, but that's what it does. This is good, because if we never change other_var, we won't have to worry about the duplicated memory (if we copied other_var's contents to var, we'd have a copy of the string).
How is COW implemented? It's not all too hard. It takes a bit of ``magic'', though. Let's step through a program:
name = "japhy";
This creates a scalar value (SV) in the program's symbol table. All SVs have an internal structure like so:
struct _sv {
list *dependecies; /* who is aliasing me? */
MG *magic; /* what type of magic do I have? */
int line; /* what line number was I defined on? */
int flags; /* what flags do I have on? */
int str_len; /* what is the length of my string value? */
STR str; /* the char* representing the string */
};
Whew. That's C code, if you're not familiar with it. Anyway, safe to say the first line of our program doesn't do much. Here's the internal view of the name variable:
name --> dependencies = NULL
magic = NULL
line = 1
flags = 0
str_len = 5
str = "japhy"
Ok. Now let's look at the next line of the program:
alias = cow name;
This creates another variable, alias, and tells it to point to name. This line does a lot of stuff. Let's see what it does to name first:
name --> dependencies = SV*(alias)
magic = NULL
line = 1
flags = SVf_COW
str_len = 5
str = "japhy"
The dependencies list contains a pointer to alias's SV. Here's what alias looks like:
alias --> dependencies = NULL
magic = {
type = MGt_COW
ftbl = {
get = mg_cow_get
set = mg_cow_set
len = mg_cow_len
clear = mg_cow_clear
free = mg_cow_free
}
object = SV(name)
}
line = 2
flags = SVf_MAGIC
str_len = 5
str = SvSTR(SV*(name))
In alias's representation, the SVf_MAGIC flag has been turned on. The type of magic is MGt_COW. The ftbl is a struct of pointers to functions to call for the requested action. Finally, the object field holds the SV the magic functions have access to -- in this case, the place we read from (the SV of name). str is not a copied string, but merely a pointer to name's str field.
Onto the next line:
print alias;
The print function works something like this:
void op_print_str (STR *s) {
printf("%s", s);
}
void op_print_sv (SV *sv) {
/* this will update sv->str if needed */
if (SvMAGIC(sv)) {
MG *mg;
if ((mg = SvMG(sv)) && mg->ftable->get)
CALL(mg->ftable->get)(sv, mg);
}
op_print_str(SvSTR(sv));
}
What's happening here is SvMAGIC(sv) checks to see if sv has the SVf_MAGIC flag on. If so, we extract the MG structure via SvMG(sv) and store it in mg. If that is true (not NULL) and there is a get function defined for it, that function is called -- the CALL() macro just turns a function pointer into a function:
#define CALL(fptr) (*fptr)
The get function will probably update the str field of the SV structure. After that's done, we pass the str field, gotten via SvSTR(sv), to the op_print_str() function.
I borrowed a lot of the design of the scalars and magic from Perl. But that's good, since I hope to translate the implementation directly to a Perl magic structure.
Jeff japhy Pinyan, [mailto://japhy@perlmonk.org].
Download the source at: http://japhy.perlmonk.org/COW/.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
In reply to Don't have a COW? Get one! by japhy
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |