This all started with thread sorting a vec. I thought the thread was interesting, and also was BrowserUK's reply, so I went to CPAN and did some search. A module called Tie::VecArray came to my attention.

I played with that module a littel bit, and quickly decided that it was a bad example of what tie was about, and how it should be implemented. I think this is a good lesson that worth to share with monks.

I am not here to define tie, as it is defined already, and for anyone interested, you can check out perltie documentation. But I am interested in giving my view of tie from an OO perspective. My understanding is that tie is a type of interface, which defines a common set of methods that allow you to access an object, regardless of its internal structure.

For example, Tie::Array allows you to access the object tied to an array, as if it is an array without knowing its internals. You can get array element thru [], do things like shift, unshift, push, pop etc. (if they are implemented, but they are not always mandatory)

Now back to the main topic, why is that module bad? Let's look at this example (I have shamlessly stolen the print unpack syntax from BrowserUK:

use strict; use warnings; require Tie::VecArray; my $vec = ''; my @array; vec($vec, $_, 4 ) = rand(16) for 0 .. 99; print join(" ", map{ vec( $vec, $_, 4) } 0 .. 99); print "\n"; my $obj = tie @array, 'Tie::VecArray', 4, $vec; @array = sort {$a <=> $b} @array; print join(" ", map{ vec( $vec, $_, 4) } 0 .. 99);

To me (and I believe to most people), the expected result is that, sort the array will actually sort the vec. However it turened out to be wrong, that sort does not affect the vec at all. So I started to look into the implementation of Tie::VecArray, and I discovered that inside the package, it made a copy of the vec at the construction time, and all the operations later against the array only affect the temperary internal copy of the vec, not the vec itself any more.

This might make sense, if the Tie is read only. However all the examples in the documentation of Tie::VecArray, and also the fact it has STORE, STOESIZE etc. implemented, clearly indicated that it was done as a writable tie.

Who care to write to the temperary internal copy of the vec?

Interesting enough, that module actually passed CPAN testing on most platforms, other than Solaris. I am really curious as what test cases they used, and who decided that those cases were sufficient. Other than the technical lesson, it does ring a bell to me, that one has to be very careful about the quality of CPAN modules, especially those ones that are not well-known, and you have to be prepared to make your own judgement.

The immediate problem could be easily fixed by keeping a ref to the vec instead of copying the content of the vec (The code is only targeted to fix this particualr problem):

package Tie::PGVecArray; use strict; use warnings; use Tie::Array; use POSIX qw(ceil); use base qw(Tie::Array); use fields qw(bits vec size); sub _IDX2BYTES { my $self = shift; my $idx = shift; #define _IDX2BYTES($self, $idx) \ ceil($idx * ($self->{bits}/8)) } sub _BYTES2IDX { my $self = shift; my $bytes = shift; #define _BYES2IDX($self, $bytes) \ ceil($bytes * 8 / $self->{bits}) } sub TIEARRAY { my($class, $bits, $vec) = @_; no strict 'refs'; my $self = bless [\%{$class.'::FIELDS'}], $class; $self->{bits} = $bits; $self->{vec} = $vec; $self->{size} = _BYTES2IDX($self, length $$vec); return $self; } sub FETCH { my $self = shift; return vec(${$self->{vec}}, $_[0], $self->{bits}); } sub STORE { my $self = shift; $self->{size} = $_[0] + 1 if $self->{size} < $_[0] + 1; return vec(${$self->{vec}}, $_[0], $self->{bits}) = $_[1]; } sub FETCHSIZE { my $self = shift; return $self->{size}; } sub STORESIZE { my $self = shift; my $new_size = shift; if( $self->{size} > $new_size ) { my $new_length = _IDX2BYTES($self, $new_size); substr(${$self->{vec}}, $new_length) = '' if $new_length < length ${$self->{vec}}; } $self->{size} = $new_size; } 1;

Now if you try this modified package with the following code (almost the same as the first piece of code, and the only difference is that this time I use Tie::PGVecArray, instead of Tie::VecArray), you will see that the vec is sorted, when you sort the array it tied to:

use strict; use warnings; require Tie::PGVecArray; my $vec = ''; my @array; vec($vec, $_, 4 ) = rand(16) for 0 .. 99; print join(" ", map{ vec( $vec, $_, 4) } 0 .. 99); print "\n"; my $obj = tie @array, 'Tie::PGVecArray', 4, \$vec; @array = sort {$a <=> $b} @array; print join(" ", map{ vec( $vec, $_, 4) } 0 .. 99);

Replies are listed 'Best First'.
Re: A good lesson on Tie ( My error!)
by BrowserUk (Patriarch) on Nov 02, 2003 at 05:43 UTC

    Good post++

    One words of caution though. If you take another look at my post at Re: sorting a vec, you'll find that I have replaced the code

    print unpack '(B4)*', $vec;

    With

    print map{ vec( $vec, $_, 4) } 0 .. 99;

    The reason is that the first example doesn't do what I originally thought, and not what I still think would be the most intuative reading of the code.

    Using the unpack format '(B4)*', doesn't return every set of 4 bits from the string. It returns every other 4-bits from the string! That is, the output consists of a list of numbers, the value of each being determined from most significant 4-bits of each byte of the input string, with the other 4-bits being summarily discarded. (*)

    Only noticed this a while after posting (I noticed that the string I had carefully packed 100 4-bit fields into was only outputting 50 values when printed with the original code).

    This was changed within about 1/2 an hour of the original post, at a time when then CB was quiet, and there were only 8 people listed in others users. You must have grabbed the code pretty quickly after I posted the original.

    I apologise for my error. I don't believe that this affects the information in your post, but if it does, feel free to transfer all the blame to me.

    *It also worth noting that '(b4)*', also grabs the same, most significant, 4-bits of each byte, with the others being discarded -- but it starts processing the bytes in the string from the other end. Very confusing and non-intuative.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!
    Wanted!

      Thanks!

      So I stole again ;-) Okay I replaced that syntax everywhere in my post.

      But this time I modified your code a little bit, and used this:

      print join(" ", map{ vec( $vec, $_, 4) } 0 .. 99);

      Without those spaces in between those numbers, I was quite confused (visually), as now the numbers no longer have uniformed length, some are 1 digit, when others are two-digits. If I saw 15 on screen, I really don't know whether it is 1 and 5, or 15.

      As you guessed, that error does not affect the content of my meditation, but it is always a good practice to make things better and more precise.

Re: A good lesson on Tie (through a negative example)
by demerphq (Chancellor) on Nov 02, 2003 at 16:09 UTC

    Well I dont really think Schwern was thinking about people who might need the stringified version of the array, but rather for situations where having a real array would be a waste of space. I dont think you can argue that his implementation is buggy, it just doesn't do something that you think it should. (And frankly that I think it should too :-)

    Anyway, I think your patch would be undesirable as it would cause all code using Tie::VecArray to break as they would be passing in non refs. But changing the TIEARRAY method resolves that problem.

    sub TIEARRAY { my $class=shift; my $bits =shift; no strict 'refs'; my $self = bless [\%{$class.'::FIELDS'}], $class; $self->{bits} = $bits; if (@_ and eval { $_[0]=$_[0]; 1} ) { $self->{vec} = \$_[0]; } else { my $vec=@_ ? $_[0] : ""; $self->{vec} = \$vec; } $self->{size} = _BYTES2IDX($self, length ${$self->{vec}}); return $self; }

    However on second thought that may be a problem as it could have disasterous results on code not expecting alias semantics on $vec, but rather the original copy semantics. So a variant to your approach would do nicely:

    sub TIEARRAY { my($class, $bits, $vec) = @_; no strict 'refs'; my $self = bless [\%{$class.'::FIELDS'}], $class; $self->{bits} = $bits; $self->{vec} = ref $vec ? $vec : \$vec; $self->{size} = _BYTES2IDX($self, length ${$self->{vec}}); return $self; }

    Now you would have both semantics. If you pass a ref its expected to a be a reference to a writeable scalar, if you dont then the value is copied. This would be a patch that shouldnt break anything that already exists, and adds useful functionality.

    Which brings me to a different point. Instead of just writing a medition about how your version is better, why not work out a patch that shouldnt break existing code, but provides your desired behaviour and then give it to the author for the next release? Hint, hint. :-)


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


Re: A good lesson on Tie (through a negative example)
by sauoq (Abbot) on Nov 02, 2003 at 11:04 UTC
    Interesting enough, that module actually passed CPAN testing on most platforms, other than Solaris. I am really curious as what test cases they used, and who decided that those cases were sufficient.

    I believe that just means it built okay and passed its own tests. (I.e. make tests worked.) The test cases are decided upon and implemented by the author(s) of the module.

    The model generally works well. Except on a case by case basis, it wouldn't make sense for third parties to try to write tests for your module. You specified it and you wrote it so you create the tests. CPAN makes it available. Users test it. You answer the bug reports.

    -sauoq
    "My two cents aren't worth a dime.";
    

      I think that I have to agree with you ;-)

      On the other hand, those test cases designed by the author can only be considered as unit test cases. It must then pass UAT (User Acceptance Testing).

      Now talk about UAT, the author of the module shall never influence, not to say design, the UAT test cases. But I understand that as an open forum, each tester (I remember they had/have this role for CPAN) only has very limited amount of time to commit, as everybody has their day time job to do. Think about the numbers of CPAN modules people submit all the time, the idea of having formal UAT is also defeated.

      Even if there would be a UAT, the quality of the UAT is again a question, as those people who do the UAT for a module may never use it in their real life, and they don't really care (the level they care is much lower than your real users, who do the UAT and will have to live with the bugs they missed ;-).

      Now you are end up with doing the UAT yourself. But for well known modules, the situation is quite different, as lots of people used them before you, which means that lots of people have done UAT for you already.

        But for well known modules, the situation is quite different, as lots of people used them before you, which means that lots of people have done UAT for you already.

        Well, sure! :-)

        And not only do popular modules get more testing, the popularity of a module itself is proof that it has been accepted by many users.

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: A good lesson on Tie (through a negative example)
by fokat (Deacon) on Nov 02, 2003 at 05:09 UTC

    Excellent meditation, pg.

    I think the situation you describe is, unfortunately, more common than it should be. There's no substitute for your own testing of the components you intend to use in your project. Also note that in the example you used, coverage testing would give no added benefits as the problem seems to be garbled logic.

    Best regards

    -lem, but some call me fokat