bsb has asked for the wisdom of the Perl Monks concerning the following question:

Well, it's more of a recursive "for".

I've changed treemap around and called it do_rec.

It now recurses data structures composed of hashes, arrays, scalars and their references, executing a block/coderef with $_ aliased appropriately so that changing $_ modifies the element.

By default, the block is only executed for leaves.
If you pass in a prototyped subroutine the prototype is (mis)used to decide when to run your code
eg. do_rec sub(%;) {...} $data; # is run for values & HASHes
Blessing is ignored, objects are scanned (unless you "cut")

I have some biggish questions about my implementation:

Is this (ab)use of prototypes as out-of-band info evil?
Would attributes be better?

Does $seen->{ ref($_) ? $_ : \$_} = undef; really mark a value as seen? (demerphq?)

I use special "cut" return value to signal that this thing should not be recursed. I'm trying to make it impossible to accidently return it..?

Is there an easier way to keep the $_ aliasing?

I'd also like local-ized variables, for example to build up a "1.3.4" type path, but it needs more thought...

The code as I go home:

#!/usr/bin/perl -w =pod do_rec BLOCK LIST do_rec CODEREF LIST =head1 SYNOPSIS # upper case your leaves do_rec { $_ = uc $_; } $data, $structures; # upper case your leaves (except children of arrays) do_rec sub (*) { $_ = uc $_; cut if ref($_) eq 'ARRAY'; } $hashref; =head1 DESCRIPTION ... =head1 NOTES Originally stolen from treemap: http://www.perlmonks.org/index.pl?node_id=60829&displaytype=displayco +de Two paths to the same thing will be taken randomly if there's an intervening hash If there's two paths to a thing, both must be cut. =cut use strict; use Carp (); our $cut = \'cut magic cookie'; sub cut() { $cut } sub VALUE() { 1 } sub HASH() { 2 } sub ARRAY() { 4 } sub SCALAR() { 8 } sub REF() { 16 } sub OBJECT() { 32 } #$ @ % * & ; \@ \% \$ my %proto_map = ( ';' => VALUE, '$' => SCALAR, '\\' => REF, '%' => HASH, '@' => ARRAY, '*' => HASH|ARRAY|SCALAR|REF|VALUE, '&' => OBJECT, # The proto isn't used anyway, these compile for me... 'V' => VALUE, 'S' => SCALAR, 'R' => REF, 'H' => HASH, 'A' => ARRAY, 'O' => OBJECT, ); sub do_rec (&@) { my $code = shift; my $want; if(my $proto = prototype $code) { $want = 0; $proto =~ s{([;$%@*&\\VSRHAO])} { $want |= $proto_map{ uc $1 }; '' }gei; Carp::croak "Unknown prototype: ",prototype $code if $proto; } else { $want = VALUE; } _do_rec($code, $want, my $seen = {}, @_); } sub _do_rec { my ($code, $want, $seen) = (shift, shift, shift); for (@_) { # just one after the first; next if (exists $seen->{$_}); my ($key, $type); if(ref($_)) { $key = $_; # gets stringified # ref can only be blessed into one thing at a time (so it' +s ok) if (UNIVERSAL::isa($_,'HASH')) { # isa to ignore blessing $type = HASH; } elsif (UNIVERSAL::isa($_,'ARRAY')) { $type = ARRAY; } elsif (UNIVERSAL::isa($_,'SCALAR')) { $type = SCALAR; } elsif (UNIVERSAL::isa($_,'REF')) { $type = REF; } # if they want an OBJECT, OR it in if($want & OBJECT && m/=/) { $type |= OBJECT; } } else { $key = \$_; # reference to where value is stored $type = VALUE; } $seen->{$key} = undef; # Call the $code if($want & $type) { my $ret = $code->(); # pass type/ref? next if(ref($ret) && $ret == $cut); # return cut cookie -> + next } # Recurse appropriately if ($type & HASH) { _do_rec($code, $want, $seen, $_) for values %$_; # keys? do in higher level } elsif ($type & ARRAY) { # could hoist nonrefs like reply on pm: (need to mark seen +) _do_rec($code, $want, $seen, $_) for @$_; } elsif ($type & (SCALAR|REF) ) { _do_rec($code, $want, $seen, $_) for $$_; } } } ####################### EXAMPLE ############################# our $data = { 'arrays' => [[ 'shared', 'not_shared' ]], 'num' => 2, #'blahs' => \'blah blah', # mod of ro var 'blahs' => \do { my $a = 'blah blah' }, 'hash' => { 'a' => 'vala', 'b' => 'valb', 'c' => bless({ qn=> 'no_qnum' },'Obj'), }, 'blessed' => bless(\{ qn=> 'that' },'Obj::Ref'), }; # shared value $data->{arrays} = bless(\$data->{arrays}[0][0],"Not::Scalar"); # ref to the same hash via 2 scalar refs with different blessing $data->{another_obj} = bless(\do{ my $o = ${$data->{blessed}}},"Not::O +bj"); use Data::Dumper; $Data::Dumper::Purity=1; # objects: do_rec sub (&) { print ref($_)," " } => $data; print "\n\n"; # all the leaves do_rec { $_ = "#$_#"; } $data; # (*) = everything my $count = 1; do_rec sub (*) { return cut if ref($_) eq 'ARRAY'; return $_ = "=\U$_=" if !ref($_); # leaves $_->{qnum} = $count++ if ref($_) eq 'HASH' && exists $_->{qn}; } => $data; print Dumper $data; print "-----------------------------------------\n\n"; # Another example $data = [ \do{ my $s = "string"} ]; $data->[1] = \$data->[0]; $data->[2] = \do{ my $s = "last" }; do_rec { $_ = uc $_; } $data; print Dumper $data; print "-----------------------------------------\n\n"; do_rec sub (Obj) { print ref($_),"\n" } => $data; __END__
Brad

Replies are listed 'Best First'.
Re: Recursive map Design Questions
by Abigail-II (Bishop) on Oct 01, 2003 at 14:53 UTC
    $seen->{ ref($_) ? $_ : \$_} = undef

    You are aware that ref can return a false value even if its argument is a reference?

    $ perl -wle 'print ref bless [] => 0' 0

    You ought to check for definedness, which is easy in 5.8.1 with the defined-or patch:

    $seen -> {ref () // \$_} = undef;

    Abigail

      Abigail. Are you seriously suggesting that people should cater to the possibility that references may have been blessed into a package of 0?

      Firstly, it would have to be a deliberate, willful act to do it as it isn't possible (that I am aware of) to actually have a package 0;

      Secondly, shouldn't bless [], 0; be considered a bug and reported for fixing?

      Is there any possible legitimate use of this?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

        I think you have a point on one level and on the other have temporarily forgotten a core principle of Perl.

        Do these tricks have a legitimate purpose?
        Not really, or not to my knowledge anyway. Maybe obfus or something.
        Are they probably a bad idea to do?
        Yep.
        Should we worry about it in average code?
        Nope.
        Should we worry about it in code specifically meant to properly grok various data types and structures?
        Yep.
        Should we make thse things illegal?
        Nope.
        Why not?
        Just because we can't see a good reason to do these things doesn't mean that there will never be a good reason to do them. So its better to leave them alone.
        Yeah but isn't that a bit dangerous
        Yeah but so is ten feet of rope.
        It is?
        Sure, somebody could hang themself.
        Er...
        Ever notice that you can redefine the value of numeric constants in perl?
        What?
        Sure. You can define 1 to be an approximation of pi if you like.
        Oh god.
        Yep. Be careful with that there rope now y'hear.

        ;-)


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi


        Well, people are expected to consider the possibility files are named "0", or that files can end with "\n0" (for the latter, there are even warnings build in in Perl).

        I'd say that if you code general solutions, and that's what the OP was doing, you should consider such possibilities. Programs that run in a specific, or controlled environment may not have to take such things into consideration.

        Oh, and don't get the idea that a package of '0' has never happened. The original post triggered a memory. Years ago, a co-worker was working on some complicated template system. Some code also dealt with 'ref' and blessed objects. And code *did* fail because some other code (which generated code) blessed objects in the package '0'.

        OTOH, I've never encountered a supposedly text file that ended with "\n0". That doesn't mean I won't check for definedness when necessary.

        Secondly, shouldn't bless [], 0; be considered a bug and reported for fixing?

        Uhm, why? Just because it's inconvenient? Then we might as well forbid filenames to contain spaces and Microsoft Outlook - that's inconvenient as well. ;-)

        Abigail

      Now I'm confused.
      $ perl -wle 'print defined ref 0' 1 $ perl -wle 'my $a; print defined ref $a' 1 $ perldoc -f ref ref EXPR ref Returns a true value if EXPR is a reference, false otherwise. If EXPR is not specified, $_ will be used. The value returned depends on the type of thing the reference is a reference to. # Salvation? $ perl -MScalar::Util=blessed -wle 'my $a; print defined blessed(bles +s [] => 0)' 1 $ perl -MScalar::Util=blessed -wle 'my $a; print defined blessed($a)'
        Yes, my bad. If the argument isn't reference, 'ref' returns the empty string, not undef. But the first line of the documentation you quote is provebly wrong. I'll write a patch.

        Abigail

Re: Recursive map Design Questions
by simonm (Vicar) on Oct 01, 2003 at 18:10 UTC
    Is this (ab)use of prototypes as out-of-band info evil? Would attributes be better?

    Weird, but not pure evil.

    I use special "cut" return value to signal that this thing should not be recursed. I'm trying to make it impossible to accidently return it..?

    I think the solution you have is pretty decent; unless someone traverses over your package's symbol table or otherwise goes looking for trouble, it should be pretty darn unlikely that they produce a string that matches your unique reference's ID.

Re: Recursive map Design Questions
by demerphq (Chancellor) on Oct 03, 2003 at 16:12 UTC

    Interesting stuff. I think you have some potential zaps and traps with how you are identifying type and handling seen items. It looks to me that something like this would cause trouble:

    my $x=\"foo"; my @array=("$x",$x);

    As would any object with stringification overloaded. Can you see the havoc this would cause:

    package Bitch; use overload qw("" bitch); my $bitch="Bitch0000"; sub bitch { $bitch++ }; package main; my ($x,$y); $y=bless \$x,'Bitch'; $x=bless \$y,'Bitch';

    Also for this type of scenario don't use UNIVERSAL. I know this goes against common advice, but ref and UNIVERSAL are not meant for type checks except in the loosest sense. Just as Abigail-II correctly pointed out that an object of class '0' would cause trouble so too would an object such as would be returned by

    bless [],'HASH'

    With regard to BrowserUks point on this matter I generally agree that code need not worry about such nasty tricks. If an enduser wants to do such silly things then its their own fault. But with code deliberately designed to traverse a perl data structure correctly IMO should be coded to bypass such foolishness. Especially as it would not be particularly difficult to handle these cases.

    What I would do is utilize Scalar::Util (now core as of 5.8), more specifically reftype() and refaddr(). Using refaddr is much more suitable for tracking seen items, and reftype provides a neat and infalable method of determinging an objects underlying type.

    Im worried about the protyping trick. I think its cool on face value just for being tricky, but I think its a bit limiting. Why couldnt one provide a hash of types and callbacks for each? Also, $cut is a problem in my eyes. I think I would prefer to just have it return true or false rather than deal with this special variable in user code. (And File::Find and friends trick of localized vars isnt the best plan either IMNSHO.)

    A few last minor thoughts are that I don't like the name of the sub (sorry), i prefer 'treemap' to 'do_rec' and I prefer 'apply' to 'treemap', and a little more documentation would have been nice. :-) It took me while to come to grips with what was going on. (IMO you shouldnt have to read the treemap thread first to understand your post... But maybe i'm just hungover. :-) Overall this looks pretty interesting, and leads me to return to thinking about the overall design of Data::Dumper and Data::BFDump. Good stuff. Thanks.

    :-)


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      Thanks for the Scalar::Util pointers

      Im worried about the prototyping trick

      I went down that path so that the leaf case could use the bare block syntax like a map. Also I wanted a variadic function and since the remaining args could be anything... there's not much room. I could have two entry functions, one for leaves and one with a hash mapping of functions.

      I do like the minimal do {this} to @these ... Maybe 3 calling modes, leaf, prototype hack or sub hash. I'll think some more.

      $cut is a problem in my eyes. I think I would prefer to just have it return true or false rather than deal with this special variable

      Again, this was for aesthetics of the simple case. Most of the time you won't be cutting, just tweaking, or at least that the case I want to make easy. (It's also a design hang over from when it was more map-like and used the return value as replacement) Maybe die "cut" could be an appropriate out of band return value.

      I don't like the name either. Still considering traverse, visit, for_tree, apply. I suspect that functional languages have a good name for this, I should research it.

        I went down that path so that the leaf case could use the bare block syntax like a map.

        And I can see your point as well. You already have a wrapper function however. If the underlying implementation used a hash of callbacks then you could provide the neater codeblock style as well as the clunkier but more powerful hash of callbacks style as well. Everybody wins then.

        Maybe die "cut" could be an appropriate out of band return value.

        Yep, I can see that working nicely.

        Cheers,


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi