This meditation is about a tied hash package, and how it came into existence. I am still meditating whether this is too obscure, or whether its goal is better achieved using some other technique; is it worth being uploaded to CPAN as yet another strange perl delirium? is the name ok? Any suggestions, review, critics are welcome. Thanks for your time.

Itch

Over 100 poorly performing scripts written in some BASIC dialect for exactly the same purpose (read source records, transform them, write target records), sporting hardcoded parameters and different output assembling code, proliferating with each new customer (copy over, twiddle, tweak).

Scratch

Perl to the rescue to do the data gathering and munging, and write a unified import CSV to be fed into that dratted basic script - one for all. Parameters and data transforming procedures should be kept separate, in a format editable by non-perlers. I choose INI file style, which fitted both that BASIC dialect and perl:

SCALAR=Some value LIST=Foo,Bar,Baz HASH=Foo:Bar,Baz:Quux

Straight forward. What about the data transforming rules? Since these are concatenations of values from the input record - conveniently present as a hash - and the output of some functions munging those values, these are something that could easily be transformed into subroutines:

DOMAIN=@example.com USER=sAMAccountName.DOMAIN PASS=md5sum(PRE.sAMAccountName.POST,15) PRE=a874f4u POST=ea748tyoal MAIL=join(DOT,givenName,sn).DOMAIN

md5sum is a function using Digest::MD5::md5_hex.

With a subroutine generating subroutine using a regex, the above values are converted into the following hash:

%h = ( DOMAIN => '@example.com', USER => sub { $r->{sAMAcountName} . $c->{DOMAIN} }, PASS => sub { md5sum($c->{PRE} . $r->{sAMAcountName} . $c->{POST +}, 15) }, MAIL => sub { join('.', $r->{givenName}, $r->{sn}).$c->{DOMAIN} +}, );

where $r is the current record, and $c is the hash representing the INI files content.

So far, so good. Retrieving a hash value consisting in a subroutine is done with

$out{$key} = $h{$key}->();

but that dies if the value slot holds a scalar. Yes, I could iterate over the keys using ref or such, but I would rather want to say

%out = %h;

and have a magic hash %h which encapsulates all that logic and "knows" what to deliver.

Relief

A tied hash. update: - which is a subroutine factory.

<update>
...due to BrowserUk's immediate reaction below: this is overkill to just execute CODE in the value slots of a hash, of course. Well, the itch was the starting point. The code generation and closure bits (see EXAMPLE in the pod) tell what it might be useful for: currying, building dispatch tables, ... - I have to play more with this, yet.
</update>

package Sub::Hash; use strict; use warnings; use Tie::Hash; our @ISA = qw(Tie::ExtraHash); our $VERSION = '0.01'; sub TIEHASH { my $class = shift; my $sub = shift; if(ref $sub) { $sub =~ /\bCODE\(/ or die "$class\::TIEHASH: not a code reference: '$_[0]', a +borted"; } else { my $pack = caller; my $s = $pack->can($sub) or die "Can't find subroutine \&$pack\::$sub, aborted"; $sub = $s; } my $storage = bless [{}, {}, $sub, map{\$_} @_], $class; $storage; } sub FETCH { my $self = shift; my $shash = $self->[0]; my $vhash = $self->[1]; my $key = shift; my $value = $vhash->{$key}; ref( $shash->{$key} ) =~ /\bCODE\b/ ? $shash->{$key}->($key, $value, $self, map{$$_}@{$self}[3..$# +$self]) : $value; } sub STORE { my $self = shift; my $key = $_[0]; my $shash = $self->[0]; my $vhash = $self->[1]; my $gensub = $self->[2]; $vhash->{$key} = $_[1]; unless ($shash->{$key}) { my $sub = $gensub->(@_,$self, map{$$_}@{$self}[3..$#$self]); $shash->{$key} = $sub if ref $sub eq 'CODE'; } } # accessors for the instance members sub subhash { $_[0][0] }; sub valhash { $_[0][1] }; sub sub { $_[0][0]->{$_[1]} } sub value { if(exists $_[2]) { $_[0][1]->{$_[1]} = $_[2]; } $_[0][1]->{$_[1]}; } sub gensub { my $self = shift; if (@_) { my $sub = shift; if(ref $sub) { $sub =~ /\bCODE\b/ or die "not a CODE reference, aborted"; } else { my $pack = caller; my $s = $pack->can($sub) or die "Can't find subroutine \&$pack\::$sub, aborted" +; } $self->[2] = $sub; } $self->[2]; } sub params { my $self = shift; if (@_) { splice @$self, 3, $#$self, map {\$_} @_; } map {$$_} @$self[3..$#$self]; } 1; __END__ =head1 NAME Sub::Hash - tie generated subroutines to keys of a hash =head1 SYNOPSIS use Sub::Hash; tie %hash, Sub::Hash, qw(gensub), $additonal, $variable; sub gensub { my ($key,$value, $additional, $variable) = @_; ... return $code; } # also $gensub = sub { my ($key,$value, $additional, $variable) = @_; ... return $code; }; tie %hash, Sub::Hash, $gensub, $additonal, $variable; =head1 DESCRIPTION This package implements a hash table whose values are anonymous subrou +tines. These subroutines are generated by the subroutine passed to tie() as t +he third argument when a value is allocated for a key. This subroutine generati +ng subroutine gets the key/value pair as the first arguments, followed th +e object which is returned by tied(%hash) on the tied %hash, then by any additi +onal variables passed to tie() after the subroutine argument. Accessing a value results in execution of its associated subroutine, w +hich gets the same arguments as the the generating subroutine. If the subroutine + slot is empty, the content of the value slot is returned instead. While the scalar values in a hash slot may be changed like in any othe +r hash, the associated subroutine remains in place after generation (but see b +elow). Additional methods to access the components of the object returned by +the call to tie() or tied(%hash) are described below. =head1 CONSTRUCTOR The generating subroutine may be passed as an anonymous sub (or CODE r +eference) or as a literal to tie(), in which case the subroutine is looked up in + the caller's package. Additional variables are stored as references to the arguments of the constructor, which means that their reference count increases, and re +main in place even after their lexical scope ends. =head1 METHODS =over 4 =item subhash returns the anonumous hash holding the value subroutines =item valhash returns the anonymous hash holding the scalar values =item sub($key [,$sub]) gets or sets the anonymous subroutine stored under $key =item value($key [,$scalar]) gets or sets the scalar value stored in the hash under $key =item gensub Gets or sets the subroutine generating subroutine. =item params gets or sets the additional parameters passed to the tie() call. Setting them replaces the original parameters, so their reference count is decreased. =back =head1 EXAMPLE use Sub::Hash; my ($closed,$over) = qw(closed over); my $gensub = sub { my ($key,$value,$w,$x) = @_; return sub { my($key,$value,$y,$z) = @_; return "closed over:\n" . "- tied vars at creation: ('$w','$x')\n" . "- key = $key\n" . "modifyable:\n" . "- value = $value\n" . "- tied variables: ($y,$z)\n"; } }; my $obj = tie %hash, Sub::Hash, $gensub, $closed, $over; my $key = 'foo'; my $result = $hash{$key} = 'bar'; # generates the subroutine and e +xecutes it print $result,"-----\n"; ($closed,$over) = ('open','til late'); $result = $hash{$key}; # executes sub stored under $key, # passing it ($key,$value,$obj,$cl +osed,$over) print $result,"-----\n"; $result = $hash{$key} = 'quux'; print $result,"-----\n"; __END__ Results of the above code: closed over: - tied vars at creation: ('closed','over') - key = foo modifyable: - value = bar - tied variables: (closed,over) ----- closed over: - tied vars at creation: ('closed','over') - key = foo modifyable: - value = bar - tied variables: (open,til late) ----- closed over: - tied vars at creation: ('closed','over') - key = foo modifyable: - value = quux - tied variables: (open,til late) ----- =head1 SEE ALSO Tie::Hash, perltie, Tie::Sub =head1 AUTHOR Georg Moritz, E<lt>shmem@cpan.orgE<gt> =head1 COPYRIGHT AND LICENSE Copyright (C) 2015 by Georg Moritz This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Replies are listed 'Best First'.
Re: tied hash for data munging
by BrowserUk (Patriarch) on Jul 11, 2015 at 12:42 UTC
    Yes, I could iterate over the keys using ref or such, but I would rather want to say: %out = %h;

    Immediate reaction: That's an awful lot of magical code to substitute for: %out = map{ ( $_ => ref( $h{$_} ? $h{$_}->() : $h{$_} ) } keys %h;

    Maybe I'm missing something?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
      Maybe I'm missing something?

      Hell yes - the YAGNI part of it, which is the code generation, and the variable binding (which can be used for closures in the generating sub):

      my $config = Config::Tiny->read($inifile); for my $key (keys %$config){ my $section = $config->{$key}; my $record; tie my %h, 'Sub::Hash', $gensub, $record; # $record bound %h = %{$section}; # transforms all INI file values # from this section into CODE refs my $result = getData( delete $h{SOURCE} ); for $record (values %$result) { write_csv( my %out = %h ); } }

      This replaces an awful lot of code - which, I admit, could also be hidden in subroutines.

      update:

      Well, not so much really. Of course the subroutine table could also be set up as

      %h = map { $_, $gensub->( $section->{$_} ) } keys %$section;

      But the point of the whole thing is to have all that encapsulated in a single tied hash, which is arguably not so great an a good idea.

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: tied hash for data munging
by ww (Archbishop) on Jul 11, 2015 at 17:54 UTC
    "... yet another strange perl delirium?"

    More like "makes me deliriously happy!

    Update: In my delirium, I couldn't even spell correctly. Corrected (I think) And yes, re name (below), IMO "Tie::Closure::Generated" does seem an improvement.

    But, for a 'fair and balanced' [   :-(   ] approach to commenting, another name (possibly including "TIE" or possibly one that mentions multi-format-input for a single output format -- (yes, I'll try to think up something on that order) might set this apart from other modules invoking the word "Hash" or "Sub."


    Come, let us reason together: Spirit of the Monastery
A reply falls below the community's threshold of quality. You may see it by logging in.