This meditation is about a tied hash package, and how it came into existence. I am still meditating whether this is too obscure, or whether its goal is better achieved using some other technique; is it worth being uploaded to CPAN as yet another strange perl delirium? is the name ok? Any suggestions, review, critics are welcome. Thanks for your time.
Over 100 poorly performing scripts written in some BASIC dialect for exactly the same purpose (read source records, transform them, write target records), sporting hardcoded parameters and different output assembling code, proliferating with each new customer (copy over, twiddle, tweak).
Perl to the rescue to do the data gathering and munging, and write a unified import CSV to be fed into that dratted basic script - one for all. Parameters and data transforming procedures should be kept separate, in a format editable by non-perlers. I choose INI file style, which fitted both that BASIC dialect and perl:
SCALAR=Some value LIST=Foo,Bar,Baz HASH=Foo:Bar,Baz:Quux
Straight forward. What about the data transforming rules? Since these are concatenations of values from the input record - conveniently present as a hash - and the output of some functions munging those values, these are something that could easily be transformed into subroutines:
DOMAIN=@example.com USER=sAMAccountName.DOMAIN PASS=md5sum(PRE.sAMAccountName.POST,15) PRE=a874f4u POST=ea748tyoal MAIL=join(DOT,givenName,sn).DOMAIN
md5sum is a function using Digest::MD5::md5_hex.
With a subroutine generating subroutine using a regex, the above values are converted into the following hash:
%h = ( DOMAIN => '@example.com', USER => sub { $r->{sAMAcountName} . $c->{DOMAIN} }, PASS => sub { md5sum($c->{PRE} . $r->{sAMAcountName} . $c->{POST +}, 15) }, MAIL => sub { join('.', $r->{givenName}, $r->{sn}).$c->{DOMAIN} +}, );
where $r is the current record, and $c is the hash representing the INI files content.
So far, so good. Retrieving a hash value consisting in a subroutine is done with
$out{$key} = $h{$key}->();
but that dies if the value slot holds a scalar. Yes, I could iterate over the keys using ref or such, but I would rather want to say
%out = %h;
and have a magic hash %h which encapsulates all that logic and "knows" what to deliver.
A tied hash. update: - which is a subroutine factory.
<update>
...due to BrowserUk's immediate reaction below: this is overkill to just execute CODE in the value slots of a hash, of course. Well, the itch was the starting point. The code generation and closure bits (see EXAMPLE in the pod) tell what it might be useful for: currying, building dispatch tables, ... - I have to play more with this, yet.
</update>
package Sub::Hash; use strict; use warnings; use Tie::Hash; our @ISA = qw(Tie::ExtraHash); our $VERSION = '0.01'; sub TIEHASH { my $class = shift; my $sub = shift; if(ref $sub) { $sub =~ /\bCODE\(/ or die "$class\::TIEHASH: not a code reference: '$_[0]', a +borted"; } else { my $pack = caller; my $s = $pack->can($sub) or die "Can't find subroutine \&$pack\::$sub, aborted"; $sub = $s; } my $storage = bless [{}, {}, $sub, map{\$_} @_], $class; $storage; } sub FETCH { my $self = shift; my $shash = $self->[0]; my $vhash = $self->[1]; my $key = shift; my $value = $vhash->{$key}; ref( $shash->{$key} ) =~ /\bCODE\b/ ? $shash->{$key}->($key, $value, $self, map{$$_}@{$self}[3..$# +$self]) : $value; } sub STORE { my $self = shift; my $key = $_[0]; my $shash = $self->[0]; my $vhash = $self->[1]; my $gensub = $self->[2]; $vhash->{$key} = $_[1]; unless ($shash->{$key}) { my $sub = $gensub->(@_,$self, map{$$_}@{$self}[3..$#$self]); $shash->{$key} = $sub if ref $sub eq 'CODE'; } } # accessors for the instance members sub subhash { $_[0][0] }; sub valhash { $_[0][1] }; sub sub { $_[0][0]->{$_[1]} } sub value { if(exists $_[2]) { $_[0][1]->{$_[1]} = $_[2]; } $_[0][1]->{$_[1]}; } sub gensub { my $self = shift; if (@_) { my $sub = shift; if(ref $sub) { $sub =~ /\bCODE\b/ or die "not a CODE reference, aborted"; } else { my $pack = caller; my $s = $pack->can($sub) or die "Can't find subroutine \&$pack\::$sub, aborted" +; } $self->[2] = $sub; } $self->[2]; } sub params { my $self = shift; if (@_) { splice @$self, 3, $#$self, map {\$_} @_; } map {$$_} @$self[3..$#$self]; } 1; __END__ =head1 NAME Sub::Hash - tie generated subroutines to keys of a hash =head1 SYNOPSIS use Sub::Hash; tie %hash, Sub::Hash, qw(gensub), $additonal, $variable; sub gensub { my ($key,$value, $additional, $variable) = @_; ... return $code; } # also $gensub = sub { my ($key,$value, $additional, $variable) = @_; ... return $code; }; tie %hash, Sub::Hash, $gensub, $additonal, $variable; =head1 DESCRIPTION This package implements a hash table whose values are anonymous subrou +tines. These subroutines are generated by the subroutine passed to tie() as t +he third argument when a value is allocated for a key. This subroutine generati +ng subroutine gets the key/value pair as the first arguments, followed th +e object which is returned by tied(%hash) on the tied %hash, then by any additi +onal variables passed to tie() after the subroutine argument. Accessing a value results in execution of its associated subroutine, w +hich gets the same arguments as the the generating subroutine. If the subroutine + slot is empty, the content of the value slot is returned instead. While the scalar values in a hash slot may be changed like in any othe +r hash, the associated subroutine remains in place after generation (but see b +elow). Additional methods to access the components of the object returned by +the call to tie() or tied(%hash) are described below. =head1 CONSTRUCTOR The generating subroutine may be passed as an anonymous sub (or CODE r +eference) or as a literal to tie(), in which case the subroutine is looked up in + the caller's package. Additional variables are stored as references to the arguments of the constructor, which means that their reference count increases, and re +main in place even after their lexical scope ends. =head1 METHODS =over 4 =item subhash returns the anonumous hash holding the value subroutines =item valhash returns the anonymous hash holding the scalar values =item sub($key [,$sub]) gets or sets the anonymous subroutine stored under $key =item value($key [,$scalar]) gets or sets the scalar value stored in the hash under $key =item gensub Gets or sets the subroutine generating subroutine. =item params gets or sets the additional parameters passed to the tie() call. Setting them replaces the original parameters, so their reference count is decreased. =back =head1 EXAMPLE use Sub::Hash; my ($closed,$over) = qw(closed over); my $gensub = sub { my ($key,$value,$w,$x) = @_; return sub { my($key,$value,$y,$z) = @_; return "closed over:\n" . "- tied vars at creation: ('$w','$x')\n" . "- key = $key\n" . "modifyable:\n" . "- value = $value\n" . "- tied variables: ($y,$z)\n"; } }; my $obj = tie %hash, Sub::Hash, $gensub, $closed, $over; my $key = 'foo'; my $result = $hash{$key} = 'bar'; # generates the subroutine and e +xecutes it print $result,"-----\n"; ($closed,$over) = ('open','til late'); $result = $hash{$key}; # executes sub stored under $key, # passing it ($key,$value,$obj,$cl +osed,$over) print $result,"-----\n"; $result = $hash{$key} = 'quux'; print $result,"-----\n"; __END__ Results of the above code: closed over: - tied vars at creation: ('closed','over') - key = foo modifyable: - value = bar - tied variables: (closed,over) ----- closed over: - tied vars at creation: ('closed','over') - key = foo modifyable: - value = bar - tied variables: (open,til late) ----- closed over: - tied vars at creation: ('closed','over') - key = foo modifyable: - value = quux - tied variables: (open,til late) ----- =head1 SEE ALSO Tie::Hash, perltie, Tie::Sub =head1 AUTHOR Georg Moritz, E<lt>shmem@cpan.orgE<gt> =head1 COPYRIGHT AND LICENSE Copyright (C) 2015 by Georg Moritz This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: tied hash for data munging
by BrowserUk (Patriarch) on Jul 11, 2015 at 12:42 UTC | |
by shmem (Chancellor) on Jul 11, 2015 at 13:04 UTC | |
|
Re: tied hash for data munging
by ww (Archbishop) on Jul 11, 2015 at 17:54 UTC | |
by shmem (Chancellor) on Jul 11, 2015 at 18:14 UTC | |
| A reply falls below the community's threshold of quality. You may see it by logging in. |