bsb has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to fill out a general nested structure from flat and orderless name=value pairs of CGI params. I want the name to be split on '.' and used as multilevel hash keys. The values are scalars or arrays:
my $cgi = CGI->new( 'a.b.c=3&a.b.c=4&x.y=4' ); # convert somehow to: my $args = { a => { b => { c => [3,4] }, x => { y => 4} };
I realize that adding a=1 to the above list would cause problems. As long as I the user can't cause a "not a HASH ref" error I'm not too concerned at this stage.

I wrote some cruddy code for this but couldn't think of a nice way to access and assign to the nested hash. Any suggestions?

Brad

PS. No $;='.'

Replies are listed 'Best First'.
Re: Initializing Hashes of Hashes (of Hashes)
by adrianh (Chancellor) on Apr 28, 2003 at 08:50 UTC

    You might want to look at CGI::State. Looks like it does exactly what you want.

Re: Initializing Hashes of Hashes (of Hashes)
by bart (Canon) on Apr 28, 2003 at 09:00 UTC
    Well I could think of a quick way using eval, but I won't. (update hint: it'd involve replacing the dots with "}{".) So the only other option is to "walk the chain". Note that your concept is intrinsically unsafe. If you add a parameter with name 'a.b', the whole thing will blow up.
    use CGI; my $cgi = CGI->new( 'a.b.c=3&a.b.c=4&x.y=4' ); my $args = {}; my @names = $cgi->param; foreach my $name (@names) { my $p = $args; my @keys = split /\./, $name; for my $i (0 .. $#keys-1) { $p = $p->{$keys[$i]} ||= {}; } my @values = $cgi->param($name); $p->{$keys[-1]} = @values==1 ? $values[0] : \@values; } use Data::Dumper; $Data::Dumper::Indent = 1; print Dumper $args;
    Result:
    $VAR1 = { 'x' => { 'y' => '4' }, 'a' => { 'b' => { 'c' => [ '3', '4' ] } } };

    Update For the cases where this would blow up, there is no solution. So the only reasonable thing to do is to catch this if it is bound to happen, for example by checking the reference type using ref, and complain in a nice, preferably opaque way.

Re: Initializing Hashes of Hashes (of Hashes)
by PodMaster (Abbot) on Apr 28, 2003 at 08:48 UTC
    Fist off, that doesn't compile.
    Secondly, how did you solve it?
    Thirdly, what if you have A.B.C=3&A.B=3, what happens then?
    Your strategy is flawed, I can't fathom why you'd wanna do something like this.

    Also, how could a user possibly cause "not a HASH ref" error? He's not writing any code.

    #!/usr/bin/perl use strict; use CGI; my $cgi = CGI->new( 'a.b.c=3&a.b.c=4&x.y=4' ); # convert somehow to: my $args = { 'a' => { b => { c => [3,4] }, }, 'x' => { 'y' => 4, }, }; use Data::Dumper; warn Dumper $args; die Dumper Shaz( $cgi ); #sub Shaz { "goes here -- you show me yours, i'll show you mine" } sub Shaz { my( $cgi ) = @_; my $shazbah = {}; for my $i ( $cgi->param() ) { my @bits = split /\./, $i; my $shazbot = $shazbah; for my $bit( 0..$#bits ) { if( $bit == $#bits ) { if( exists $shazbot->{$bits[$bit]} ) { if( ref $shazbot->{$bits[$bit]} ne 'ARRAY'){ $shazbot = $shazbot->{$bits[$bit]} = []; } else { $shazbot = $shazbot->{$bits[$bit]}; } } else { $shazbot = $shazbot->{$bits[$bit]} = []; } last; } if( exists $shazbot->{$bits[$bit]} ) { $shazbot = $shazbot->{$bits[$bit]}; } else { $shazbot = $shazbot->{$bits[$bit]} = {}; } } push @$shazbot, $cgi->param($i); } return $shazbah; } __END__ # I'd say this is pretty close $VAR1 = { 'x' => { 'y' => 4 }, 'a' => { 'b' => { 'c' => [ 3, 4 ] } } }; $VAR1 = { 'x' => { 'y' => [ '4' ] }, 'a' => { 'b' => { 'c' => [ '3', '4' ] } } };
    update: Updated Shaz with actual code ;)(please note I did this in a single pass)


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      Re: compiling - I guess I took line breaking literally.

      Re: my solution - Posted in reply to myself later

      Re: overwriting - It breaks, yes, but the input is validated at a lower level. It does leave me open to typo bugs

      Re: bad strategy - maybe, convince me...

      Re: how could a user possibly cause "not a HASH ref" error -
      The situation I had in mind was a query like 'a=1&a=2&a.b=3'

        The situation I had in mind was a query like 'a=1&a=2&a.b=3' where an implementation may build { a => [1,2] } then try to set $args->{a}{b}=3
        That's why functions like ref and exists exist ;D Test and make a decision (clobber or skip).

        I would try to convince you this is bad strategy, but first you must list the reasons you want such a datastructure.
        It might not be bad strategy, but from where i'm standing it doesn't look that way.
        What you really need to do is fully define what should happen (basically draft up a full spec), so things like "not a HASH ref" are not an issue.

        Actually, nevermind the reasons you want this, just complete the spec (a detailed description of design criteria), and you should be fine (given that you implement it correctly ;D).


        MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
        I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
        ** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Initializing Hashes of Hashes (of Hashes)
by robartes (Priest) on Apr 28, 2003 at 09:01 UTC
    Right, first off the bat - heed PodMaster's advice. What you are doing here does not make much sense: leaf nodes with the same path leading to them as earlier nodes will overwrite those nodes. Also, I don't know how CGI is supposed to handle a.b.c=1&a.b.c=2, but CGI.pm concatenates the values, which does not seem to be what you want.

    That said, here's some code that does what you ask, although it does not make much sense :) :

    use warnings; use strict; use CGI; use Data::Dumper; my $cgi = CGI->new( 'a.b.c=3&d.e.f.g=5&x.y=4' ); my %params=$cgi->Vars(); my %hash; while (my ($param,$value) = each %params ) { my $hashref=fill($param,$value); my $key=(keys %$hashref )[0]; $hash{$key}=$hashref->{$key}; } print Dumper(\%hash); sub fill { my $node=shift; my $value=shift; my @nodetree=split /\./, $node; if (scalar @nodetree == 1) { return { $nodetree[0] => $value } }; my $trunk=shift @nodetree; $node=join ".", @nodetree; return { $trunk => fill($node,$value) }; } __END__ $VAR1 = { 'a' => { 'b' => { 'c' => '3' } }, 'd' => { 'e' => { 'f' => { 'g' => '5' } } }, 'x' => { 'y' => '4' } };

    Update: Scratch that remark wrg multiple values for the same param. My conclusion was based on an incorrect test of mine.

    CU
    Robartes-

Re: Initializing Hashes of Hashes (of Hashes)
by shotgunefx (Parson) on Apr 28, 2003 at 10:07 UTC
    I won't debate the logic of what your doing. I did something along the same lines for an embedded language where I didn't care about "strictness" of variable references. Though you may want to reconsider.

    The code below expects that keys components m/\.\d+\./ are array indexes. Letter keys are hash keys. Beware that perl arrays are not sparse so a.1000000000.f might ruin your day so $MAX_ARRAY_INDEX sets an upper limit. Beware that if a.b.c is a hash and you then say a.b.1, this will happily discard that hash at a.b.c and created an array at a.b

    This was written in haste and from memory so user beware, also your example would have to chance slightly. a.b.c values would have to become a.b.c.1,a.b.c.2 etc. or just assign at once getset_key(\%form,'a.b.c',\@values).

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my %form = ( a=>5,); print Dumper (\%form); print get_key(\%form,'a.b.c'),"\n" ; # Doesn't vivify print Dumper (\%form); getset_key(\%form,'a.b.c', 55); # Set $form{a}{b}{c} to 55 print Dumper (\%form); getset_key(\%form,'a.a.c.d', "umkay"); #Set $form{a}{a}{c}{d} to umkay print Dumper (\%form); getset_key(\%form,'a.b.c.d', 15); #Set $form{a}{b}{c}{d} to 15, bye c= +55! print Dumper (\%form); getset_key(\%form,'a.b.1', 100); # Sets $form{a}{b}[1] = 100 print Dumper (\%form); getset_key(\%form,'a.b.11111111', 10); # Array too big, bails print Dumper (\%form); BEGIN { my $MAX_ARRAY_INDEX = 100; sub get_key { # Does not vivify non existant keys my $h = shift; my $k = shift; die "You need to pass a reference to get_key!" unless ref($h); die "No key specified!" unless defined $k; my @p = split(/\./,$k); die "An index is too big!: ".join(', ',@p) if grep { $_=~m/^\d+ +$/ && $_ > $MAX_ARRAY_INDEX} @p; my $v = get_avhv($h,@p); } sub getset_key { # Does vivify non existant keys my $h = shift; my $k = shift; my $v = shift; die "You need to pass a reference to get_key!" unless ref($h); die "No key specified!" unless defined $k; my @p = split(/\./,$k); die "An index is too big!: ".join(', ',@p) if grep { $_=~m/^\d+$ +/ and $_ > $MAX_ARRAY_INDEX} @p; vivify_avhv($h,@p,$v); } sub vivify_avhv { my $h = \shift; my $v = pop; local $_; while (@_){ if ($_[0]=~/^\d+$/){ $$h = [] unless UNIVERSAL::isa( $$h, "ARRAY" ); $h = \$$h->[shift ()]; }else{ $$h = { } unless UNIVERSAL::isa( $$h, "HASH" ); $h = \$$h->{shift()}; } } $$h = $v; return $h; } ################################################# sub get_avhv { my $h = \shift; local $_; while (@_){ if ($_[0]=~/^\d+$/){ return unless UNIVERSAL::isa( $$h, "ARRAY" ); return if $_[0] >= @{$$h} ; # Prevent autoviv $h = \$$h->[shift ()]; }else{ return unless UNIVERSAL::isa( $$h, "HASH" ); return if ! exists $$h->{$_[0]}; # Prevent autoviv $h = \$$h->{shift()}; } } return $$h; } }


    -Lee

    "To be civilized is to deny one's nature."
Re: Initializing Hashes of Hashes (of Hashes)
by benn (Vicar) on Apr 28, 2003 at 10:22 UTC
    This looks a little like an 'intermediate' problem/solution - you have the CGI params, and later on you want to access them in some nice easy way. Although this multi-level hash business looks fun, maybe there are other solutions - if you described your task a little further, some alternative approaches may be forthcoming.

    Cheers,
    Ben
Re: Initializing Hashes of Hashes (of Hashes)
by bsb (Priest) on Apr 29, 2003 at 00:44 UTC
    Thanks all for the suggestions and apologies for the shonky question. I was hurrying to leave work and didn't have time to explain/contextualize it all (see below).

    The reason I asked in the first place was that my 3 code efforts to fill out the hash all seemed inelegant and I hoped to be shown a better way. I did it iteratively like bart does in his response, handling the last segment separately. I don't like the recursive option given. CGI::State's method is pretty wacky, he loves the hook ops.

    I still suspect I'm missing a better way though...

    For those interested my code as I left it was:

    sub cgi2args { my $cgi = shift; #my $cgi = CGI->new('a.b.c=3&a.b.c=4&x.y=4'); # a & a.b ? warn my $args = {}; for my $name ($cgi->param) { # could warn unrecognized for html typos #for my $name (grep $interface_re, $cgi->param) { my @segments = split /\./, $name; my $last_seg = pop @segments; my $a = $args; for (@segments) { $a->{$_} ||= {}; # XXX defined not true $a = $a->{$_}; } my @values = $cgi->param($name); $a->{$last_seg} = @values == 1 ? $values[0] : \@values; } return $args; }
    Context
    I have some classes for validating and storing data along with other scaffolding and am attempting to keep most of them unaware of CGI so that I can reuse them in non-web environments. This was a path (blind alley?) I wandered down on the way.

    The two goals are:
    1) Turn form data into complex data structures, generically enough that the backend can remain ignorant of the web.
    2) Separate the parameters for different widgets. A widget sees all and only it's parameters.

    CGI::State seems to do what is required.

    Comment on Criticisms
    All fairly valid. I knew the question was poorly framed and that the method was flawed. Nevertheless, I got some useful answers. Thanks.

    Brad

      (hmm, talking to myself..)
      An updated and safer (I hope) version.
      I stole some ideas from CGI::State and a piece of TT2's variable naming.