For certain types of programming, generally things that I expect Perl to be great for, I encounter one big annoyance: complex data structure access. Specifically, consider a case where you have a deeply nested structure. The easiest way to implement this in module-less Perl is to use nested hashes-of-hashes. But then I find myself writing code like (variable names shortened to avoid wrapping):
$cost = $h->{Locations}{$location}{Buildings}{$b}{cost};
The whole {a}{b}{c}{d}{e} thing drives me crazy, especially when compared to C's a.b.c.d style. (Of course, this example would be tough in C: Locations[0].Buildings[3].cost, perhaps? Not quite the same.) I know of several Perl modules to help with this sort of thing, but they all transform it into:
$cost = $h->Locations->$location->Buildings->$b->cost;
Which isn't much of an improvement, in my eyes -- sure, you get rid of the curly brackets, but instead you deceptively make things look like methods, and I don't want to think of every single layer in my data structure as a data accessor method. It's just data, dammit!

So I wrote a module to experiment with a different style:

$cost = $h->{"Locations.$location.Buildings.$b.cost"};
Not really very different. But somehow it feels much more satisfying to me. Just for kicks, I also implemented a method each() that, rather than doing the usual operation across one level of the HoH, does a recursive conversion to dotted keys:
while (my ($key, $value) = $h->each) {
    print "$key => $value\n";
}

OUTPUT:
Locations.Earth.Buildings.PowerPlant.cost => 7
Locations.Earth.Buildings.PowerPlant.size => 4
Locations.Earth.Coordinates => 12.0,72.4
Locations.Squaffle.Buildings.SaltRefinery.cost => 2
. . .
The implementation is a package that you both bless and tie the constituent hash refs to, and that manufactures new objects of the same sort when needed for FETCHes and STOREs.
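To make the description above concrete, here is a minimal sketch of a tied hash that accepts dotted keys. It is not the actual Struct.pm: the class name DottedHash is hypothetical, it only ties (it does not also bless the underlying hashes), and it implements just FETCH and STORE, so keys() and each() on the wrapper are left out.

```perl
use strict;
use warnings;

# Hypothetical class name; a sketch, not the original Struct.pm.
package DottedHash {
    # Wrap an existing nested hash ref (no copy) in a tied hash.
    sub new {
        my ($class, $data) = @_;
        tie my %view, $class, $data;
        return \%view;
    }

    sub TIEHASH {
        my ($class, $data) = @_;
        return bless { data => $data }, $class;
    }

    # Walk one plain-hash level per dotted component.
    sub FETCH {
        my ($self, $path) = @_;
        my $node = $self->{data};
        for my $key (split /\./, $path) {
            return undef unless ref $node eq 'HASH' && exists $node->{$key};
            $node = $node->{$key};
        }
        # Interior nodes come back wrapped, so they accept dotted keys too.
        return ref $node eq 'HASH' ? ref($self)->new($node) : $node;
    }

    # Create intermediate levels on demand, then assign the leaf.
    sub STORE {
        my ($self, $path, $value) = @_;
        my @keys = split /\./, $path;
        my $last = pop @keys;
        my $node = $self->{data};
        $node = ($node->{$_} //= {}) for @keys;
        $node->{$last} = $value;
    }
}

my $data = {
    Locations => {
        Earth => { Buildings => { PowerPlant => { cost => 7, size => 4 } } },
    },
};
my $h = DottedHash->new($data);
my ($location, $building) = ('Earth', 'PowerPlant');

my $cost    = $h->{"Locations.$location.Buildings.$building.cost"};
my $locinfo = $h->{"Locations.$location"};            # a wrapped sub-hash
my $size    = $locinfo->{"Buildings.$building.size"};
$h->{"Locations.Squaffle.Buildings.SaltRefinery.cost"} = 2;
```

Because the wrapper holds a reference to the original structure, stores through the dotted key land in the plain nested hash, and ordinary {a}{b}{c} access on $data keeps working alongside it.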

I have the usual naming problem (the working name is the clearly-not-acceptable Struct.pm; maybe Hash::Dot? It's kind of like that website, Slash::Dot.), but I'm wondering if this has already been done. It's not a particularly brilliant idea, and probably many of you will find it revolting. In my admittedly cursory scan of search.cpan.org, I couldn't find anything, but it's a tough one to search for. Has anyone heard of such a thing?

Additionally, who else runs into this problem? Is there some alternative approach that I should be considering? I really don't want to use a database for this -- it's a huge amount of overhead for simply wanting to be able to use nested data structures a little more comfortably. Although if it were transparent and fast, I wouldn't mind.

Re: Dotted hash access
by brian_d_foy (Abbot) on Nov 25, 2004 at 06:19 UTC

    You seem to be doing a lot of extra work.

    $cost = $h->{$location}{$building};

    If that doesn't work for you, create a little class to make the hash an object and give it some accessors.

    As for your new method of doing things, it's been done before and never really caught on. It's not easy to do either. In your example,

    $cost = $h->{"Locations.$location.Buildings.$b.cost"};

    You need to restrict the values of $location and $b so they don't contain the separator character. Not only do you have to program it, but you have to document it and tell users they can't use it. Or, let users escape that character, but then your parsing is more difficult. There's a lot of ways for this to go wrong. There is a reason Perl doesn't do this anymore. Your way is the Perl4 way of creating multi-dimensional hashes: see the documentation of $; in perlvar (it tells you not to do this).
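For readers who have not met $; before, here is a small sketch of the Perl 4 style emulation and of the separator hazard described above. The location name 'St.Louis' is made up purely to show a key that contains the separator.

```perl
use strict;
use warnings;

my $sep = $;;    # the subscript separator, "\034" by default

# A list subscript is joined with $; into one flat key.
my %cost;
$cost{'Earth', 'PowerPlant'} = 7;
my $same = $cost{ join($sep, 'Earth', 'PowerPlant') };

# Taking the key apart again only works if no component contains the separator.
my @parts = split /\Q$sep\E/, (keys %cost)[0];

# The same hazard with a dot: three components go in, four come out.
my $dotted = join '.', 'Locations', 'St.Louis', 'cost';
my @pieces = split /\./, $dotted;
```

The default $; is a control character precisely because it is unlikely to appear in real keys; a dot has no such protection.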

    The way to make the conventional syntax more comfortable is to use it more.

    --
    brian d foy <bdfoy@cpan.org>
      I can buy your arguments if we're talking about completely general nested data structure access, but that's not what I'm after. I really do have a deeply nested structure, which I am completely specifying -- so I have no worries about $location or $b containing the separator, because they're field names. My values could very well have weird characters in them, but I'll never have anything unexpected in the "path" of a value. (True, a field name like $location could easily be coming from an external source, but I have to scrub them for sanity well before reaching this code anyway.)

      $cost = $h->{$location}{$building} wouldn't work for me. The structure contains many more things than just Locations, and a location has much more data associated with it than a set of Buildings. So I don't want to accidentally get the wrong value back if I have a location name that happens to coincide with some other field name. (Perhaps I should give a more detailed dump of a data set?)

      My approach differs from Perl4's because mine can be used hierarchically, unlike $;. So I am free to do

      $locinfo = $h->{"Locations.$location"};
      count_totals($locinfo->{"Resources"});
      compute_upkeep($locinfo->{"Buildings"});
      or whatever.
        If that's the case, why don't you have a class that represents and provides access to the data structure? You can make the interface very simple without creating a bunch of new syntax that you will have to explain to everyone who looks at the code. Try a class setup: Thingy HAS Locations HAS Buildings. You leave everything in your big hash, but when you need something, you ask for it through a method which gives you a reference to that branch of the data structure.
        $location = $thingy->get_location( $l );
        That method looks at the big data structure, pulls out the reference which is the value for the key $l (which has no restrictions on its characters), blesses it as the mini-class Location, and returns it. It's the same old reference, there is no copy, but you can call methods on it. Not only that, it's easy to see what it's doing because there aren't a lot of things happening at that level.
        $building = $location->get_building( $b );
        That's the same thing, called on the previous mini-object. All the data is still in the big data structure, but you have this hook into it because the reference $building is blessed into the Building class. Again, it's the same old reference that's in the data structure, but it now knows how to respond to method calls. Or, you can do this in one step.
        $building = $thingy->get_location( $l )->get_building( $b );
        # or
        $building = $thingy->get_building_by_location( $l, $b );
        Or
        $cost = $thingy->get_location( $l )->get_building( $b )->cost;
        $totals = $location->get_totals;
        $upkeep = $building->get_upkeep;
        If you want to iterate through everything:
        foreach my $l ( $thingy->all_locations ) {
            foreach my $b ( $l->all_buildings ) {
                $b->set_cost( $b->cost + 1 );
            }
        }
        You can define all sorts of other iterators, visitors, and cool things. You don't have to know anything about the data structure. You have a lot of flexibility with this approach, and it uses vanilla Perl syntax that people can read about in books and in the documentation. You don't have to create any new way to do things, which means you don't have to create new logic or new bugs. If you want iterators, you can create those yourself inside the class. I talk about all sorts of these things in The Perl Review 0.5.
        while( my $location = $thingy->next_location ) { ... }
        So don't create a new dialect, which just adds to the complexity of your code. You should code not so you understand it, but so other people will understand. :)
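A minimal sketch of how those mini-classes might look, assuming the big hash is laid out as Locations => { name => { Buildings => { name => { cost => ... } } } }. The package names Thingy, Location and Building follow the prose above; the method bodies are guesses at one reasonable implementation, not code from the post.

```perl
use strict;
use warnings;

package Thingy {
    # Bless the big hash itself; nothing is copied.
    sub new { my ($class, $data) = @_; return bless $data, $class }

    sub get_location {
        my ($self, $name) = @_;
        my $loc = $self->{Locations}{$name} or return;
        return bless $loc, 'Location';    # same reference, now with methods
    }

    sub all_locations {
        my ($self) = @_;
        return map { $self->get_location($_) } sort keys %{ $self->{Locations} };
    }
}

package Location {
    sub get_building {
        my ($self, $name) = @_;
        my $bldg = $self->{Buildings}{$name} or return;
        return bless $bldg, 'Building';
    }

    sub all_buildings {
        my ($self) = @_;
        return map { $self->get_building($_) } sort keys %{ $self->{Buildings} };
    }
}

package Building {
    sub cost     { my ($self) = @_; return $self->{cost} }
    sub set_cost { my ($self, $cost) = @_; $self->{cost} = $cost; return $self }
}

my $thingy = Thingy->new({
    Locations => {
        Earth => { Buildings => { PowerPlant => { cost => 7 } } },
    },
});

my $cost = $thingy->get_location('Earth')->get_building('PowerPlant')->cost;

# Raise every building's cost by one, without knowing the hash layout here.
for my $l ( $thingy->all_locations ) {
    for my $bldg ( $l->all_buildings ) {
        $bldg->set_cost( $bldg->cost + 1 );
    }
}
```

Since each method blesses and returns the reference already stored in the big hash, changes made through set_cost are visible to any code that still reads the plain structure.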
        --
        brian d foy <bdfoy@cpan.org>
Re: Dotted hash access
by diotalevi (Canon) on Nov 24, 2004 at 23:50 UTC
    See also XPath. Perhaps you just want something simple like Locations/Earth/Buildings/PowerPlant/cost but if you use real XPath, more powerful expressions are possible as well. Consider //PowerPlant which would produce every PowerPlant node.
      That's not a bad idea. I've used XPath -- well, actually I haven't, but I implemented something very similar to it (and borrowed as much syntax and semantics as I could in the process.) It may be overkill here, but it seems like a nice middle ground between straight Perl data structure access and the generality of SQL.
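As a rough illustration of the path idea on plain nested hashes, here is a sketch with two hypothetical helpers: find_path for a fixed slash-separated path, and find_all for something like //PowerPlant. This only borrows the flavour of XPath; it is not an XPath implementation and handles none of its real syntax.

```perl
use strict;
use warnings;

# Follow a slash-separated path through nested hashes; undef if it breaks.
sub find_path {
    my ($node, $path) = @_;
    for my $step (grep { length } split m{/}, $path) {
        return unless ref $node eq 'HASH' && exists $node->{$step};
        $node = $node->{$step};
    }
    return $node;
}

# Collect every value stored under the given key, at any depth.
sub find_all {
    my ($node, $name) = @_;
    return unless ref $node eq 'HASH';
    my @hits;
    for my $key (sort keys %$node) {
        push @hits, $node->{$key} if $key eq $name;
        push @hits, find_all($node->{$key}, $name);
    }
    return @hits;
}

my $data = {
    Locations => {
        Earth    => { Buildings => { PowerPlant => { cost => 7 } } },
        Squaffle => { Buildings => { PowerPlant => { cost => 9 }, SaltRefinery => { cost => 2 } } },
    },
};

my $cost   = find_path($data, 'Locations/Earth/Buildings/PowerPlant/cost');
my @plants = find_all($data, 'PowerPlant');
```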
Re: Dotted hash access
by Zaxo (Archbishop) on Nov 25, 2004 at 03:47 UTC

    The advantage of perl's notation is that it is consistent and, with practice, it tells you everything about the structure above the data.

    The usual way to make that briefer would be to make a real data accessor,

    sub cost {
        my ($self, $location, $b) = @_;
        $self->{Locations}{$location}{Buildings}{$b}{cost};
    }
    # . . .
    my $cost = $h->cost($location, $b);
    where the cost method is defined in the namespace of the $h object.

    It probably would help to break the big data structure into subobjects to which the whole has a 'has-a' relation.

    I agree that big deep data structures full of bits of everything are awkward to handle. I don't think notation is the real problem. It's a design matter.

    After Compline,
    Zaxo

Re: Dotted hash access
by castaway (Parson) on Nov 25, 2004 at 06:34 UTC
    As you pointed out yourself, it looks just as crazy in C. In C, and probably several other languages, people tend not to write such deep structures (in my experience anyway). If you don't like it in Perl, why do it there then? What's stopping you from having a hash called Locations, in which you store the location names (ids, or whatever they were), and another hash called Buildings in which you store the cost? (Plus a third one that tells you which Building is at which Location).

    Oops, strange, that's how I'd do it were I using a database..

    Nothing says you have to put all your data in one huge data structure. Your idea sounds like a nice one, until you realise the restrictions, and that it's an elaborate way of getting around not using a DB or a DB-like structure. There's no need to use a huge DB; just use SQLite or something.
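A sketch of the three-hash layout suggested above, using plain hashes rather than a database. The hash names, the Squaffle coordinates and the helper total_cost_at are made up for illustration, and the sketch assumes a building's cost does not depend on where it stands.

```perl
use strict;
use warnings;

# One flat hash per kind of thing, plus one linking them.
my %locations = (
    Earth    => { coordinates => '12.0,72.4' },
    Squaffle => { coordinates => '3.1,9.9' },    # made-up value
);
my %buildings = (
    PowerPlant   => { cost => 7 },
    SaltRefinery => { cost => 2 },
);
my %buildings_at = (
    Earth    => ['PowerPlant'],
    Squaffle => ['SaltRefinery', 'PowerPlant'],
);

# Sum the cost of every building standing at one location.
sub total_cost_at {
    my ($location) = @_;
    my $total = 0;
    $total += $buildings{$_}{cost} for @{ $buildings_at{$location} || [] };
    return $total;
}

my $earth_total    = total_cost_at('Earth');
my $squaffle_total = total_cost_at('Squaffle');
```

No lookup here is more than two levels deep, which is the point of the suggestion; whether the data really splits this way is a separate question.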

    C.

      I guess I really should have described the data set better. Every level of that tree is fully populated. I can't have a separate table of buildings, because those buildings are specific to the given location. In database terms, both the location and the building name are in the primary key. (I do have a separate table of buildings, but that's in a separate data structure and describes the features they have in common.) I guess "cost" was a bad example to use, since it's unclear why it would vary by location. My bad.

      I guess what I'm saying is that I periodically run across cases where the data is fundamentally deeply nested, and I need to start over from a fairly high level frequently (so I can't just grab the nested hash out and pass it into the other functions, avoiding long lookup chains.) Admittedly, it's not a frequent occurrence, but configuration files and simulations seem to be typical examples.

        I understand what you're trying to say, but I still can't think of an actual use that wouldn't lend itself to splitting in this manner. It's just a design issue.

        C.

Re: Dotted hash access
by dimar (Curate) on Nov 25, 2004 at 00:10 UTC
    Additionally, who else runs into this problem? Is there some alternative approach that I should be considering?

    It depends on how you define "problem" ... I would suspect you aren't the only one who has had this consideration, but there may not be enough "momentum" behind this idea to motivate or inspire something different. I asked a similar question a while back ... Perl complex data structure ... how to get more flexible access? ... outside of using XPATH, or rolling your own code, or looking into the suggestions in the thread cited previously, I have not been able to find exactly what you are asking for.

    In other words ... me too!

Re: Dotted hash access
by hardburn (Abbot) on Nov 25, 2004 at 04:06 UTC

    It's just syntax. Nothing I would spend too much effort on.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: Dotted hash access
by Jenda (Abbot) on Nov 25, 2004 at 16:35 UTC

    Please don't. At best you save one character between each level of the structure. What you lose is clarity and speed. Think about the poor guy who inherits your programs. He'll have no idea what the heck you are doing there with those dots. Especially since dots already have a meaning for strings. Dot is the concatenation operator, and suddenly it should be a separator? And how come this hash behaves in this strange way?

    Jenda
    We'd like to help you learn to help yourself
    Look around you, all you see are sympathetic eyes
    Stroll around the grounds until you feel at home
       -- P. Simon in Mrs. Robinson

Re: Dotted hash access
by Ctrl-z (Friar) on Nov 25, 2004 at 19:22 UTC
    You may be interested in some of the replies I got to a similar question a while back.



    time was, I could move my arms like a bird and...
Re: Dotted hash access
by Juerd (Abbot) on Nov 25, 2004 at 21:51 UTC

    As you can read in another node by me in this thread, I also dislike the typing exercise that one needs to practice every time a deep HoHoHoH element is needed. However, I don't like joining all keys together, because that makes iterating or assigning a reference to a deeper hash hard, or impossible, depending on the time available for hacking up ugly solutions.

    Still, if I would join keys together, I'd do so with Perl's own built-in mechanism for that. Supply a list as a hash key and perl automatically joins it with $;. It'd be nice if there was an interpolating qw. :)
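To illustrate why truly joined keys make deeper levels awkward, here is a sketch using a flat hash keyed with $;. The helper name subtree is hypothetical. Getting "everything under Locations/Earth" means scanning every key and building a copy, where a nested hash would just hand back a reference.

```perl
use strict;
use warnings;

my $sep = $;;    # subscript separator, "\034" by default

# A list subscript is joined with $; automatically.
my %flat;
$flat{'Locations', 'Earth', 'Buildings', 'PowerPlant', 'cost'} = 7;
$flat{'Locations', 'Earth', 'Buildings', 'PowerPlant', 'size'} = 4;
$flat{'Locations', 'Squaffle', 'Buildings', 'SaltRefinery', 'cost'} = 2;

# Copy out every entry below a prefix, with the prefix stripped from the keys.
sub subtree {
    my ($flat, @prefix) = @_;
    my $lead = join($sep, @prefix) . $sep;
    my %sub;
    for my $key (keys %$flat) {
        next unless index($key, $lead) == 0;
        $sub{ substr($key, length $lead) } = $flat->{$key};
    }
    return \%sub;
}

my $earth = subtree(\%flat, 'Locations', 'Earth');
my $cost  = $earth->{'Buildings', 'PowerPlant', 'cost'};
```

The result is a copy, so assigning into $earth does not change %flat, which is the "hard, or impossible" part for anyone who wants a live handle on a sub-level.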

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      I'm not actually joining anything; I'm using blessed-hashes-of-blessed-hashes. (I replied to this privately a minute ago, but then it sparked an idea.)

      But that gives me an idea, or perhaps it's what you already meant: it would be much better to use the same underlying implementation, but switch to using $; rather than a period as a separator, so that the example would be:

      $cost = $h->{'Locations',$location,'Buildings',$building,'cost'};

      No new syntax that way, although it does use an unfamiliar one in these post-Perl4 days. But also no smushing of keys together into one string, even if it's only temporary.

      And the more I think about it, the more it looks like this is what you meant -- since my immediate reaction to typing in the example was that an interpolating qw would be really nice! But your point about it being difficult to iterate over or assign to a deeper level isn't a problem with my current implementation.

      I think I'll go change my code to take an optional separator string parameter, defaulting to $;, so that you can do it either way. I'm not sure yet which I'll use; the lack of interpolation with the $; approach defeats much of the benefit.

      As for Perl6 -- we could always add in more than one interpolating context. You'd still need syntax to select them, of course. How about

      $code = %h{''Locations $location Buildings $building cost''};
      or maybe
      $code = %h{''Locations'$location'Buildings'$building'cost''};
      Ironically, I think this is already possible in Parrot with multipart keys, using PIR's
      $P1['Locations';$S1;'Buildings';$S2;'cost']
      The current aggregates' code will pass on any leftover portions of a key to the aggregate it just retrieved.

        The more I read this thread the more I don't understand why you don't maintain a hash that is structured with only two levels, location and then building. Then your code looks like:

        my $code=$locations{$location}{$building}{code};

        Also something to keep in mind (although it's not hugely critical): each deref takes time, each hash lookup takes time, each unique key takes space. So in some circumstances your dotted approach would result in considerably more memory being taken up by the keys. Not only that, but deterministic traversal of your dotted form of the tree would be quite expensive as compared to the non-dotted form. Overall I wouldn't go this route unless I had really strong justification to do so. And style isn't a strong justification IMO :-)

        ---
        demerphq

Re: Dotted hash access
by zentara (Cardinal) on Nov 25, 2004 at 12:51 UTC
    Isn't this problem supposed to be corrected in Perl6?

    I'm not really a human, but I play one on earth. flash japh
      That depends on which part of the problem you're talking about. It would be easy to define a dialect of Perl 6 that allowed this, but there is as of yet no standard path notation other than
      %hash«Foo»«Bar»«Baz»
      If I were going to make a dialect, I'd probably throw in a
      use supersubscripts;
      which would let me write
      %hash«Foo/Bar/Baz»
      instead to mean the same thing. But it's not clear that something like that should be inflicted on everyone unless the syntax were less likely to be confusing to someone who really means
      %hash{'Foo/Bar/Baz'}
      to mean a single key with slashes in it. It would need a more distinctive prefix if we were to build it in, and unfortunately we're really low on bracket characters, even with the addition of «», which we've already found lots of uses for. (Some would say too many... :-)

      So the answer to your question is probably "no" for now...

      Isn't this problem supposed to be corrected in Perl6?

      For several reasons, the dot cannot be used safely for hash access. I proposed backticks for this purpose, but it's not going to happen, because many find it too ugly (since when is that a reason to not do something in Perl?), and the powers that be have decided. See also http://groups.google.com/groups?selm=20040414121848.GJ3645%40c4.convolution.nl

      Still, I think it elegantly solves the problem.

      $cost = $h->{Locations}{$location}{Buildings}{$b}{cost};
      would be written as
      $cost = $h`Locations`$location`Buildings`$b`cost;

      I want this for two reasons:

      1. Typing { and } repeatedly is hard, at least for my hands
      2. Typing {'key'} or «key» is even harder
      Realise that
      $cost = $h->{Locations}{$location}{Buildings}{$b}{cost};
      will be
      $cost = $h{'Locations'}{$location}{'Buildings'}{$b}{'cost'};
      or
      $cost = $h«Locations»{$location}«Buildings»{$b}«cost»;
      in Perl 6.

      IMO, the OP has a good point. Hash access is nice, and the syntax is certainly doable, but it gets tedious for accessing a deep element in a HoHoHoH. And even though many things are made much easier by Perl 6, this specific thing is IMHO made much worse.

      (Please, let's not start another "write your own grammar" subthread.)

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Dotted hash access
by TomDLux (Vicar) on Nov 25, 2004 at 17:34 UTC

    I like the idea of your extended each to iterate through deeply nested structures. It especially simplifies printing data, or making minor alterations.
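The recursive conversion behind such an each can be sketched as a plain function, independent of any tie magic. The name flatten is hypothetical, and unlike a real iterator it builds the whole list of pairs up front.

```perl
use strict;
use warnings;

# Return [dotted_path, value] pairs for every leaf, in sorted key order.
sub flatten {
    my ($node, $prefix) = @_;
    my @pairs;
    for my $key (sort keys %$node) {
        my $path = defined $prefix ? "$prefix.$key" : $key;
        if (ref $node->{$key} eq 'HASH') {
            push @pairs, flatten($node->{$key}, $path);    # descend one level
        }
        else {
            push @pairs, [ $path, $node->{$key} ];
        }
    }
    return @pairs;
}

my $data = {
    Locations => {
        Earth => {
            Buildings   => { PowerPlant => { cost => 7, size => 4 } },
            Coordinates => '12.0,72.4',
        },
        Squaffle => { Buildings => { SaltRefinery => { cost => 2 } } },
    },
};

my @pairs = flatten($data);
my @lines = map { "$_->[0] => $_->[1]" } @pairs;
print "$_\n" for @lines;
```

Sorting the keys at each level makes the traversal order repeatable, at the cost of a sort per level.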

    More generally, though, my approach is to isolate each layer. Ideally, each level should be an array of objects or a hash of objects, which in turn is an array or hash of objects. Especially so if you have a number of operations carried out on the objects. If there are only one or two operations on the structures, I use a function for each layer:

    sub process_frumptions {
        my ( $frumption_set ) = @_;
        # Iterate over the values: each one is a frumption hash ref.
        for my $frumption ( values %$frumption_set ) {
            process_one_frumption( $frumption );
        }
    }

    sub process_one_frumption {
        my ( $frumption ) = @_;
        for my $barfloon ( values %{ $frumption->{'barfloon_gallery'} } ) {
            process_one_barfloon( $barfloon );
        }
    }

    Of course, you may have to profile and optimize, but most of the time I find the code runs fast enough, for some definition of "fast enough".

    --
    TTTATCGGTCGTTATATAGATGTTTGCA