http://qs1969.pair.com?node_id=169802


in reply to Nested data structures... nasty?

The only major thing that irritates me about Perl's way of doing complex data structures is that the specification is implicit: you never sit down and declare a "Hash of Hashes of Arrays", for instance: you put arrayrefs in a hash, then a ref to that hash in another hash. And you never sit down and code up a list of admissible hash keys... if you haven't documented what's expected to be in the hash, and where it's inserted, your code can be pretty impenetrable.
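To make that concrete, here's a contrived sketch (the names are made up) of the "implicit" style:

my %teams;
# Autovivification builds the intermediate hash and the arrayref for us:
push @{ $teams{'us'}{'perl'} }, 'Alice', 'Bob';
push @{ $teams{'uk'}{'lisp'} }, 'Carol';
# ...and nothing stops a later statement from inventing a brand-new key:
$teams{'us'}{'pyton'} = ['Dave'];    # a typo, accepted without complaint

Nowhere is it written down that %teams is a hash of hashes of arrays of names; the reader has to infer that from the code that fills it in.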

On the whole, I think this way of doing things is better than the alternative: well-defined, rigid data structures described at compile time. (Yes, folks, I like LISP too -- conses are your friends. :-) What irritates me is that these "implicit" structures impose an extra documentation burden, and one that's not always easily satisfied in code. Rob Pike claims that it's better to put the complexity in the data than in the code, and I agree, but that doesn't help you if the data is undecipherable.

Update: Another possible disadvantage of Perlish (or LISPy, Haskellish, Schemish, etc.) "implicit, on the fly" data structures is that some compiler optimizations are difficult or even impossible. NOTE: I haven't actually tested these assumptions empirically; for one thing, I'm at work, and don't have the time. I'm going on my understanding of modern computer architecture and compiler optimization, which may be sorely lacking. Thou hast been warned.

For instance, if I want a 3d vector, I could write the following C:

typedef struct { float x; float y; float z; } vec3d;

Now, the compiler knows that a vec3d takes up exactly 12 bytes (assuming 4-byte floats), and can make a bunch of optimizations based on that knowledge and some information about the processor it's compiling for. For instance, it can pad the struct out to 16 bytes if addressing is faster on 16-byte boundaries. It can take a declaration like:

vec3d vertices[20];

and allocate 240 (or 320) contiguous bytes for it.

Contrast that to the implicit equivalent:

my %vert = (
    'x' => undef, # placeholder
    'y' => undef,
    'z' => undef,
);

Perl knows nothing about the size of this hash: it has three elements now, but there's nothing stopping you from adding a dozen more in the very next statement. And since the size of the hash isn't guaranteed, the best Perl can do if you put twenty vertices in an array is allocate twenty contiguous references: that's better than nothing, but unless you're improbably careful in your memory management, the hashes those refs point to are going to be scattered all over memory, which means cache misses and subsequent stalls.
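Something like this contrived sketch is what I have in mind:

# Twenty vertices as an array of anonymous hashes.
my @vertices = map { { 'x' => 0.0, 'y' => 0.0, 'z' => 0.0 } } 1 .. 20;

@vertices itself is twenty contiguous scalars, but each of those scalars is only a reference; the anonymous hashes they point to end up wherever the allocator happened to put them.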

The Perlish alternative is to create data structures in scalars with pack, which is just fugly.
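For the record, a rough sketch of the pack approach (same 4-byte-float assumption as above):

# Three single-precision floats packed into one 12-byte scalar --
# roughly the moral equivalent of the C struct.
my $vec = pack 'f3', 1.0, 2.0, 3.0;
my ($x, $y, $z) = unpack 'f3', $vec;

You get the compact layout back, but you also get to maintain the format strings by hand, which is exactly the kind of bookkeeping Perl normally spares you.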

The "Perl is slower than C!" thesis shouldn't be any surprise to anyone, of course. I just want to point out that the flexibility of hashed (hashish? ;-) structures is sometimes a disadvantage. Again, I want to emphasize that I tend to like this way better, but I'd be a fool to pretend that it's perfect.

Update 2: Minor grammar corrections.

Update 3: I'm not picking on Common LISP in particular, just the "add stuff on the fly" way of building complex structures, which you can do in Common LISP (or any other LISP, for that matter), and which, in my experience, is the most common way of building them.

A good general rule for finding language features in Common LISP is "it's in there somewhere". :-)

--
The hell with paco, vote for Erudil!
:wq

Re: (FoxUni) Re: Nested data structures... nasty?
by Anonymous Monk on May 29, 2002 at 00:03 UTC
    Common Lisp has had those kinds of optimizations for ages. You tell it what your variables hold, and how much effort it should spend compiling for speed, and it figures out the rest. The only catch is that it still does garbage collection.

    For a real example of Common Lisp used in a high-performance situation, read this description of how Orbitz works.

Re: (FoxUni) Re: Nested data structures... nasty?
by theorbtwo (Prior) on May 28, 2002 at 20:47 UTC

    I agree pretty much totally. ("Never say never".)

    This is going to be one of the nice things about perl6 -- you can nicely mix both styles, and declare a hash with a limited namespace, each entry of which has a specific type, and so get something like a C struct with a known size that is more optimizable.

    You can also have a limited namespace without strictly declaring the types of the entries. Or pretty well anything else you can think of.

    Because the parrot virtual machine resembles a real machine, we can draw on all the experience with compiler optimization for real (register) machines.


    We are using here a powerful strategy of synthesis: wishful thinking. -- The Wizard Book

Re^2: Nested data structures... nasty?
by aufflick (Deacon) on Nov 12, 2004 at 05:34 UTC
    One of the key places where this really becomes a problem is in the internal data structures of objects. Of course an object can be a blessed just-about-anything, but most people use a hash and store the object state in there.

    In Perl 5 you have 'fields', which can help clean up the problem. I also really like the way accessors are handled in the reformed-perl OO syntax module (my writeup, author's writeup).
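    Something along these lines, sketched from the 'fields' documentation (untested, and the Vec3D name is made up):

        package Vec3D;
        use strict;
        use fields qw(x y z);

        sub new {
            my Vec3D $self = shift;
            # Called as a class method, so build a restricted hash for the declared fields.
            $self = fields::new($self) unless ref $self;
            @{$self}{qw(x y z)} = @_;
            return $self;
        }

        package main;
        my Vec3D $v = Vec3D->new(1.0, 2.0, 3.0);
        # $v->{'w'} = 4;   # compile-time "No such class field" error

    With a typed lexical like that, a typo in a key gets caught at compile time instead of silently creating a new hash entry.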