Re^2: Associative array

Re^3: Associative array by ELISHEVA (Prior) on Jul 13, 2009 at 08:09 UTC
Indeed. I put this somewhere in the same category as "lists in scalar context". It fits the orthography and good luck trying to get people to see differently. However, if one views a hash as an associative array one is likely to end up in all sorts of confusion, especially if one is first familiar with Lisp. Two of the most important sources of confusion are: (a) confusion of implementation with concept and (b) confusion over ordering. The name "associative array" confuses an implementation of a hash with the concept of a hash. Conceptually, a Perl hash is a data structure that allows one to retrieve a value by name rather than by position. There are many ways to implement this. For example, in Lisp there are two different ways to get hash-like effects: alists and plists. alists (association lists) are written like this: `((rose . red) (lily . white) (buttercup . yellow))`. The Perl equivalent would be `(['rose','red'],['lily','white'], ['buttercup', 'yellow'])`. This is clearly something quite different from what is meant in Perl by a hash. See Association Lists. plists (property lists) are written like this: `(rose red lily white buttercup yellow lily white)`. The Perl equivalent would be written nearly identically: `qw(rose red lily white buttercup yellow lily white)`. But again, this is not what Perl considers a hash. It is merely an alternating sequence of keys and values. See Property Lists. In many other languages, Perl included, hashes (sometimes referred to as dictionaries) are implemented using data structures that are much more complicated than either alists or plists. Perl in fact has two entirely different C structures for hashes (HV) and arrays (AV). Although both have ARRAY, FILL, and MAX to describe their memory usage, the similarity stops there. 
The HV structure is carefully organized to maximize the speed with which values can be retrieved by property name and to conserve storage (in some cases keys are stored as pointers to a shared set of keys rather than having their own private string representation). See PerlGuts Illustrated. Now, I suppose you could argue that even Perl stores key-value pairs in its ARRAY member and that makes it an associative array. However, that is a misleading oversimplification. And it brings us to the next issue: ordering. Another problem with viewing hashes as associative arrays is that an array has an explicit ordering to its elements, whether those elements are simple scalars or key-value pairs. Hashes, by contrast, have an undefined ordering. Failing to grasp the distinction can cause serious problems, including hard-to-track intermittent bugs. One who views a hash as an associative array is likely to assume that the ordering of hash members is consistent between runs of a program and may perhaps base their program on that assumption. However, this is a dangerous assumption. Perl hashes have an unpredictable order by intent. One of the techniques to safeguard software from algorithmic complexity attacks is to randomize the order of hash keys, thus making it harder for an attacker to predict which keys will collide and so degrade hash performance. See keys and perlsec for further discussion. Best, beth
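The ordering point above can be illustrated with a minimal Perl sketch. It assumes the flower data from the plist example (without the repeated pair); the variable names are illustrative only.

```perl
use strict;
use warnings;

# A plist-style flat list becomes a hash by plain assignment.
my %color_of = qw(rose red lily white buttercup yellow);

# The order returned by keys() is unspecified and may differ between runs,
# so sort explicitly whenever the program depends on the order.
my @sorted = sort keys %color_of;
print "$_ => $color_of{$_}\n" for @sorted;

# For an unmodified hash, keys() and values() line up with each other.
my @k = keys %color_of;
my @v = values %color_of;
for my $i ( 0 .. $#k ) {
    die "keys/values mismatch\n" unless $color_of{ $k[$i] } eq $v[$i];
}
```

Code that needs a repeatable order should impose one with `sort` rather than relying on whatever `keys` happens to return.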
Re^4: Associative array by tilly (Archbishop) on Jul 13, 2009 at 09:57 UTC
I would argue that you have completely reversed the proper comparison; calling the data structure a hash confuses implementation and concept, while calling it an associative array focuses on the concept. As Wikipedia says, an associative array is an abstract data type that maps a set of keys onto values. The analogy to an array is that arrays associate indexes with values, while associative arrays associate more general keys with values. So it is like an array, but with a more flexible choice of association. (Depending on the language and/or library, keys may even be arbitrary objects.) There are many ways to implement them. In fact the two Lisp examples you gave are valid (albeit inefficient) implementations! The term "associative array" is used in many languages. You can see how ubiquitous the term is across languages by googling for associative array. When I did that, the top links were to the Wikipedia page I gave, then articles on PHP, JavaScript, Oracle PL/SQL, PHP again, an entry in a data structures dictionary, an article on C++ compilers, a Perl article, etc. Clearly the term is very well established in the broader programming world. By contrast when we say hash we're referring to the implementation. Perl hashes are internally implemented using hash tables. (Common Lisp also has hash tables built in.) However there is no guarantee that Perl's hashes will always be so implemented. Perl has changed hashing algorithms in the past, and theoretically could switch to using a BTree at some point in the future. (Extremely unlikely though.)
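The claim that one abstract data type admits several implementations can be sketched in Perl as follows. This is only an illustration: `make_hash_map` and `make_alist_map` are hypothetical names, and the closure-based interface (`set`, `get`) is an assumption made for the example, not anything Perl provides.

```perl
use strict;
use warnings;

# Implementation 1: backed by a built-in Perl hash (a hash table).
sub make_hash_map {
    my %store;
    my $ops = {
        set => sub { my ( $k, $v ) = @_; $store{$k} = $v; return },
        get => sub { my ($k) = @_; return $store{$k} },
    };
    return $ops;
}

# Implementation 2: backed by a list of [key, value] pairs, alist style.
# Lookups are a linear scan, so this is valid but inefficient.
sub make_alist_map {
    my @pairs;
    my $ops = {
        set => sub {
            my ( $k, $v ) = @_;
            for my $pair (@pairs) {
                if ( $pair->[0] eq $k ) { $pair->[1] = $v; return }
            }
            push @pairs, [ $k, $v ];
            return;
        },
        get => sub {
            my ($k) = @_;
            for my $pair (@pairs) {
                return $pair->[1] if $pair->[0] eq $k;
            }
            return undef;
        },
    };
    return $ops;
}

# The calling code cannot tell which implementation it is using.
for my $map ( make_hash_map(), make_alist_map() ) {
    $map->{set}->( 'rose', 'red' );
    $map->{set}->( 'rose', 'crimson' );    # overwrite an existing key
    print $map->{get}->('rose'), "\n";
}
```

Both maps answer the same operations with the same results; only the cost of each lookup differs.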
Re^5: Associative array by Anonymous Monk on Jul 13, 2009 at 10:04 UTC
It is easier to say hash than AA without spilling beer :)
Re^5: Associative array by ELISHEVA (Prior) on Jul 13, 2009 at 11:59 UTC
When I first responded to the OP I suspect I was misreading the term "associative array" for "association list" (the particular way the OP was using hash and array reinforced that for me). But I persisted in my point because the more I think about it, I really do think that the term "associative array" is one of the not-so-smart terms that has developed in our industry and is fundamentally misleading, despite a history that goes back at least as far as AWK. The term 'associative array' isn't implementation-free to everyone, Wikipedia and even some recent textbooks aside. At the bottom of the Wikipedia discussion page for "associative array" there is a brief back and forth about potential confusion between the two terms "associative list" and "associative array" (the decision was to discourage implementation-specific meanings). Consider also, for example, this definition from Mastering Algorithms with C (Kyle Loudon, O'Reilly, 1999): "Associative arrays consist of data arranged so that the nth element of one array corresponds to the nth element of another." (p. 142) I agree with you that the term "hash" is also laden with implementation associations, though less so, in my opinion. Although the term hash implies a hash function (as opposed to a balanced tree which uses structure to facilitate retrieval), there are many different hash function implementations, so ultimately even a hash function is more of a concept than a specific implementation. More importantly, it is hard to make assumptions about the physical arrangement of a data structure when the "implementation" is a category of functions. I don't think the same thing can be said of the word "array". An array is a physical structure that supports both sequenced and random access. 
Because its indices are integers, it has implications for ordering that are directly contradictory to the intent of the abstract meaning of the term "associative array" used in the AWK documentation or the Wikipedia definition. Both those definitions attempt to generalize the term "array" by focusing on random access (via non-numerical keys) at the expense of sequenced access. There are, of course, better conceptual terms: "map", "dictionary", or even "association container". If we were talking Java, I would use the term Map (Java's abstract container for this purpose). However, in Perl we are stuck with "hash" and it is the Perl use of the word we need to explain. Best, beth
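The parallel-array reading quoted from Mastering Algorithms with C can be contrasted with a Perl hash in a short sketch. The data and the helper name `parallel_get` are assumptions made for illustration.

```perl
use strict;
use warnings;

# Parallel arrays: the nth name corresponds to the nth color.
my @names  = qw(rose lily buttercup);
my @colors = qw(red white yellow);

# Find the position in one array, then read the same position in the other.
sub parallel_get {
    my ($key) = @_;
    for my $i ( 0 .. $#names ) {
        return $colors[$i] if $names[$i] eq $key;
    }
    return undef;
}

# The same association as a Perl hash, built with a hash slice.
my %color_of;
@color_of{@names} = @colors;

print parallel_get('lily'), "\n";
print $color_of{lily}, "\n";
```

The parallel arrays keep a position for every entry; the hash keeps only the association and promises nothing about position.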
Re^6: Associative array by tilly (Archbishop) on Jul 13, 2009 at 19:05 UTC
Re^4: Associative array by BrowserUk (Patriarch) on Jul 13, 2009 at 14:34 UTC
Hm. Sounds like you're trying to start YAPS (Yet Another Perl Shibboleth). You'll excuse me if I do not wish you well with that. The alists & plists stuff is a read [intentionally sic] herring because (to the best of my very limited Lisp knowledge) neither can perform the fundamental operation of both associative arrays and hashtables: that of random access of values by key. To the best of my knowledge, access to both is strictly sequential from the head. The implementation details are irrelevant because associative array is an abstract datatype defined in terms of its operations. It can be implemented many different ways. The implicit ordering is a misnomer: you are conflating the order in which the data is accessed when iterated with the ordering of the data itself. Nobody except the rawest newbie expects AA/hashtables to be ordered. And when they do, it's for the wrong reasons. As LW is reputed to have said: "iterating over the keys of a hash is like clubbing someone to death with a loaded Uzi". Saying that ordinary arrays have implicit ordering is equally a misnomer. If you iterate an array in ascending index order, shuffle the contents, and again iterate by ascending index, the data (values) will be differently ordered. The pseudo-randomising of the hash seed for security reasons has as much to do with the iteration order of a hash as the type of lock on a door has to do with the order people will come through it in the morning. The hash seed does not determine the order of iteration. That is (and remains) the sequentially ascending sequence of the bucket array (with diversions for non-unitary buckets), just as for arrays. The hash seed simply determines which bucket the key/value pair is mapped to. I.e. it affects insertion, not traversal. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". 
In the absence of evidence, opinion is indistinguishable from prejudice. RIP PCW
Re^5: Associative array by ELISHEVA (Prior) on Jul 14, 2009 at 05:46 UTC
Your initial question gave me an opportunity to think through some issues that have been annoying me for some time and for that I am grateful. But I am puzzled by the response above. In Lisp values are randomly accessible by key via helper functions (`assoc` or `find` for alists, `get` for plists). Are you suggesting that the lack of a syntax to hide the need for helper functions disqualifies them? "The implementation details are irrelevant" ... please see my response to tilly: Re^5: Associative array. Viewing array indices as keys (as opposed to mere access order) is the theoretical basis for claiming that "associative arrays" are a generalization of normal arrays, i.e. they have a more general universe of possible keys. If anyone is doing the confusing, it is whoever first came up with the term "associative array". My objection to the term comes from the fact that generalizing the key set also changes the mathematical properties of that key set. And that in turn changes the range of valid operations on those keys and how we work with them on a theoretical and even practical level. What operations are different? Well, for one, in a normal array we can meaningfully talk about the distance between elements because integer keys can be subtracted one from another. We can also divide indices (or more precisely the distance between two indices) by some constant. Those two properties play an important role in certain searching and sorting algorithms (binary search and sort, for example). And then there is ordering. A finite range of integers has a natural well-ordering based on '<='. An arbitrary set of keys only does if we have defined (and applied) a function that maps those keys to the set of positive integers (or some subset). There are many, many ways to do this (locale based, asciibetical). In fact it is mathematically guaranteed that we can find such a function for any finite set. 
However, this is an additional step that must be done in order to make an associative array behave like a normal array, assuming you even want to do that. But often we don't. We use normal arrays and "associative arrays" for very different kinds of algorithms. Associative arrays are best used when our algorithm relies on the semantic significance of each element. Normal arrays are used when our algorithm relies on the order of elements. Sorting and reordering only change what value is associated with what key. They don't change the fundamental fact that the values are associated with integral keys and that integers have certain useful mathematical properties. In fact, one of the motivations for using normal arrays in sorting is their ability to map elements to a well-ordered set of keys. Even if Perl is always using bucket order, isn't the net result a different key order? The documentation for keys says (I have bolded the key words): "The keys are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed to be the same order as either the values or each function produces (given that the hash has not been modified). Since Perl 5.8.1 the ordering is different even between different runs of Perl for security reasons (see 'Algorithmic Complexity Attacks' in perlsec)." What am I misunderstanding there? Best, beth
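The role of index arithmetic mentioned above can be seen in a minimal binary search sketch. The function name `binary_search` and the sample data are assumptions for illustration; the point is that the midpoint step subtracts and halves indices, which only works because array keys are integers.

```perl
use strict;
use warnings;

# Binary search over a numerically sorted array; returns the index or -1.
sub binary_search {
    my ( $sorted, $target ) = @_;
    my ( $lo, $hi ) = ( 0, $#{$sorted} );
    while ( $lo <= $hi ) {
        # Distance between two indices, divided by a constant.
        my $mid = $lo + int( ( $hi - $lo ) / 2 );
        if    ( $sorted->[$mid] < $target ) { $lo = $mid + 1 }
        elsif ( $sorted->[$mid] > $target ) { $hi = $mid - 1 }
        else                                { return $mid }
    }
    return -1;
}

my @primes = ( 2, 3, 5, 7, 11, 13 );
print binary_search( \@primes, 11 ), "\n";    # prints 4
print binary_search( \@primes, 4 ),  "\n";    # prints -1
```

Nothing comparable is available on the keys of a hash until they have been copied into an array and sorted with some chosen comparison, which is the additional step described above.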
Re^6: Associative array by BrowserUk (Patriarch) on Jul 14, 2009 at 08:49 UTC