in reply to Re^3: Associative array
in thread Associative array

I would argue that you have completely reversed the proper comparison; calling the data structure a hash confuses implementation and concept, while calling it an associative array focuses on the concept.

As Wikipedia says, an associative array is an abstract data type that maps a set of keys onto values. The analogy to an array is that arrays associate indexes with values, while associative arrays associate more general keys with values. So it is like an array, but with a more flexible choice of association. (Depending on the language and/or library, keys may even be arbitrary objects.) There are many ways to implement them. In fact the two Lisp examples you gave are valid (albeit inefficient) implementations!

The term "associative array" is used in many languages. You can see how ubiquitous the term is across languages by googling for associative array. When I did that, the top links were to the Wikipedia page I gave, then articles on PHP, JavaScript, Oracle PL/SQL, PHP again, an entry in a data structures dictionary, an article on C++ compilers, a Perl article, etc. Clearly the term is very well established in the broader programming world.

By contrast when we say hash we're referring to the implementation. Perl hashes are internally implemented using hash tables. (Common Lisp also has hash tables built in.) However there is no guarantee that Perl's hashes will always be so implemented. Perl has changed hashing algorithms in the past, and theoretically could switch to using a BTree at some point in the future. (Extremely unlikely though.)

Replies are listed 'Best First'.
Re^5: Associative array
by Anonymous Monk on Jul 13, 2009 at 10:04 UTC
    It is easier to say hash than AA without spilling beer :)
Re^5: Associative array
by ELISHEVA (Prior) on Jul 13, 2009 at 11:59 UTC

    When I first responded to the OP I suspect I was misreading the term "associative array" for "association list" (the particular way the OP was using hash and array reinforced that for me). But I persisted in my point because the more I think about it, I really do think that the term "associative array" is one of the not-so-smart-terms that has developed in our industry and is fundamentally misleading, despite a history that goes back at least as far as AWK.

    The term 'associative array' isn't implementation free to everyone, Wikipedia and even some recent text books aside. At the bottom of the Wikipedia discussion page for "associative array" there is a brief back and forth about potential confusion between the two terms "associative list" and "associative array" (the decision was to discourage implementation specific meanings). Consider also, for example, this definition from Mastering Algorithms with C, (Kyle Louden, O'Reilly, 1999) which defines an associative array as:

    Associative arrays consist of data arranged so that the nth element of one array corresponds to the nth element of another. p. 142

    I agree with you that the term "hash" is also laden with implementation associations, though less so, in my opinion. Although the term hash implies a hash function (as opposed to a balanced tree which uses structure to facilitate retrieval), there are many different hash function implementations, so ultimately even a hash function is more of a concept than a specific implementation. More importantly, it is hard to make assumptions about the physical arrangement of a data structure when the "implementation" is a category of functions.

    I don't think the same thing can be said of the word "array". An array is a physical structure that supports both sequenced and random access. Because its indices are integers, it has implications for ordering that are directly contradictory to the intent of the abstract meaning of the term "associative array" used in the AWK documentation or the Wikipedia definition. Both those definitions attempt to generalize the term "array" by focusing on random access (via non-numerical keys) at the expense of sequenced access.

    There are, of course, better conceptual terms: "map", "dictionary", or even "association container". If we were talking Java, I would use the term Map (Java's abstract container for this purpose). However, in Perl we are stuck with "hash" and it is the Perl use of the word we need to explain.

    Best, beth

      I'm not arguing that associative array is a particularly good term. Just that it is a very widely used and accepted term. I think you accept that point. And I accept yours that "associative array" can cause some people to associate confusing things about arrays.

      If we're looking for alternatives I don't like "map" because it has too many other meanings. "Dictionary" strikes me as better. "Associative container" is too wordy ("associative array" has the same problem), but is used for this exact purpose in the C++ STL library. If we're going to make up a term, "association" is not bad.

      However multiple language communities already call them "associative arrays" (and then usually call them something shorter in daily use), and we're not going to change that.

      And a random note. If I tie a Perl hash to a DB_File DB_BTREE file, in what sense do I now have a hash? The tie interface really exposes the fact that we've got an abstract datatype, and the implementation it is tied to there has absolutely nothing to do with hash functions. That's what I mean when I say that the common term specifies an implementation too closely.