Re^3: Bidirectional lookup algorithm? (sundialsvc4 spam)

in reply to Re^2: Bidirectional lookup algorithm? (sundialsvc4 spam)
in thread Bidirectional lookup algorithm? (Updated: further info.)

Yes, that is my name. But, pardon me, you don’t seem to be using yours. Wonder why. Nevertheless, all of this is entirely off-topic. I believe that someone asked a question about bidirectional lookup algorithms, yes?

Returning to the topic, then, if you really want to attempt to use a single indexing scheme to locate two keys, the only way that I know of to do this is to generate a synthetic key, within the same domain of values, for each key that you wish to store. You then employ a hashref keyed by this synthetic-key. Each bucket in this hashref points to an arrayref of references. You insert a reference to the (single ...) record into this hashref/arrayref structure multiple times, one for each synthetic key. Searching now involves retrieving the arrayref for either key, and iterating through it to find the actual record(s) that you want. The synthetic-key and constrained domain for that key is merely intended to reduce the size of the hashref: you will still have a lot of accompanying data-structures ... probably more-or-less equivalent to having two separate hashrefs.

= = = = =

But the over-arching problem remains the same: memory must be presumed to be unlimited and resident. Either you have that or you don’t. This strategy continues to be “random.” It does not exhibit “locality of reference.” It will produce a very high working-set size that, under production load conditions (really, any sort of virtual-memory stress at all ...) will perform prohibitively poorly, even though it may have seemed incredibly-clever on the developer’s machine, which was never actually “paging.”

(I like to give developers machines that are memory-tight. It forces them to write better code, because now they feel it, too.)

I confess to a little bit of déjà vu here, because I once was called-upon to undo a similar program. Oh, the thing was a gloriously clever mess. There was “cleverness” running all through it, and memory-wise it was a big, fat pig. Pretty much had to re-write the whole thing. It just couldn’t run. It acted, as I’ve said, “punch-drunk” when page-faults started hitting it, and of course, page-fault behavior affects everything on the machine. (Other things, too ... “oh no, don’t call a subroutine! Subroutines are ‘inefficient!’ It’s ‘faster’ to repeat yourself in-line!”) A big fat pig mess. But it sure did look clever to the developer, who by then had conveniently slipped out the door. Code needs to be fast enough, under production conditions, and it must be maintainable. That’s all. Don’t be “too clever by half.” Please. Pretty please.

In Section Seekers of Perl Wisdom