Re: Re: Re: Re: Re: Re: Re: A (memory) poor man's hash

in reply to Re: Re: Re: Re: Re: Re: A (memory) poor man's hash
in thread A (memory) poor man's <strike>hash</strike> lookup table.

I'm guessing that by BOM you mean Byte-Order Marking. (This is based on a Google search for an unknown acronym.) That probably refers to the character-escaping suggestion that I gave.

Please define what you mean by, "won't work". Won't work as in doesn't do what you want? Possible, I don't know what you want. Won't work as in doesn't do what I said it does? That's another story. It does exactly what I said.

If you want your key/value pairs to be able to hold arbitrary binary data in your data structure, pre- and post-process them as I described and you will succeed. The preprocessing gets rid of the character that you are using as a separator. The postprocessing recovers the original string data. Adding any processing takes time, so there is going to be some performance hit. My guess is not much of one, since Perl's RE engine is pretty fast and the REs in question are pretty simple.
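To make the pre- and post-processing concrete, here is a minimal sketch. It is written in Python rather than Perl, and the separator character, the escape character, and all function names are assumptions chosen for illustration, not the exact scheme from the earlier post. The idea is only that escaping removes every occurrence of the separator from a field, and unescaping recovers the original bytes.

```python
import re

SEP = "\x00"  # assumed separator between fields (hypothetical choice)
ESC = "\x01"  # assumed escape character (hypothetical choice)

# Each reserved character maps to a two-character escape sequence.
_ENCODE = {ESC: ESC + "e", SEP: ESC + "s"}
_DECODE = {seq: ch for ch, seq in _ENCODE.items()}


def escape(field):
    """Preprocess: replace SEP and ESC so the result never contains SEP."""
    return re.sub(r"[\x00\x01]", lambda m: _ENCODE[m.group(0)], field)


def unescape(field):
    """Postprocess: turn escape sequences back into the original characters."""
    return re.sub(r"\x01[es]", lambda m: _DECODE[m.group(0)], field)


def pack(pairs):
    """Join escaped keys and values into one separator-delimited string."""
    fields = []
    for key, value in pairs:
        fields.append(escape(key))
        fields.append(escape(value))
    return SEP.join(fields)


def unpack(packed):
    """Split on SEP and unescape, returning the original (key, value) pairs."""
    fields = [unescape(f) for f in packed.split(SEP)]
    return list(zip(fields[0::2], fields[1::2]))


pairs = [("plain", "value"), ("bin\x00ary", "with\x01both\x00chars")]
packed = pack(pairs)
restored = unpack(packed)
```

Because the escaped fields can no longer contain the separator, splitting on it is safe no matter what the keys and values hold. The cost is one simple substitution on the way in and one on the way out.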

So what you summarize as "Nah. Using BOMs won't work." actually works exactly as I said it does.

As for the rest, please be specific in your complaints. Vaguely waving hands doesn't give me much feedback.

Here is a specific example to illustrate what I mean.

[...]I didn't say that I consider cache optimisation unimportant. I do doubt that it is possible, in a meaningful way, for cross-platform development, or even practical for most purposes unless it is performed by compilers or interpreters tuned to the target platform.[...]
[...]With cache optimization, we need to specify our goals first. If your goal is to achieve a universal large win, or to achieve any kind of ideal optimization, then optimizing cache coherency is an impossible ideal. But that isn't my goal. My goal would be to have data structures which will tend to perform better. And that is quite possible.[...]
[...]. Instead, you introduce a subject vaguely related to the original subject matter, open with an obvious counter to a non-sequita not in in discussion, and then support that obvious arguement at length, with the implication that if you said "it is", then your opponent must have already said it isn't.[...]

I could have taken that particular thread back further, but that is far enough.

From my point of view, your phrase "in a meaningful way" is unclear in the extreme. I don't know what you mean by that. I know what I would mean by that, and it clearly isn't what you mean because I come to opposite conclusions. So I took pains to explain exactly how I would understand that phrase, and why my understanding leads me to a different conclusion than you came to.

My hope was that by making it clear exactly where and why we differ in our views that we could clarify the difference in our perspectives. But it seems that you have misinterpreted that as being a negative argument against you. :-(

I don't think that the basic facts are really in dispute. Let me summarize them. Structures like Judy arrays attempt to dynamically optimize themselves to account for the cost of cache misses. I think that we agree on the following facts about Judy arrays:

Making something like Judy arrays work takes a lot of work.
The specific tradeoffs made by Judy arrays will work far better on some CPUs than others.
Judy arrays can beat hashing on many different CPUs.
Judy arrays can be used for pretty much the same things that hashing is used for.

I have more claims that I don't know whether you agree with.

Even where usage patterns aren't exactly what Judy arrays are optimized best for, they are likely to be a win.
Current Moore's law trends indicate that the ratio between how well Judy arrays and hashing perform will tend to shift further in favour of Judy arrays in future generations of chips.
I suspect that the ratio between Judy arrays and an ideally designed data structure for the chip at hand will get worse over time.

Now my perspective on these claims is that replacing the black box of hashing with the black box of something like Judy arrays can be a meaningful and practical cache optimization for most purposes in cross-platform development, even though it is not specifically tuned to the target platform. Which is exactly what you claimed to doubt. OK, more exactly, you stated: "I do doubt that it is possible, in a meaningful way, for cross-platform development, or even practical for most purposes unless it is performed by compilers or interpreters tuned to the target platform."

As I see it there are a few possible causes for that disagreement:

Your perspective of what is "meaningful" is different than mine.
You hadn't considered one or more of those claims.
You think that one or more of those claims is wrong.

I still would like to understand that disagreement. I can guess. My guess is that we have very different aims when it comes to performance, so while I'm happy with a trivially achieved modest win, you are unhappy with anything less than the really major wins that you can see are possible, albeit with a lot of work.

But I'm not sure of that guess, and I really don't understand the value system which makes performance that big of a goal.
